In a world of machine learning, is champion/challenger decision testing dead?
Much of the focus of machine learning is on building more powerful models and, consequently, more powerful predictions. This is especially useful when inputs and outcomes are less stable, conditions in which machine learning models often outperform more traditional methods in their predictive power. What’s more, they learn fast. But the challenges of explainability in credit risk models developed using machine learning are well understood and well documented.
By extension, these predictions manifest as decisions. And this is ultimately the most critical aspect. Improving your Gini by 10 or 20 points is all well and good, but if you can’t translate that improved predictive power into decisions, it’s of no value at all. In essence, your model is only as good as what you do with it.
The human response to machine learning
Yet there is currently little conversation about using machine learning and AI in decisions. Asking machine learning to operate within parameters – constrained AI – is not only mathematically challenging; by definition, it also limits the outcomes to sub-optimal ones. But in many circumstances, constraining or monitoring AI is essential to ensuring it produces outcomes that ‘make sense’ to us as humans.
Imagine allowing untamed AI to make what are logically the best decisions based on the information it has at the time: the inputs may vary from day to day or week to week, and so the outputs may deliver emotionally inconsistent treatment. We aren’t an especially logical species, and accounting for human behaviours and expectations is tricky.
An unchecked application of ML to decisioning could mean that today we drop you a friendly ‘by the way’ text about having exceeded your limit, but tomorrow you receive a sternly worded letter that feels a world away from the little reminder of just the day before. Of course, that also requires real-time inputs and real-time changes to models, which come with their own technological challenges and accompanying price tag. We are, of course, used to using near real-time data and should continue to do so, but with consideration for the application and the resultant outcome.
Predictions are not decisions
Indeed, predictions are not the same as decisions. That’s because decisions, or actions, change the outcomes you are trying to predict. For example, presenting different marketing offers can impact the likelihood of customer attrition, just as different collections treatments are likely to impact the amount collected. Therefore, HOW you use your ML models and predictions becomes crucially important – with the understanding that, by taking different decisions or actions on your customers, you can influence and change the very outcomes your models are predicting.
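To make that concrete, here is a minimal, hypothetical sketch (the treatment names, rates and costs are all invented for illustration, and the toy formula stands in for a trained model): instead of producing one score per customer, you estimate the outcome under each candidate action and let the decision be the action with the best expected result.

```python
# Hypothetical illustration: the outcome we predict depends on the
# action we take, so a single score per customer is not enough.

def expected_amount_collected(balance, days_overdue, treatment):
    """Toy stand-in for a model of outcome given (customer, action).

    In practice this would be an ML model trained on historical
    (customer, treatment, outcome) data; here it is a made-up formula.
    """
    base_rate = 0.60 if days_overdue < 30 else 0.35
    uplift = {"friendly_text": 0.05, "formal_letter": 0.08, "phone_call": 0.15}
    cost = {"friendly_text": 0.10, "formal_letter": 2.50, "phone_call": 8.00}
    return balance * (base_rate + uplift[treatment]) - cost[treatment]

def best_treatment(balance, days_overdue):
    """The decision: pick the action with the highest expected value."""
    treatments = ["friendly_text", "formal_letter", "phone_call"]
    return max(treatments,
               key=lambda t: expected_amount_collected(balance, days_overdue, t))

# The same predictive machinery yields different decisions per customer.
print(best_treatment(balance=50, days_overdue=10))     # -> friendly_text
print(best_treatment(balance=5000, days_overdue=45))   # -> phone_call
```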
It’s just as important to use experimental methodologies in the right place, such as A/B or champion/challenger testing, as it was in a world before AI and ML became a reality. That’s because strategy testing allows you to identify the BETTER or BEST decisions for populations, groups and individuals.
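As a rough sketch of what that testing looks like in practice (the 90/10 split, account counts and outcome numbers below are invented for illustration, not a production design): a small random share of accounts is routed to the challenger strategy, and observed outcomes are compared before anything is promoted.

```python
import random
import statistics

random.seed(42)  # reproducibility for the illustration

CHALLENGER_SHARE = 0.10  # assumed 90/10 champion/challenger split

def assign_strategy():
    """Random assignment is what makes the comparison a fair experiment."""
    return "challenger" if random.random() < CHALLENGER_SHARE else "champion"

def simulated_outcome(strategy):
    """Stand-in for the outcome observed over the test window
    (e.g. amount collected); the +5 challenger effect is invented."""
    mean = 105.0 if strategy == "challenger" else 100.0
    return random.gauss(mean, 20.0)

results = {"champion": [], "challenger": []}
for _ in range(10_000):
    strategy = assign_strategy()
    results[strategy].append(simulated_outcome(strategy))

for strategy, outcomes in results.items():
    print(f"{strategy}: n={len(outcomes)}, mean outcome = "
          f"{statistics.mean(outcomes):.2f}")

# A real programme would add significance testing and guardrail metrics
# before the challenger is promoted to champion.
```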
This mix of methodologies – applying the right tool for the job – is an important consideration in the overall data science strategy for credit risk.
In other words, it’s essential that we ask ourselves how better scores become better decisions.