Paragon Business Solutions

Machine Learning: Five guiding principles in credit risk modelling

The interest in using Machine Learning (ML) for credit risk modelling continues apace, reflected in the number of papers on ML at last year’s Edinburgh credit scoring conference.  

Much of the increased interest, across industries, is driven by the prospect of more powerful predictive models and faster development times compared to more traditional statistical regression methods. 

Over the last few years, increases in computer processing power and data availability have driven an explosion in the development and use of Machine Learning models. ML models are behind the success of many large businesses in various industries: Netflix personalises recommendations, Uber better predicts travelling habits, and in healthcare, disease identification and diagnosis are being improved.

Cloud computing is also enabling easy and cost-effective adoption of Machine Learning and advanced analytics: public and private clouds not only provide space to store and analyse large quantities of data, but do so in a scalable and cost-efficient way.

The pros and cons of machine learning in credit risk 

But it is not just predictive power driving interest in ML. For credit risk, it is the advances in explainability and in interaction detection – a method that automatically looks for predictive non-linear combinations of characteristics – that are so important and potentially valuable.

Essentially, we are seeing four primary reasons why ML approaches are being explored: 

  1. Improved predictive power and the business (and consumer) benefits that this brings. Their performance and power are often superior to more traditional statistical models due to their ability to identify and capture complex interactions and different predictive patterns in different sub-populations and segments of the overall population of interest. 

  2. Ease and efficiency, as ML models potentially remove the need for the time-consuming segmentation analysis that more traditional methods often require. 

  3. Benchmarking and validation, to determine what an ML model can achieve in comparison to existing approaches. 

  4. New data and domains, for exploratory purposes. It’s often possible to get to a good model more quickly when working in a new domain or with new data sources, which can also inform characteristic generation through interaction detection. 

Machine learning models are not new to lenders. Many have been using such models for fraud detection for many years, though that’s not to say they are straightforward to develop.

For those new to ML, there are three types of ML algorithm that can all be used to develop highly predictive models: Random Forests, Gradient Boosted Trees and Neural Networks.

They can each be used to develop ‘supervised classification’ models – models that predict a given outcome – such as those used to predict repayment/default for credit risk management. However, unlike statistical methods such as logistic regression, these ML models tend to be far more complex in terms of the amount and structure of the code to deploy, and therefore harder for a human to interpret and understand.
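For a concrete point of reference, here is a minimal sketch of the traditional route in Python with scikit-learn: a logistic regression model predicting default. The data file (`loans.csv`) and the `default` column are purely illustrative assumptions, not data referenced in this article.

```python
# Minimal sketch: a traditional supervised classification baseline for default prediction.
# Assumes a CSV of numeric application features plus a binary `default` flag (1 = defaulted);
# the file and column names are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

loans = pd.read_csv("loans.csv")
X = loans.drop(columns=["default"])
y = loans["default"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# An additive model: a handful of coefficients that are easy to deploy and to read.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Logistic regression AUC:",
      roc_auc_score(y_test, logit.predict_proba(X_test)[:, 1]))
```

The ML sketches that follow reuse the same `X_train`, `y_train`, `X_test` and `y_test` splits so the approaches can be compared like for like.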

A Random Forest model is built by developing many decision trees (often well over 100), typically using the Classification and Regression Tree (CART) algorithm, on bootstrap samples of the development data. The final model takes the ‘average’ of all the trees: the mean of their predictions for a regression model with a continuous outcome, or the majority vote (mode) for a classification model with a discrete target.
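Using the same illustrative data split, a Random Forest might be sketched as follows; the hyper-parameter values are examples rather than recommendations.

```python
# Minimal sketch of a Random Forest for the same default-prediction task;
# reuses X_train, y_train, X_test, y_test from the logistic regression sketch above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rf = RandomForestClassifier(
    n_estimators=300,      # many CART-style trees
    bootstrap=True,        # each tree is grown on a bootstrap sample of the data
    max_features="sqrt",   # each split considers a random subset of features
    n_jobs=-1,             # trees are independent, so they can be built in parallel
    random_state=42,
).fit(X_train, y_train)

# The forest's prediction averages the individual trees' estimated default probabilities.
print("Random Forest AUC:", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
```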

Gradient Boosted Trees are developed differently: each individual tree is built sequentially (using all of the training data), with each subsequent tree aiming to reduce the errors that the preceding trees, used in combination, still make. In effect, each new tree concentrates on – ‘boosts’ the importance of – the data points that the current ensemble predicts poorly.
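A comparable sketch with gradient boosted trees, here using the open-source xgboost package; again the parameter values are illustrative only.

```python
# Minimal sketch of gradient boosted trees on the same illustrative data;
# reuses X_train, y_train, X_test, y_test from the earlier sketches.
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score

gbt = XGBClassifier(
    n_estimators=500,     # trees are added one after another...
    learning_rate=0.05,   # ...each correcting a fraction of the errors left by the previous ones
    max_depth=3,          # shallow trees, relying on the sequence rather than tree depth
    eval_metric="auc",
).fit(X_train, y_train)

print("Gradient boosted trees AUC:",
      roc_auc_score(y_test, gbt.predict_proba(X_test)[:, 1]))
```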

Finally, Neural Networks are not tree-based and first appeared in the 1940s, some fifty years before tree ensembles. They are built from very simple processing nodes connected into a network, loosely analogous to the way biological neurons work together in the human brain.
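A small feed-forward network on the same illustrative data can be sketched with scikit-learn’s MLPClassifier; unlike the tree ensembles above, it benefits from the features being scaled first.

```python
# Minimal sketch of a feed-forward neural network for the same task;
# reuses X_train, y_train, X_test, y_test from the earlier sketches.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

nn = make_pipeline(
    StandardScaler(),                            # neural networks are sensitive to feature scale
    MLPClassifier(hidden_layer_sizes=(32, 16),   # two small layers of simple processing nodes
                  max_iter=500, random_state=42),
).fit(X_train, y_train)

print("Neural network AUC:", roc_auc_score(y_test, nn.predict_proba(X_test)[:, 1]))
```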

In simple terms, when comparing ML models with more traditional logistic regression additive models, the trade-off tends to be less explainability for more model accuracy, at least in the case of well-developed models. 

Making machine learning in credit risk a reality 

There are five key factors that need to be considered when looking to adopt ML models. Some are more critical in credit risk due to regulatory requirements. 

1. Operationalisation 

Starting with the end in mind, it’s essential to consider where, how and when the model will be deployed, and the constraints and requirements for the model in operation. For example, is the model used live or offline in batch? How and where will the score be calculated, bearing in mind the complex and large volume of code involved (in comparison to a traditional scorecard)? Will you need to calculate reason codes, and how quickly do these need to be computed?
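As one hedged illustration of the live-versus-batch question, a common pattern is to persist the trained model once at deployment time and then score new applications in a scheduled offline job; the model object and file names below are assumptions carried over from the earlier sketches.

```python
# Illustrative sketch of offline batch scoring; `gbt` is the gradient boosted model
# from the earlier sketches, and the file names are hypothetical.
import joblib
import pandas as pd

joblib.dump(gbt, "credit_risk_model.joblib")            # produced once, at deployment time

# Inside the scheduled batch job:
model = joblib.load("credit_risk_model.joblib")
new_apps = pd.read_csv("todays_applications.csv")       # same feature layout as the training data
new_apps["score"] = model.predict_proba(new_apps)[:, 1]
new_apps.to_csv("scored_applications.csv", index=False)
```

A live, real-time deployment raises further questions: latency, the size of the deployed model artefact, and how quickly reason codes can be computed alongside the score.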

2. Performance 

Ask yourself how important robustness is. How long do you expect the model to continue to perform?  Will updates be required, and if so, how?  How will tracking and ongoing management be performed? 

3. Engineering 

Do you need to prove ‘fairness’ in the outcomes of the model? Can you control for overfitting? Can you cater for bias/errors in your data?  How will you ensure palatability and robustness?   

4. Ease of Development 

Do you have or can you get access to the skills and tools you need? Do you know what they are and can you assess whether you are recruiting or acquiring well? What do you need to ensure in terms of productivity? Can you achieve everything you need within model governance standards? 

5. Explainability 

Explainability is often the biggest issue when it comes to satisfying certain regulations, such as GDPR in the EU. In other words, can you explain why an individual got a certain score and therefore a certain decision or treatment? 

Consequently, explainability of ML models is a top priority for banks and regulators, and a great deal of research and progress has been made within the last few years. There are numerous model-agnostic reports and measures that help expose, understand and explain complex models.

For example, ‘feature importance’ describes the effect of each feature on the model. A partial dependency plot can show whether the relationship between the target outcome and a feature is linear, monotonic or more complex. Others include the H-statistic, which can be used to understand the interaction between two features, or between a particular feature and all the other features in the model; Local Interpretable Model-agnostic Explanations (LIME), which uses local surrogate models to explain individual predictions; and Shapley Additive Explanations (SHAP), which calculates the contribution of each feature to a prediction.

The primary issue is that they are all imperfect when it comes to explaining complex models. At Paragon, we prefer to use SHAP, as it allows us to learn more about individual features by studying them in isolation.
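To make that concrete, a minimal SHAP sketch on the illustrative gradient boosted model from the earlier examples might look like the following; the plot calls assume a plotting environment such as a notebook.

```python
# Minimal sketch of explaining the gradient boosted model `gbt` with SHAP;
# reuses X_test from the earlier sketches. For XGBoost, contributions are in log-odds space.
import shap

explainer = shap.TreeExplainer(gbt)
shap_values = explainer.shap_values(X_test)   # one contribution per feature, per applicant

# Global view: which features drive the model overall (a SHAP-based feature importance).
shap.summary_plot(shap_values, X_test)

# Local view: each feature's contribution to a single applicant's score,
# which is the raw material for reason codes.
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0], matplotlib=True)
```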

However, there is more than just explainability and predictive power to consider in the development and use of ML models.

For example, modelling is always a balancing act between over-fitting and over-simplifying. When developing ML models, selecting and setting the right hyper-parameters can control over-fitting. This is often an iterative process, testing for over-fitting on hold-out data samples that have not been used to train the models.
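A minimal sketch of that iterative process, again on the illustrative data from the earlier examples: a cross-validated search over a small hyper-parameter grid, followed by a final check on the hold-out sample that played no part in the tuning.

```python
# Illustrative hyper-parameter search to control over-fitting; reuses the
# train / hold-out split from the earlier sketches, and the grid values are examples only.
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

param_grid = {
    "max_depth": [2, 3, 4],         # deeper trees fit more, but also over-fit more
    "learning_rate": [0.03, 0.1],
    "n_estimators": [200, 500],
}

search = GridSearchCV(
    XGBClassifier(eval_metric="auc"),
    param_grid, scoring="roc_auc", cv=5, n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Hold-out AUC:",
      roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1]))
```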

And whether ML or logistic regression, the credit risk modelling requirements remain the same: 

  • Process – to drive efficiency, repeatability and control risks 

  • People and Tools – to go hand in hand with the defined processes 

  • Governance and Regulation – to drive best practice, help inform processes and demonstrate adherence  

It is only by assessing all five areas in terms of the given situation, and what is important and required in each, that you can determine whether a particular modelling approach is right and appropriate for that situation.

The answers to these questions may differ depending on what is being predicted, and when and where the model will be used. For example, marketing models, PD models and fraud models will all have different requirements and considerations.

Which is why we have recently introduced machine learning Random Forest and XGBoost approaches into our Modeller software. Even though many trees have to be developed for each tree ensemble model, the time taken to develop them can be reduced by building several trees in parallel, and the use of feature importance and SHAP values provides explainability. You can still use traditional methods where they suit the application, and in turn benchmark one approach against another.

We believe that providing choice and flexibility in modelling techniques is essential, allowing the modeller to use their skill and these five guiding principles to develop the best approach to any given requirement.