In the insurance industry, understanding and trusting the decision-making process of predictive models is essential. Historically, insurers used simple models that were easily explainable to regulators, but in recent years, as many models have evolved to become more complex and performant, they’ve lost explainability in the process. In a world before AI regulations, this tradeoff may have made business sense and provided a competitive advantage for those who were willing to take the risk. Today, however, as AI regulations take shape, explainability has come back into focus as being critical for balancing regulatory compliance and fairness with model performance.
Broadly speaking, models can be divided into two categories of explainability: transparent models, whose decision-making can be understood directly from their structure, and black box models, whose inner workings are opaque.
Explaining the Types of Models Used in Insurance
Hundreds of different machine learning models and techniques are used in the insurance industry to address different business needs. Linear regression, decision trees, Generalised Linear Models (GLMs), and Gradient Boosting Machines (GBMs) are the most commonly used today.
What is Linear Regression?
Linear regression is a statistical method used to predict a specific outcome based on the values of other variables, assuming the relationship between those variables and the outcome is linear. It works by finding the straight-line trend that best links the input factors to the predicted outcome. For example, it might be used to predict the cost of a policy based on factors like age and car model. Linear regression models are typically transparent and explainable.
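To make this concrete, here is a minimal sketch using scikit-learn. The data and feature names are entirely invented for illustration; the point is that the fitted coefficients themselves are the explanation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative training data (invented): driver age and a vehicle group code.
X = np.array([[25, 3], [40, 2], [55, 1], [30, 4], [45, 2]])  # [age, vehicle_group]
y = np.array([820.0, 540.0, 410.0, 760.0, 520.0])            # annual premium (GBP)

model = LinearRegression().fit(X, y)

# The learnt coefficients are the explanation: each one-unit change in an
# input shifts the predicted premium by a fixed, inspectable amount.
print("intercept:", model.intercept_)
print("coefficients [age, vehicle_group]:", model.coef_)
print("predicted premium, 35-year-old in group 3:", model.predict([[35, 3]])[0])
```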
What are Decision Trees?
A decision tree model uses a tree-like graph of decisions and their possible consequences. It resembles a flowchart, where each node represents a decision based on a certain attribute, and the branches represent the outcomes of that decision. In insurance, decision trees can be used to assess the risk of insuring a person or property by mapping out different decision paths based on their historical data, such as claims history or driving behaviour. This model is particularly useful for visualising decision-making processes and handling categorical data. Decision tree models are typically transparent and explainable.
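The sketch below, again with hypothetical features and scikit-learn, shows how a fitted tree can be printed as exactly the kind of flowchart described above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical risk features (invented): prior claims and years licensed.
X = [[0, 10], [3, 2], [1, 8], [4, 1], [0, 15], [2, 3]]
y = [0, 1, 0, 1, 0, 1]  # 0 = standard risk, 1 = elevated risk

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the fitted tree as a readable flowchart of decisions,
# which is what makes this model class transparent.
print(export_text(tree, feature_names=["prior_claims", "years_licensed"]))
```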
What are Generalised Linear Models (GLMs)?
GLMs are a statistical method widely used in insurance to analyse risk and determine pricing. They work by modelling the relationship between various risk factors (like age or home location) and outcomes (such as claim frequency), even when those relationships are not directly proportional. GLMs are flexible and allow different data types and distributions, making them ideal for various insurance applications. They are also typically transparent and explainable.
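As a sketch of the idea, the example below fits a Poisson GLM for claim frequency using statsmodels. The data and features are illustrative assumptions, not a production rating model; the takeaway is that each coefficient has a direct, multiplicative interpretation on the expected claim rate.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical exposure data (invented): driver age, an urban/rural flag,
# and the observed claim count per policy-year.
age    = np.array([22, 35, 48, 60, 27, 41])
urban  = np.array([1, 0, 1, 0, 1, 1])
claims = np.array([2, 0, 1, 0, 3, 1])

X = sm.add_constant(np.column_stack([age, urban]))

# A Poisson GLM with a log link is the classic choice for claim frequency:
# each coefficient acts multiplicatively on the expected claim rate.
glm = sm.GLM(claims, X, family=sm.families.Poisson()).fit()
print(glm.summary())
print("rate multiplier for urban policies:", np.exp(glm.params[2]))
```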
What are Gradient Boosting Machines (GBMs)?
GBMs are an advanced machine learning technique used widely in the insurance industry. They work by sequentially building decision trees, where each tree corrects the errors of the previous ones, effectively learning from past mistakes. This allows GBMs to handle complex relationships between variables, making them ideal for nuanced tasks like insurance pricing. Their capacity to improve with each step and to manage diverse data types makes them a powerful tool in the insurance industry's toolkit. Though data scientists often consider GBMs to be technically black box models due to their lack of explainability, in recent years, innovative techniques for unlocking sufficient levels of explainability within them have made them valuable assets in insurance.
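As a sketch of the sequential-correction idea, the scikit-learn example below fits a boosted ensemble on synthetic data; all parameters and data are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data standing in for pricing features; purely illustrative.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Each of the 200 shallow trees is fitted to the residual errors of the
# ensemble built so far -- the "learning from past mistakes" described above.
gbm = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=0
).fit(X, y)

# Built-in importances give a first, coarse view into the black box.
print("feature importances:", gbm.feature_importances_)
```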
What Are the Concerns Around Black Box Models in Insurance?
The term 'black box' refers to models whose behaviour cannot be understood simply by inspecting the learnt parameters that transform inputs into outputs.
Insurers avoid using truly opaque black box models because regulations require the models they use to have a certain level of explainability. Their complex nature can lead to underwriting the wrong policies and result in unanticipated claims. The damage that undetected bias and discrimination can cause is also a major concern for any insurer. However, AI techniques for unlocking black box explainability can mitigate some of these concerns and thus facilitate their broader application in insurance.
To avoid unnecessary complexity, best practices emphasise simplicity and encourage natively explainable models wherever possible. Nevertheless, there are instances where even the simplest models may still operate as 'black boxes'. This often occurs when the predictive task is too complex for intrinsically explainable models, like Generalised Linear Models (GLMs). In such cases, budgetary limitations and resource constraints often necessitate building post hoc explanations for these “transparent” models.
Why are Black Box Models Not Explainable?
Black box models are characterised by their power, flexibility, and complexity. Consequently, they often sacrifice explainability due to the intricate architectures that capture nuanced patterns in data, thereby obscuring the rationale behind their predictions. This opacity raises concerns about their trustworthiness and suitability, particularly in regulated industries like insurance.
Proprietary software solutions, including black box models, emerged to fill a void when skilled insurance data scientists were scarce and automation of the model-building phase was necessary. These models are often overly complex because they are designed to be 'off-the-shelf' solutions, adaptable to various use cases and problem types.
This contrasts with models specifically built by knowledgeable data scientists for particular problems, which tend to be significantly simpler. The complexity of black box models often arises from vast numbers of parameters and non-linear interactions that are hard to trace and interpret.
Moreover, the proprietary nature of some algorithms limits transparency, as the inner workings are not always open for independent review. It's crucial to distinguish that 'out-of-the-box' and 'proprietary software' models are not necessarily black boxes; a true black box model is one where the internal mechanics—especially how model parameters translate into output predictions—are not easily interpretable, such as in the case of neural networks with millions of parameters.
Lastly, the stochastic nature of many machine learning methods can lead to varying solutions for similar problems, making it difficult to establish a clear and consistent explanation for their behaviour.
The Importance of Model Transparency
Transparency in machine learning models is more than a technical requirement for insurers; it has profound implications for business costs, customer experience, regulatory compliance, and internal team dynamics. More transparent models are often more composable and modular and, therefore, easier to update, enabling ongoing iterative model optimisation to counteract factors like data drift that negatively impact model performance.
Equally, insurers utilising or considering black-box models must recognise the necessity of explainability to prepare for regulatory scrutiny and mitigate associated risks. Enhanced transparency aids in the detection and correction of model biases, which is crucial for maintaining fairness in premium determination and claim settlements. It also enables a clearer understanding of model limitations and areas for improvement. This means that models can be further improved in targeted ways, which is more efficient than fine-tuning or extending complex black box systems.
Furthermore, transparent models can improve stakeholder engagement by demystifying the decision-making process and encouraging a culture of trust and accountability within the organisation and wider industry.
How Do Insurers Improve Black Box Explainability?
Several techniques have been developed to enhance the interpretability of black-box models within the insurance industry. These methods can be broadly categorised into global and local model-agnostic approaches, each illuminating a different aspect of model behaviour:
Global Model-Agnostic Methods
These techniques provide a high-level overview of the model's operation across all instances. They include permutation feature importance, which measures how much predictive performance degrades when a feature's values are shuffled; partial dependence plots (PDPs), which show a feature's average effect on predictions; and global surrogate models, which approximate the black box with a simpler, interpretable model. A sketch of permutation importance follows below.
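As an example of a global technique, the sketch below computes permutation feature importance with scikit-learn; the dataset and model are synthetic stand-ins.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in for a pricing model and its data.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shuffle one feature at a time and measure how much the model's score
# degrades: a model-agnostic estimate of each feature's global importance.
result = permutation_importance(gbm, X, y, n_repeats=10, random_state=0)
for i, mean in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance {mean:.3f}")
```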
Local Model-Agnostic Methods
These techniques clarify the reasoning behind specific outcomes for individual predictions. Widely used examples include LIME (Local Interpretable Model-agnostic Explanations), which fits a simple surrogate model around a single prediction, and SHAP (SHapley Additive exPlanations), which attributes a prediction to each feature using game-theoretic Shapley values. A SHAP sketch follows below.
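As an example of a local technique, the sketch below uses the third-party shap package to decompose one prediction into per-feature contributions; the data and model are synthetic stand-ins.

```python
import shap  # third-party package: pip install shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# SHAP decomposes a single prediction into additive per-feature contributions,
# answering "why did THIS policy get THIS prediction?"
explainer = shap.TreeExplainer(gbm)
contributions = explainer.shap_values(X[:1])[0]

print("baseline (expected value):", explainer.expected_value)
for i, c in enumerate(contributions):
    print(f"feature {i}: {c:+.2f}")
```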
For model predictions in premium pricing, these techniques can be combined with transformer-based language models to generate natural language explanations of why a customer's premium has increased.
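As a simplified illustration of that last step, the hypothetical helper below turns per-feature contributions (such as SHAP values expressed in currency units) into a customer-facing sentence. All names and figures are invented; a production system might instead pass the contributions to a language model to draft the wording.

```python
# Hypothetical helper: converts per-feature premium contributions (in GBP)
# into a plain-English explanation of a premium change.
def explain_premium_change(contributions: dict[str, float], top_n: int = 2) -> str:
    # Rank features by the size of their effect, keep the biggest drivers.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    parts = [
        f"{name.replace('_', ' ')} {'added' if value > 0 else 'reduced'} £{abs(value):.0f}"
        for name, value in top
    ]
    return "Your premium changed mainly because " + " and ".join(parts) + "."

print(explain_premium_change({"recent_claim": 85.0, "vehicle_group": -20.0, "age": 5.0}))
# -> Your premium changed mainly because recent claim added £85 and vehicle group reduced £20.
```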
What Benefits Does Explainability for Black Box Models Bring to the Insurance Industry?
Explainability in black-box models offers distinct advantages to various stakeholders within the insurance industry, with pricing being a relevant example. By delineating the benefits for each persona, we can appreciate the comprehensive value of transparent models:
C-suite Executives
For the leadership team, explainability is a strategic asset. It ensures that the models align with business objectives and regulatory requirements, safeguarding the company's reputation and reducing the risk of non-compliance penalties. Black box models made transparent and explainable also give executives the confidence to make data-driven decisions and to communicate the rationale behind those decisions to company stakeholders and the public.
Pricing Managers
Those responsible for setting insurance premiums benefit from explainability through a deeper understanding of how models influence pricing strategies. This understanding allows for more accurate and fair pricing, which can be clearly justified to regulators and customers. It also enables pricing managers to refine and adjust models in response to market changes or new data, maintaining a competitive edge.
Data Scientists
For the architects of machine learning models, explainability facilitates model validation and iterative improvement. It allows data scientists to diagnose and rectify model biases or errors, ensuring the models perform as intended. Moreover, transparent models enable data scientists to communicate their work effectively to non-technical stakeholders, fostering collaboration and trust.
Customer Service Agents
Frontline staff interacting with policyholders can leverage explainability to provide clear, understandable explanations for decisions affecting customers, such as premium calculations. This transparency can enhance customer satisfaction and loyalty, as policyholders value insurers who offer clarity and straightforwardness.
End Customers
Policyholders are perhaps the most important beneficiaries of model explainability. When customers understand how their data is used and how it impacts their premiums, they are more likely to perceive the insurer as trustworthy and fair. This understanding can increase customer engagement and retention and contribute to a positive brand image.
Striking the Right Balance between Explainability and Complexity
In recent years, the dynamics between black-box models and transparent models have sparked much debate within the insurance industry. While black-box models have been essential for learning complex relationships between inputs and outputs in data, they often lack straightforward explainability, which has become a critical need. As the industry faces increasing regulatory pressures, in addition to the underwriting risk and business-related costs, the demand for understanding how these models make their predictions grows, underlining the necessity of enhancing black-box model explainability.
Transparent models are praised for their inherent simplicity and fewer demands on computational power and data resources. However, the ability of black-box models to handle complex and nuanced datasets should not be underestimated. As such, the real challenge for insurers is not merely choosing between complexity and transparency but balancing innovation with risk and ensuring that even the most complex models maintain a level of explainability that meets regulatory standards and builds consumer trust.
As the insurance industry continues integrating advanced AI, the focus on making black-box and transparent models explainable becomes paramount. This responds to regulatory demands and represents a significant step toward securing a competitive edge in the market.
Enjoyed this piece? Check out our blog on Why Explainability is Important in Insurance Pricing.