In this article Matthew Edwards looks at risk factors in the context of mortality: what they are, how they help, and how we can misunderstand them. The use of combinations of mortality risk factors (rather than the application of single risk factors in isolation) has become increasingly common in life and non-life insurance, and many mortality studies in respect of pension schemes also make use of these techniques.
What are risk factors?
Actuaries generally use the term ‘risk factor’ to denote something measurable with a material impact on the risk in question (for instance, age in the case of mortality). Risk factors often involve a degree of simplification to be practical and measurable. For example, age is a useful ‘proxy’ for general physiological degeneration, which in practice has many complex genetic and lifestyle drivers that are hard to measure.
What risk factor effects do we typically observe in life, pensions and non-life insurance?
For mortality (of great relevance to life insurance and pensions), the two most commonly used risk factors are age and sex. Medical history, socio-economic status and lifestyle (smoking in particular) are highly predictive of mortality, and are generally used where possible. Other factors, such as genetics and ethnicity, are often predictive but rarely used, for the reasons of fairness noted below.
In non-life insurance, the risk factors of relevance depend of course on the type of risk covered. Thus, for car insurance, the age of the car and engine capacity (or related vehicular characteristics) are likely to be strongly predictive of claims; for household insurance, the location and the age of the house would be predictive.
How do risk factors help with the ideas of customer fairness?
There is a great debate regarding fairness as to whether we mean ‘fair for individuals’ or ‘fair for a group’. Defining an appropriate group also poses issues: should it reflect the whole population of a country, or perhaps a subset with common risk drivers, such as a pension scheme?
If we wish to be fair in the sense of fair to individuals, then using the relevant risk factors helps to maximise fairness – thus, high-risk individuals will pay more for their insurance than low-risk individuals.
If we wish to be fair across a group then some risk factors need to be avoided. It may be the case that we wish to be fair in this ‘group sense’ in respect of some characteristics such as ethnicity, so as not to discriminate racially, but to retain fairness in the ‘individual sense’ in other regards, such as age, or the value of your car (so that sports car drivers aren’t subsidised by the drivers of old family cars).
One concept that has a lot of influence on whether a risk factor’s use is fair or not, in addition to legal constraints and general social mores, is whether an individual is ‘in control’ of the aspect in question. Thus, the use of genetic testing may be regarded as unfair partly because you can’t change your DNA, while smoking seems a fair risk factor because individuals choose whether or not to smoke.
Where can we go wrong with risk factors?
While all the above may seem fairly simple and obvious, ‘the devil is in the detail’: the principles are clear but quantifying the risk factors accurately is not easy.
The main problem encountered is ‘data confounding’. Simple analysis may make a particular risk factor look very influential. However, the reality may be that the effect arises because of the influence of another factor strongly correlated with that first factor. This is best demonstrated using a simple example.
If we look at car insurance claims by age of vehicle, we might see a high cost associated with older vehicles. But is the age of the vehicle really what drives the higher claim costs?
What is happening is that younger drivers tend to drive older cars, which are cheaper to buy, so there is a material correlation between the age of the driver and the age of the car. Young drivers generally have much worse claims experience than older drivers, so if we look at car age by itself (ignoring the age of the driver), we wrongly attribute the high claim costs to the car’s age. If we look at both factors at the same time using proper methods, we will see that increasing car age is associated with decreasing claim cost (leaving vintage cars out of the equation!).
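The effect described above can be reproduced with a very small table. The sketch below uses entirely hypothetical exposures and claim costs (invented for illustration, not taken from any real portfolio), and the names `cells` and `one_way_mean_cost` are likewise assumptions. It compares a one-way view by car age with the view within each driver group.

```python
# Hypothetical portfolio, invented for illustration only.
# (driver group, car group) -> (policy-years of exposure, mean claim cost per policy-year)
cells = {
    ("young", "new"): (100, 900.0),
    ("young", "old"): (400, 800.0),
    ("older", "new"): (400, 400.0),
    ("older", "old"): (100, 300.0),
}


def one_way_mean_cost(cells, factor_index):
    """Exposure-weighted mean claim cost by one factor, ignoring the other."""
    totals = {}
    for key, (exposure, mean_cost) in cells.items():
        level = key[factor_index]
        exp_sum, cost_sum = totals.get(level, (0, 0.0))
        totals[level] = (exp_sum + exposure, cost_sum + exposure * mean_cost)
    return {level: cost / exp for level, (exp, cost) in totals.items()}


# One-way view: old cars look more expensive than new cars.
by_car_age = one_way_mean_cost(cells, factor_index=1)
print("One-way view by car age:", by_car_age)

# Two-way view: within each driver group, old cars are cheaper than new cars.
for driver in ("young", "older"):
    new_cost = cells[(driver, "new")][1]
    old_cost = cells[(driver, "old")][1]
    print(f"{driver} drivers: new car {new_cost}, old car {old_cost}")
```

In this made-up portfolio the one-way averages are 500 for new cars and 700 for old cars, even though old cars are cheaper than new cars within both driver groups. The reversal comes purely from young drivers being concentrated in old cars.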
It is possible to have confounding even in very ‘obvious’ contexts – for instance, age and sex. If we look at the mortality of octogenarians by sex only, disregarding age, female mortality would look similar to, perhaps even greater than, male mortality. But this effect arises because the women in that group are older on average than the men (because women have lighter mortality than men, and hence many more have survived to the highest ages). If we allowed for both age and sex, we would see that female mortality in that age group was much lighter than male mortality at each age.
The problem of confounding results from correlations between risk factors in the data (in the first example above, the correlation between driver age and vehicle age). To avoid this distortion, we can use more complex analytic methods such as multivariate analysis. However, these have two significant disadvantages:
- They require much more data (for instance, a good multivariate analysis typically requires of the order of several thousand of the ‘events’ in question – eg policyholder deaths – and we might not have enough);
- They are more complex, both in carrying out the analysis (where special software and user expertise are necessary) and in understanding and validating the results.
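To make the idea of analysing several factors at once more concrete, here is a minimal sketch that fits a deliberately simple additive model by exposure-weighted least squares to a hypothetical car insurance table. All figures, and the helper names `solve` and `weighted_least_squares`, are assumptions made for illustration. In practice a generalised linear model fitted with specialist software would more commonly be used, on far more data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def weighted_least_squares(rows, weights, y):
    """Fit y ~ rows . beta via the normal equations, weighting each row by its exposure."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, y)) for i in range(k)]
    return solve(xtwx, xtwy)


# Hypothetical data: (young driver?, old car?, policy-years, mean claim cost)
data = [
    (1, 0, 100, 900.0),
    (1, 1, 400, 800.0),
    (0, 0, 400, 400.0),
    (0, 1, 100, 300.0),
]
weights = [exposure for _, _, exposure, _ in data]
costs = [cost for _, _, _, cost in data]

# Car age only: [intercept, old-car effect]
one_factor = weighted_least_squares([[1, old] for _, old, _, _ in data], weights, costs)

# Driver age and car age together: [intercept, young-driver effect, old-car effect]
two_factor = weighted_least_squares([[1, young, old] for young, old, _, _ in data], weights, costs)

print("Car age only:       ", [round(v, 2) for v in one_factor])
print("Driver and car age: ", [round(v, 2) for v in two_factor])
```

With these invented figures, the model using car age alone gives old cars a positive effect of about +200, whereas the model that also includes driver age gives old cars an effect of about -100 and young drivers an effect of about +500. The sign of the car-age effect changes once the correlated factor is allowed for.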
Risk factors and COVID-19?
As at the end of April 2020, views on COVID-19 risk factors have been derived almost entirely from simple analyses that are prone to data confounding.
It is clear that age and sex are strong risk factors (higher risk with increasing age, and higher risk for men than for women). However, many of the other observations that have been made regarding COVID-19 risk factors have been misleading because they relate to factors strongly correlated with age.
For instance, there has been much comment regarding hypertension and chronic medical conditions being strongly associated with COVID-19 mortality. Viewed in isolation these would appear to be significant factors, but we also know that these conditions are strongly associated with age.
The small amount of multivariate analysis published so far on COVID-19 outcomes shows that hypertension is not a predictive factor once we allow for its correlation with age. Similarly, many chronic conditions have much less effect than they appear to have when looked at in isolation (ie without allowing for the age effect).
Another, simpler example of how COVID-19 statistics have been misunderstood for similar reasons relates to the comparison of deaths (or cases) between countries. We obviously need to adjust for population size to ensure some comparability. However, given that COVID-19 incidence (for cases) and mortality (for deaths) are both heavily age dependent, we would expect, all other things being equal, that a country with a higher average age would have more deaths per head of population. Thus, part of the reason for Italy’s high case count and death count has been its high average age compared with other European countries.
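One common way of allowing for different age profiles when comparing populations is direct age standardisation: each country’s age-specific death rates are applied to a single common ‘standard’ population. The sketch below uses two hypothetical countries with identical age-specific death rates but different age structures. All populations, death counts and the standard population are invented for illustration, and the function names are assumptions.

```python
# Hypothetical figures, invented for illustration only.
# country -> age band -> (population, deaths)
countries = {
    "A (younger)": {"under 65": (850_000, 85), "65 and over": (150_000, 750)},
    "B (older)": {"under 65": (770_000, 77), "65 and over": (230_000, 1_150)},
}

# Hypothetical common age structure applied to both countries.
standard_population = {"under 65": 800_000, "65 and over": 200_000}


def crude_rate(bands):
    """Deaths per 100,000 of total population, ignoring age structure."""
    population = sum(p for p, _ in bands.values())
    deaths = sum(d for _, d in bands.values())
    return 100_000 * deaths / population


def standardised_rate(bands, standard):
    """Deaths per 100,000 if the country had the standard age structure."""
    expected_deaths = sum(
        standard[band] * deaths / population
        for band, (population, deaths) in bands.items()
    )
    return 100_000 * expected_deaths / sum(standard.values())


for name, bands in countries.items():
    crude = crude_rate(bands)
    standardised = standardised_rate(bands, standard_population)
    print(f"{name}: crude {crude:.1f}, age-standardised {standardised:.1f} per 100,000")
```

In this made-up example the crude rates are 83.5 and 122.7 per 100,000, yet both countries have the same age-standardised rate of 108.0 per 100,000, because the whole of the crude difference comes from country B’s older population.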
The pandemic is an example of how risk factors can operate in counter-intuitive ways. Simple analyses need to be ‘handled with care’ to guard against data confounding. More complex analyses are now emerging that provide us with important insights into both the morbidity of COVID-19 (who is more likely to need hospitalisation?) and its mortality (who is more likely to die or to survive?).