Louise Pryor, incoming President-elect at the Institute and Faculty of Actuaries, discusses the use of models and their limitations.

There’s a lot of talk about models at the moment. Because COVID-19 is so new, we don’t have much else to go on – no actual experience, just some projections of what might happen. So it’s worth thinking about how much reliance we should place on models and what their limitations are.

The first thing to remember is that models are inevitably simplifications of the real world. They’re a bit like maps: a map the same size as the territory it depicts is effectively useless, however small that territory is – a plan of a house that’s the size of the house is no help to anyone. A map earns its usefulness by leaving things out.

Similarly, the value of a model for many problems is that you don’t have to wait for things to happen in the real world. You can run a computational model in less than real time, and see what might happen under various circumstances. For example, you can try lots of different scenarios and find out what possible disasters are lurking in them and how likely they are to happen. Or at least you can if your model is close enough to the real world.
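As a deliberately over-simplified sketch of that idea, the Python snippet below sweeps a toy SIR-style epidemic model over a handful of scenarios in well under a second. Every number in it is invented for illustration – it is not calibrated to COVID-19 or to anything else.

```python
def sir_peak_infected(beta, gamma, population=1_000_000, days=365):
    """Run a basic discrete-time SIR model and return the peak number infected."""
    s, i, r = population - 1.0, 1.0, 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

# Sweep a small grid of scenarios, far faster than waiting for the real world.
for beta in (0.2, 0.3, 0.4):      # transmission rate per day (illustrative)
    for gamma in (0.1, 0.2):      # recovery rate per day (illustrative)
        print(f"beta={beta}, gamma={gamma}: "
              f"peak infected = {sir_peak_infected(beta, gamma):,.0f}")
```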

But there’s the crux: how can we tell if a given model is close enough to the real world? Here are five questions that you can ask to help you decide whether the model’s outputs are likely to be useful to you.

1. What purpose was the model developed for?

Models are simplifications of the real world, but there are many different ways of simplifying things. Different simplifications are suitable for different purposes. So if you are looking at the outputs of a model, you need to think about whether the purpose it was developed for is aligned with your own purpose. For example, disease transmission models are currently featuring heavily in the news. But there are many such models, sharing some features but varying widely in their details. One model might be specifically tuned for use in projecting total population infection rates, so it might look at a country as a whole, while another might be intended to analyse the burden on health services and therefore focus on regional differences in health provision. Although these two matters are clearly linked, they are not the same, as the burden on health services will depend crucially on the severity and extent of the disease.

If you can’t tell what purpose the model was developed for, you can’t tell whether its results are applicable to your situation.

2. What assumptions does the model depend on?

Models use assumptions about the world in order to make simplifications. It’s often difficult to find out what all the assumptions are, even though the model outputs are often very sensitive to them. Sometimes there are many implicit assumptions, which the people building the models don’t even realise are there. For example, if the assumptions are based on data analysis, is the data representative of the situation to which the model is being applied? If it’s being used for projections for the whole population, assumptions based on data for healthy adult males may not be appropriate.
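Sensitivity to assumptions is easy to underestimate. As a toy illustration (with invented numbers), compound a claims projection over 20 years under three slightly different assumed growth rates: a one-point change in the assumption moves the answer by around 20%.

```python
# Project claims 20 years ahead under slightly different assumed growth rates.
# All numbers are invented for illustration.
current_claims = 1_000_000

for assumed_growth in (0.03, 0.04, 0.05):
    projected = current_claims * (1 + assumed_growth) ** 20
    print(f"assumed growth {assumed_growth:.0%}: "
          f"claims in 20 years = {projected:,.0f}")
```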

It’s also important not to confuse assumptions with outputs. An assumption is something the model’s builders are telling it about the world, not something the model is telling us.

3. Do the model results represent a best estimate or something else?

One thing we know about models is that their results are very unlikely to match precisely what actually happens. In that sense, all models are wrong. But their results can still be useful, if you understand what they mean. Some models produce a best estimate outcome, or an average outcome, while others might have some prudence built in. In many situations, a best estimate isn’t very useful on its own, without some understanding of the range of possible outcomes. For example, the best estimate result is 50 in both the illustrations below, but in the second case it’s very unlikely that the actual outcome is 50 – it is much more likely to be either 25 or 75.

[Illustration one: a distribution of outcomes clustered around the best estimate of 50.]

[Illustration two: a distribution of outcomes clustered around 25 and 75, with a mean of 50 but very few outcomes near it.]
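A minimal numerical sketch of the point (all figures invented): both samples below have a mean of about 50, but in the second one the outcome is almost never anywhere near 50.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Illustration one: outcomes clustered around the best estimate of 50.
unimodal = rng.normal(loc=50, scale=5, size=n)

# Illustration two: a 50/50 mix of outcomes near 25 and outcomes near 75.
bimodal = np.where(rng.random(n) < 0.5,
                   rng.normal(25, 3, n),
                   rng.normal(75, 3, n))

for name, sample in (("unimodal", unimodal), ("bimodal", bimodal)):
    near_50 = np.mean(np.abs(sample - 50) < 5)
    print(f"{name}: mean = {sample.mean():.1f}, "
          f"P(within 5 of 50) = {near_50:.2f}")
```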

4. Are the assumptions or outputs at the extremes of possibilities?

Models are often built to operate well in particular circumstances. If they are used in other circumstances, their results are much less reliable. For example, an economic model may use historical data about yearly inflation rates, which have been between 0% and 10% in the period the data covers. If it is then used to model an economy in which yearly inflation is running at 20%, some of the relationships or assumptions it relies on may no longer hold. Typically, models operate best towards the centre of their ranges – anything on the edge is much less reliable.
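The snippet below sketches that failure mode with an invented ‘true’ relationship rather than real economic data: a straight line fitted to observations with inflation between 0% and 10% tracks the data well in that range, but drifts well away from the underlying relationship at 20%.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 'true' relationship with a mild nonlinearity, plus noise.
def true_response(x):
    return 0.05 + 0.3 * x + 2.0 * x**2

inflation = rng.uniform(0.0, 0.10, size=200)   # calibration range: 0% to 10%
observed = true_response(inflation) + rng.normal(0, 0.005, size=200)

slope, intercept = np.polyfit(inflation, observed, deg=1)  # simple linear fit

for x in (0.05, 0.10, 0.20):
    fitted = intercept + slope * x
    print(f"inflation {x:.0%}: fitted {fitted:.3f} "
          f"vs 'true' {true_response(x):.3f}")
```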

5. How has the model been validated?

However sophisticated a model is, it is still a simplification, and relies on many assumptions that may or may not be valid. If it has been tested using historic data, and produces results close to what actually happened, it’s easier to have confidence that it might be useful in the future. The same applies if it produces results close to those of other, trusted, models in areas in which they overlap. But if there has been no validation, and the model relies only on unproved theories, there is more scope for its results to be very different from what happens in the real world.
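One common form of validation is a backtest: replay the past, projecting each period only from the data that would have been available at the time, then compare the projections with what actually happened. The sketch below is a toy version of that idea; both ‘models’ and the ‘history’ are invented for illustration.

```python
import numpy as np

def backtest(model, history):
    """At each step, project one period ahead from the data available so far,
    then record the error against what actually happened."""
    errors = []
    for t in range(1, len(history)):
        errors.append(model(history[:t]) - history[t])
    return np.array(errors)

naive_model = lambda past: past[-1]  # projects 'no change'
trend_model = lambda past: past[-1] + (past[-1] - past[0]) / max(len(past) - 1, 1)

history = np.array([100.0, 104.0, 109.0, 115.0, 121.0, 128.0])
for name, model in (("naive", naive_model), ("trend", trend_model)):
    errors = backtest(model, history)
    print(f"{name}: mean abs error = {np.abs(errors).mean():.2f}")
```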

Finally, it’s always worth comparing the results of a number of models. There is some comfort to be derived if they produce results that are consistent with each other, even though they take different approaches to the problem at hand. In relation to COVID-19, there do seem to be plenty of models out there, even if there is little historic data to use. Each model may get some of the details wrong, but the overall picture may become clearer. As in so many areas of life, diversity in modelling is generally a good thing.
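In practice, that comparison can be as simple as lining the projections up and looking at their spread, as in this final toy sketch (the model names and numbers are invented):

```python
import statistics

# Projections of the same quantity from several independent models
# (purely illustrative numbers).
projections = {"model A": 480, "model B": 510, "model C": 495, "model D": 620}

values = list(projections.values())
print(f"range {min(values)}-{max(values)}, "
      f"mean {statistics.mean(values):.0f}, "
      f"std dev {statistics.pstdev(values):.0f}")
# A wide spread, or one model far from the rest, is a prompt to ask which
# assumptions differ - not necessarily a reason to discard the outlier.
```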