In light of the recent letter from the ONS to the Health Secretary, Richard Kelsey gives his thoughts on the reliability of coronavirus datasets and how it affects actuaries’ work.

In measuring the spread and effect of the virus, there are three key statistics. In order of the infection timeline, these are discussed below:

1. Time series of number of positive tests

The changes in testing criteria have been well documented: the scope of testing was reduced in the early days of the outbreak because of capacity constraints, and testing was then expanded as the number of deaths decreased, by which time the infection rate had clearly already fallen a number of days earlier. As a result, this data is not a consistent measure. However, with a more granular analysis and understanding of the data, it may be useful.

The number of false positives and false negatives could be an issue. As long as the extent of testing is consistent (such as on suspected cases) then this may be a constant proportion and not a misleading feature. However, if testing is broadened into the wider population then these types of error can be expected to be more material.
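To make the point about broadened testing concrete, the sketch below applies Bayes' rule to show how the share of positive results that are genuine (the positive predictive value) falls as prevalence in the tested group falls. The sensitivity, specificity and prevalence figures are hypothetical values chosen only for illustration; they are not taken from any COVID-19 test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.80
SPECIFICITY = 0.99

# Hypothetical prevalence in each tested group, for illustration only.
GROUPS = [("suspected cases", 0.30), ("wider population", 0.01)]

for label, prevalence in GROUPS:
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: prevalence {prevalence:.0%}, PPV {ppv:.1%}")
```

Under these assumed figures, most positives among suspected cases are genuine, whereas fewer than half are genuine when the same test is applied to a low-prevalence population.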

The general issues with data on the number of tests are set out in a letter from the UK Statistics Authority to the Health Secretary [1].

2. Time series of hospital admissions for COVID-19

Public Health England publishes a Weekly Coronavirus Disease 2019 (COVID-19) Surveillance Report: Summary of COVID-19 surveillance systems [2]. The supporting text for “Figure 21: Overall daily hospital and ICU/HDU admission rates per 100,000 of new COVID-19 positive cases reported through CHESS, England” in that document states the following:

“The CHESS surveillance system monitors daily new acute respiratory infections (ARI) and new laboratory confirmed COVID-19 admissions to hospital including critical care (ICU/HDU). Trends in hospital and critical care admission rates need to be interpreted in the context of testing recommendations.

A total of 134 NHS Trusts are now participating, although the number of Trusts reporting varies by day. The daily rate of new admissions of COVID-19 cases is based on the trust catchment population of those NHS Trusts who made a new return each day. This may differ from other published figures such as the total number of people currently in hospital with COVID-19.”
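The effect of a varying set of reporting Trusts can be illustrated with a minimal sketch. It assumes the rate is simply total new admissions divided by the combined catchment population of the Trusts that reported that day; the Trust names, catchment populations and admission counts are all hypothetical.

```python
def admission_rate_per_100k(returns, catchment):
    """Daily admission rate per 100,000, using only Trusts that made a return.

    returns:   dict of trust -> new admissions reported that day
    catchment: dict of trust -> catchment population
    """
    population = sum(catchment[trust] for trust in returns)
    if population == 0:
        return None  # no Trust reported, so no rate can be calculated
    return 100_000 * sum(returns.values()) / population


# Hypothetical catchment populations and daily returns.
catchment = {"Trust A": 500_000, "Trust B": 250_000, "Trust C": 250_000}
day_1 = {"Trust A": 20, "Trust B": 5, "Trust C": 5}
day_2 = {"Trust A": 20, "Trust B": 5}  # Trust C made no return

rate_1 = admission_rate_per_100k(day_1, catchment)
rate_2 = admission_rate_per_100k(day_2, catchment)
print(f"Day 1: {sum(day_1.values())} admissions, rate {rate_1:.2f} per 100,000")
print(f"Day 2: {sum(day_2.values())} admissions, rate {rate_2:.2f} per 100,000")
```

In this made-up example the raw count falls between the two days while the rate rises, purely because a Trust with a low admission rate dropped out of the denominator. This is one way a rate series can move differently from other published totals.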

We see increasing admissions after the peak in deaths on 8 April. This could be due to a variety of factors:

  • this is not a consistent dataset;
  • those admitted were “less ill”;
  • treatments have got better as more is understood about the type of care which works;
  • the virus is less deadly;
  • hospitals have more beds available.

One of the problems with open data is that the full detail needed for data checking and data validation is not available.

3. Time series of daily deaths

Measuring death counts by date of reporting, as in the daily briefing or the standard ONS reporting, is clearly misleading. Such counts are affected by working patterns, delays in reporting and changes in the speed of reporting. These effects make trends difficult to read.

ONS data (to which you need to add NRS for Scotland and NISRA for Northern Ireland) are reported by date of death after a week’s delay plus a further 4-day collation delay (the Tuesday after the week ending Friday).

NHS England publishes death counts each reporting day, split by date of death. Manually collating these releases gives a two-dimensional dataset of counts by date of death and by reporting day since 1 April. Peak deaths occurred on 8 April. Further dimensions such as region, age and NHS hospital trust are also available. The core consistent dataset is “deaths in hospital with a positive COVID-19 test”. Deaths related to COVID-19 without a test have been collated since 28 April.
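A dataset of counts by date of death and by reporting day has the shape of a run-off triangle, familiar from claims reserving. The article does not say how eventual totals for each date of death are estimated from it. The sketch below uses the basic chain-ladder method as one common actuarial choice, purely as an assumption, and the figures in the triangle are invented.

```python
def development_factors(triangle):
    """Volume-weighted lag-to-lag factors from a cumulative triangle.

    triangle: list of rows, oldest date of death first. Each row holds the
    cumulative deaths reported at lag 0, 1, 2, ... days after the date of
    death, and newer rows are shorter because fewer lags have been observed.
    """
    n_lags = max(len(row) for row in triangle)
    factors = []
    for lag in range(n_lags - 1):
        rows = [row for row in triangle if len(row) > lag + 1]
        numerator = sum(row[lag + 1] for row in rows)
        denominator = sum(row[lag] for row in rows)
        factors.append(numerator / denominator if denominator else 1.0)
    return factors


def project_ultimates(triangle):
    """Project each row's latest cumulative count to the final observed lag."""
    factors = development_factors(triangle)
    ultimates = []
    for row in triangle:
        value = row[-1]
        for factor in factors[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


# Invented cumulative counts: rows are dates of death, columns reporting lags.
triangle = [
    [100, 150, 160, 160],
    [80, 120, 128],
    [60, 90],
    [40],
]

factors = development_factors(triangle)
ultimates = project_ultimates(triangle)
print("Development factors:", [round(f, 4) for f in factors])
print("Projected ultimates:", [round(u, 1) for u in ultimates])
```

The most recent dates of death rely most heavily on the factors, so their projections are the least certain. A real analysis would also need to allow for changes in reporting speed, which the article notes have occurred.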

What have we seen? Measurement of Deaths by Date of Death – Epilogue

Clearly, deaths lag infections. Deaths can also be thought of as a function of a delayed moving average of the infection count, but data quality is too poor to infer the relationship. However, there is a story in the data on coronavirus deaths in NHS England hospitals with a positive test. Here we look at projected ultimate counts by date of death to 31 May 2020, calculated as at 2 June 2020. Data is shown on both actual and logarithmic scales. Four phases of COVID-19 are identified:

| Phase | Dates | Comment |
|---|---|---|
| 1. Attack | 11 March to 27 March | 11 March is the first date of death with more than 10 Coronavirus deaths (after positive test). The gradient reduces after 27 March. The log gradient is equal to growth of 24% per day, or doubling every 3 days. |
| 2. Slower growth | 27 March to 5 April | 10% growth in deaths per day. Doubling every 7.5 days. Although lockdown was 23 March, the advice to work from home was given on 17 March and commuter trains were very quiet that week. So were infections slowing before lockdown due to softer measures being effective? |
| 3. Plateau & peak | 5 April to 11 April | The peak stands out as 8 April, but there is a plateauing before and after which can be seen in the chart. |
| 4. Decline | 8 April onwards | Decreasing at 5% per day compound, or halving the daily death count every 15 days. At this rate, for this data source, deaths will be in single figures by 15 July. Why is the decline in death rate much slower than the earlier growth rate? Is it being slowed by cases in ICUs taking a longer time to die? |
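The link between a compound daily rate and a doubling or halving period can be checked with a short calculation. This is a minimal sketch assuming a constant compound daily rate within each phase; the function names are illustrative. Using the rounded percentages gives periods close to, though not exactly equal to, those quoted, which may reflect rounding in the quoted rates.

```python
import math


def doubling_time(daily_growth):
    """Days for a count to double at a constant compound daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)


def halving_time(daily_decline):
    """Days for a count to halve at a constant compound daily rate of decline."""
    return math.log(0.5) / math.log(1 - daily_decline)


# Rounded daily rates as quoted for each phase.
attack_doubling = doubling_time(0.24)
slower_doubling = doubling_time(0.10)
decline_halving = halving_time(0.05)

print(f"24% growth per day: doubles in {attack_doubling:.1f} days")
print(f"10% growth per day: doubles in {slower_doubling:.1f} days")
print(f"5% decline per day: halves in {decline_halving:.1f} days")
```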

 


Download the 'NHS England - Covid-19 Deaths in Hospital after Positive Test' chart
[3] Source: The data used in the above model was taken from NHS England Covid-19 Daily Deaths.