‘Coronavirus case numbers are meaningless. Unless you know something about testing. And even then, it gets complicated.’ So wrote Nate Silver, statistician and founder of FiveThirtyEight, almost exactly a year ago. But do we know enough about testing?
In October, we explained in the Unstatistic of the Month why the 7-day incidence, which shows the development of new infections, is only of very limited use as a control parameter. Nevertheless, measures such as curfews, person limits at events and alcohol bans still depend on it.
At the same time, the usefulness of this indicator is increasingly being questioned. In mid-March, Ebersberg District Administrator Robert Niedergesäß criticised the political focus on the 7-day incidence rate: ‘Where there is a lot of testing, there are more cases. In this respect, we also need a relation to the number of tests carried out, which is still missing! Otherwise, this would motivate fewer tests to keep the incidence in check.’
A high 7-day incidence indicates that many people have been infected with the virus. Some conclude from this that the healthcare system will be overwhelmed with some delay and not all patients can be treated, which can result in numerous deaths. However, the 7-day incidence alone does not provide a view of the overall situation.
Reported new infections and all parameters derived from them should always be set in relation to other figures.
Data-competent decision-making in a pandemic
In order to develop an intuition for how strongly the estimate of the reproduction number R and the estimate of the 7-day incidence depend on the quality of the tests, the actual prevalence of corona and the number of tests, my colleague Stefan Linner has programmed an interactive app (STAT-UP Corona Test and Infection Dynamics). This will be part of the online course ‘Data-informed Decision Making in a Pandemic’, which we are currently developing together with scientists from the Federation of European National Statistical Societies (FENStatS) and the AI Campus. Decision-makers in politics and business, as well as data journalists and students, should use this course to develop a better understanding of how data and statistics can be useful in coping with a pandemic.
Source: https://corona.statup.solutions/ (screenshot)
Of course, the app makes many simplifications for didactic purposes. For example, it does not take into account the fact that fewer and fewer people can be infected over time. However, with a relatively short modelling period of ten weeks and a low prevalence, this simplification is unproblematic. In addition, the model is based on the assumption that the tests are representative, i.e. in particular that the probability of obtaining a positive result is independent of the number of tests.
In the next version of the app, we will include a parameter for non-representative testing, which reflects reality much better.
However, one thing is more than clear: the estimate of the 7-day incidence is considerably distorted by changes in testing strategies.
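The dependence on test volume and test quality can be illustrated with a minimal sketch. It is not the model used in the app. It only assumes representative testing, as described above, and the function name, the population figure and all parameter values are illustrative assumptions.

```python
def expected_incidence(n_tests, prevalence, sensitivity, specificity, population):
    """Expected reported cases per 100,000 inhabitants from one week of
    representative testing (every tested person is infected with
    probability `prevalence`)."""
    true_positives = n_tests * prevalence * sensitivity
    false_positives = n_tests * (1 - prevalence) * (1 - specificity)
    return (true_positives + false_positives) / population * 100_000


# Illustrative assumptions, not taken from the article:
POPULATION = 83_000_000
PARAMS = dict(prevalence=0.01, sensitivity=0.9, specificity=0.995,
              population=POPULATION)

for n_tests in (500_000, 1_000_000, 2_000_000):
    incidence = expected_incidence(n_tests, **PARAMS)
    print(f"{n_tests:>9,} tests -> reported incidence {incidence:.1f}")
```

Under these assumptions the true prevalence is identical in all three scenarios, yet the reported incidence doubles each time the number of tests doubles.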
Some critics of the 7-day incidence rate are now suggesting that the positive tests should be set in relation to the total number of tests, i.e. looking at the proportion of those who test positive or the change in this proportion. Unfortunately, it's not that simple.
Covid incidence calculated incorrectly? Maths student accuses RKI of ‘mistakes’ - and is himself corrected
A maths student accuses the RKI of calculating the corona incidence incorrectly. But his solution also appears to contain errors.
It can be assumed that as the number of tests increases, the probability of obtaining positive results among those additionally tested decreases. If few tests are available, or if there is no incentive to get tested ‘just like that’ (such an incentive would be, for example, an upcoming Christmas visit to one's parents or the start of the holiday season), it is mainly people who have had contact with infected people and/or show symptoms who get tested. The probability of a positive result is higher in this group than among asymptomatic people.
From data to information
But what alternatives are there? How could more information be obtained from the data? I would like to briefly outline five possibilities - and first of all: none of them is the silver bullet. Rather, they represent individual pieces of the puzzle that, when put together, can provide a better picture of the pandemic.
Option 1: Positive rate and additional information
If we knew what the positive rate of the current marginal test would be, we could draw much more robust conclusions. The marginal test is the one additional test we would carry out if we were (purely hypothetically) to repeat the current test week with exactly one more test.
To do this, the tests would have to be divided into priority classes in the sense that the people tested are at least roughly categorised according to the probability of an actual infection. This prioritisation should be at least qualitative, for example by classifying those tested according to the degree of their symptoms. It is actually surprising that this still does not happen or at least - if it does - that the relevant data is not freely accessible.
Option 2: Representative samples
The clean solution would be what I have been calling for over a year: Representative samples. It is clear that it would not be easy to carry this out, but in view of all the other pressures on our society - economic, psychosocial and many more - this approach should have been pushed much harder.
According to the RKI, real testing capacity has averaged 2.3 million tests per week since the beginning of the year. However, only half as many tests are regularly carried out. 17 million people have already been vaccinated at least once, 3 million have already been infected in the sense of testing positive. In particular, if additional tests were pooled, i.e. samples from ten people tested were analysed together and individual tests were only carried out in the event of a positive result, it would be possible to test a fifth of the population in just one week using only the unused capacity in order to obtain reliable data. Of course, this is not possible every week, but why not at least once a year?
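The pooling arithmetic can be sketched as follows. This is a rough illustration of two-stage pooling (one pooled test, then individual retests for every member of a positive pool), assuming a perfect test and independent infections. The function names and the prevalence values are assumptions, not figures from the RKI.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected number of tests per person under two-stage pooling:
    1/pool_size for the pooled test, plus one individual retest
    whenever the person's pool turns out positive."""
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    return 1 / pool_size + p_pool_positive


def people_testable(capacity, prevalence, pool_size):
    """Number of people who can be screened with `capacity` tests."""
    return capacity / expected_tests_per_person(prevalence, pool_size)


# From the text: 2.3 million weekly capacity, of which about half is unused.
UNUSED_CAPACITY = 2_300_000 / 2

for prevalence in (0.0, 0.001, 0.01):  # assumed values for illustration
    reach = people_testable(UNUSED_CAPACITY, prevalence, pool_size=10)
    print(f"prevalence {prevalence:.1%}: about {reach / 1e6:.1f} million people")
```

With no retests at all, the unused capacity covers 11.5 million people. Under these assumptions the figure falls to roughly 10.5 million at a prevalence of 0.1 per cent and to roughly 5.9 million at 1 per cent, so the reach of such a survey depends noticeably on how widespread the infection is.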
Option 3: Relation of case numbers and delayed hospital admissions
If the criteria for hospitalisation do not change too much, which is not guaranteed if there is a shift in the age of those infected, then the daily new admissions should reflect the actual incidence of infection relatively well. The time lag to the infections is not too great, but should be in the region of one week. The figures shown relate to intensive care patients, i.e. the time lag is likely to be somewhat greater here. In addition, the inflows should be shown, not the number of patients. This is because, just as the courses of infections caused by mutants overlap, discharges from a previous wave can also mask new admissions in the current wave.
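One simple way to estimate such a time lag is to shift the admissions series against the case series and look for the shift with the highest correlation. The sketch below uses purely synthetic data in which admissions are, by construction, 5 per cent of the cases seven days earlier. The function names and all numbers are illustrative assumptions, and real data would be far noisier.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lag(cases, admissions, max_lag):
    """Return the lag in days at which admissions correlate best with
    earlier cases, together with the correlation for every lag."""
    scores = {}
    for lag in range(max_lag + 1):
        earlier_cases = cases[: len(cases) - lag]
        later_admissions = admissions[lag:]
        scores[lag] = pearson(earlier_cases, later_admissions)
    return max(scores, key=scores.get), scores


# Synthetic wave of daily cases (assumption, for illustration only).
cases = [100 * math.exp(-((t - 30) / 10) ** 2) + 20 for t in range(60)]
# Admissions: 5% of the cases seven days earlier (assumption).
admissions = [0.05 * cases[max(t - 7, 0)] for t in range(60)]

lag, scores = best_lag(cases, admissions, max_lag=14)
print(f"estimated lag: {lag} days")
```

The approach only works as long as the relationship between cases and admissions is stable, which is exactly the caveat about changing hospitalisation criteria and age structure.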
Unfortunately, in Germany the new admissions are not shown in the machine-readable tables of the intensive care register, but only in the free-text PDFs of the RKI, where they have to be laboriously extracted, as the position of the information within the document varies over time. But even if access to the data were easier, this solution is not perfect. Better treatment methods can reduce the number of intensive care patients and the number depends heavily on the age of those infected, i.e. outbreaks in schools should not be noticeable in new admissions.
Option 4: Relation of case numbers and delayed deaths
The reported new infections could be compared with the death data in a similar way. Deaths are the gold standard for recording a disease. However, mortality is decreasing and the time to death can change. The reasons for this are the same as those already mentioned: Better treatment options and a change in the age structure of those infected. A simple comparison over longer periods of time is therefore also problematic here.
Option 5: Alternative data sources
Alternative indicators could also be monitored. The Central Bureau of Statistics of the Netherlands (CBS), for example, documents the number of SARS-CoV-2 virus particles in wastewater in its coronavirus dashboard. When people are infected with the coronavirus, there is a certain probability of virus particles in their stool - studies say that this is the case for 40 per cent or more of those infected. These particles are flushed down the toilet and end up in the wastewater. By analysing wastewater samples collected at sewage treatment plants, it is possible to obtain information on how widespread the virus is in a particular region.
A very comprehensive proposal for pandemic monitoring in Australia documents how widespread this wastewater monitoring is already internationally: ‘The wastewater test is about to be ready for use in Australia. It has already been used successfully in Singapore and in some US locations, and there are concrete plans to use the system in every wastewater treatment plant in the Netherlands. There are also plans to implement this monitoring in the UK following a recent sampling study, as well as in South Africa. The New Zealand Chief Medical Officer considers it part of the corona monitoring system.’
A comparison of the Dutch curves for the 7-day incidence (reported cases per 100,000 inhabitants) and for the virus particles, also standardised to 100,000 inhabitants, is revealing. The peak of the last five weeks was reached on 27 March (incidence) and 28 March (virus particles). In the preceding three weeks, the virus particles rose from 219.24 to 263.34, i.e. by 20 per cent. The incidence, on the other hand, increased from 30.5 to 50.6 over the same three weeks, which corresponds to 66 per cent - a relative increase more than three times as large.
One thing is obvious: the curve of virus particles in wastewater is not dependent on the number of tests carried out; there are no week-to-week fluctuations.
Source: https://coronadashboard.government.nl/ (screenshot)
However, this comparison also illustrates why it would not be wise to use only the change in the positive test rate as a benchmark. Between 6/7 and 27/28 March, the positive rate rose from 8.1% to 9% and 8.3% respectively. That is a relative increase of only between two and eleven per cent, far less than the rise in either the incidence or the virus particles.
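The relative changes of the three Dutch indicators can be recomputed from the figures quoted above. The helper name is an assumption; the input values are the ones given in the text.

```python
def relative_change(start, end):
    """Relative change from `start` to `end` in per cent."""
    return (end - start) / start * 100


wastewater = relative_change(219.24, 263.34)   # virus particles
incidence = relative_change(30.5, 50.6)        # 7-day incidence
positive_low = relative_change(8.1, 8.3)       # positive rate, lower end
positive_high = relative_change(8.1, 9.0)      # positive rate, upper end

print(f"virus particles: +{wastewater:.1f}%")
print(f"7-day incidence: +{incidence:.1f}%")
print(f"positive rate:   +{positive_low:.1f}% to +{positive_high:.1f}%")
print(f"incidence rose {incidence / wastewater:.1f} times as fast as virus particles")
```

The three indicators describe the same three weeks, yet their relative increases differ by more than an order of magnitude between the lowest and the highest value.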
Information in context
If significantly more people are tested and the spread of the infection in the population remains constant, the positive rate must fall - because proportionately fewer infected people are to be expected among the additional people tested. A constant or rising positive rate with an increasing number of tests indicates that the true incidence is increasing disproportionately. This is supported by the increase in measured virus particles.
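This reasoning can be made concrete with a deliberately crude two-group model: a limited group of people with symptoms or known contacts is tested first and has a high probability of testing positive, and all further tests go to people with a low background probability. The function name, group size and probabilities are assumptions chosen for illustration only.

```python
def positive_rate(n_tests, n_priority, p_priority, p_background):
    """Share of positive tests when the priority group (symptoms,
    contacts) is tested first and remaining tests go to everyone else."""
    from_priority = min(n_tests, n_priority)
    from_rest = n_tests - from_priority
    positives = from_priority * p_priority + from_rest * p_background
    return positives / n_tests


# Assumed values: 200,000 priority cases with a 30% chance of a positive
# result, and a 1% chance for everyone else.
baseline = positive_rate(500_000, 200_000, 0.30, 0.01)
more_tests = positive_rate(1_000_000, 200_000, 0.30, 0.01)
# Same doubled test volume, but the priority group has doubled as well.
more_infections = positive_rate(1_000_000, 400_000, 0.30, 0.01)

print(f"baseline:                 {baseline:.1%}")
print(f"twice the tests:          {more_tests:.1%}")
print(f"twice tests and spread:   {more_infections:.1%}")
```

In this sketch, doubling the tests at a constant level of infection roughly halves the positive rate, while the rate only stays at its baseline level if the priority group has doubled too. A constant positive rate with rising test numbers is therefore a sign of growing spread, not of stability.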
Testing more enables a better understanding of the pandemic. The incidence becomes more meaningful if the number of tests is increased, as the number of unreported cases is reduced. However, comparability between districts and over time is then no longer possible if the number of tests varies. In addition, fixed limits such as 35, 50 or 100 are nothing more than numbers with no real significance for the severity of the infection.
More tests are better. But you have to categorise the results correctly and not lose your nerve when case numbers and incidences skyrocket as a result.
Case numbers and incidences must be considered in context - for example, hospital utilisation, mortality or innovative indicators such as the number of virus particles in wastewater. These indicators, unlike those based on reported cases, cannot be so easily influenced by more or less arbitrary strategies. A data-competent approach to a crisis can only be achieved if decision-makers fully understand the possibilities and limitations of each indicator.
I hope we are not as far away from this as it seems.