Data quality emerges from Covid-19

In this e-guide: The public has been drenched in data during the Covid-19 pandemic. And the issue of the quality of that data has increasingly come to the fore.

Data quality is crucial to all business organisations, but its importance in healthcare has concentrated minds more widely, and the ways data has been managed and quality assured, or not, through the pandemic are illustrative for other sectors. For example, in this feature, the classic trade-off between fast data and best data is discussed, along with the need to explain the limitations of using the former, however necessary and “good enough” it might be for a particular business purpose.

And this article cites a 2019 global crisis survey by PwC showing the general importance of accurate data during crisis management. That survey found that three-quarters of the organisations that said they were in a better place following a crisis had come to strongly recognise the importance of establishing facts accurately during a crisis.

On the Covid crisis itself, a report published in January 2021 by the House of Commons Science and Technology Committee found that poor data management had hampered the government’s Covid-19 pandemic response, especially during the first wave of the pandemic. The report found that data flows were improving but that, even so, the NHS did not have the centralised data flows required.

The Office for National Statistics has become more prominent in public life during the crisis, and in December 2020 it published a Government data quality framework, providing five principles for data quality management in government.

Whether considering the public or private sectors, a proactive approach to data quality is advisable, otherwise the IT department and data professionals will find themselves fire-fighting. In this case study, we find trade show organiser Informa Markets turning to Collibra software to get ahead of the game in terms of the quality of its customer data. It was prompted to do so by a need to enhance its data quality efforts as it absorbed a major acquisition.

Finally, in this e-guide, the founder-CEO of a start-up supplier in the data quality space argues that data needs to be tested in the way the industry has long tested software.

UK government coronavirus data flawed and misleading

On 30 April 2020, Boris Johnson told the country: “171,253 people have tested positive – that’s an increase of 6,032 cases since yesterday.” Having given a figure for those in hospital, from where he had recently returned following his own fight with Covid-19, the Prime Minister added: “And sadly, of those who tested positive for coronavirus, across all settings, 26,711 have now died. That’s an increase of 674 fatalities since yesterday across all settings.”

The government’s figures on cases and deaths have been used by ministers and journalists as single sources of truth on the pandemic in the UK. But both are flawed and in some cases misleading, potentially distorting both public understanding and government decision-making.

The most recent government health data faux pas was a technical glitch that led to 15,841 positive results between 25 September and 2 October 2020 not being included in reported coronavirus cases. Public Health England says the fault was caused by a data file exceeding its maximum file transfer size.

Deaths might seem easy to measure, but deaths from a specific cause are not. The day before Johnson spoke in April, Public Health England had changed its figures to include those who died outside hospitals, such as in care homes, hence his repeated use of “across all settings”.

This would not be the last recalculation: the data covered anyone who had tested positive for Covid-19 and later died, regardless of cause, inflating the figures. When, in August, the organisation restricted the data to those who had died within 28 days of a positive test, in line with Scotland, Wales and Northern Ireland, more than 5,000 deaths were wiped from the overall death toll and Johnson’s 30 April figure fell to 634.

But that didn’t mean 634 died from Covid-19 on 30 April, rather that 634 deaths were reported that day. The figure tends to spike on Tuesdays, as NHS administration catches up from the weekend. Using the previous seven days’ data can smooth out weekly fluctuations but, more importantly, the data is out of focus, with one day of reporting including deaths scattered over previous days, weeks and even months.
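The smoothing described above can be sketched as a trailing seven-day mean. This is a minimal illustration, not the method any official body uses: the function name `rolling_mean` is hypothetical, and the daily counts are invented to show a Tuesday spike and a weekend dip rather than taken from any published series.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over the last `window` values.

    Returns None for positions where fewer than `window` values are available.
    """
    out = []
    buf = deque(maxlen=window)
    total = 0
    for v in values:
        if len(buf) == window:
            # The oldest value is about to be pushed out of the window.
            total -= buf[0]
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


# Invented daily reports, Monday to Sunday, repeated for two weeks.
# Tuesday is high and the weekend is low, mimicking the reporting pattern.
reported = [100, 160, 120, 110, 105, 60, 45] * 2
smoothed = rolling_mean(reported)
print(smoothed)
```

Because every seven-day window here contains exactly one of each weekday, the smoothed series is flat even though the daily reports swing widely. Note that smoothing removes the weekly rhythm only; it does not correct for deaths being reported days or weeks after they occurred.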

More ways of measuring Covid cases

David Paton, professor of industrial economics at Nottingham University Business School, has regularly published figures based on the actual date of death in the English health service. He reckons the figures are “reasonably complete” only five or six days afterwards. By 6 May, NHS England had reported 281 deaths taking place on 30 April, 90% of Paton’s current total of 313 – which includes one report made as late as 20 September. The government now publishes its own UK-wide “date of death” data, reporting 548 deaths on 30 April.
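The “90%” figure is simple arithmetic on the two counts quoted above, and can be checked as follows. The function name `completeness` is hypothetical; the two inputs are the figures given in the text.

```python
def completeness(reported_so_far, eventual_total):
    """Share of the eventual total already reported, as a percentage."""
    return 100 * reported_so_far / eventual_total


# 281 deaths for 30 April reported by 6 May, against a current total of 313.
share = completeness(281, 313)
print(f"{share:.1f}%")
```

The result rounds to 90%, matching the text: about one in ten of the deaths now attributed to that day had not yet been reported six days later.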

Looking at when deaths actually occurred provides a different view of the pandemic’s spring peak. According to both the original and revised version of the government’s data, coronavirus death reports peaked on 21 April, a Tuesday. Both Paton’s and the government’s numbers based on date of death peaked almost a fortnight earlier, on Wednesday 8 April. However, the “date of report” daily figure remains the one most used, probably because it is immediately available and doesn’t change as more reports are made.

There are other credible ways to measure coronavirus deaths. By 18 September, the government’s all-UK death toll had reached 41,801. But the Office for National Statistics (ONS) reckons approximately 57,677 people had died by that date across the UK where death certificates mentioned Covid-19 – including 769 on 30 April – and that 53,663 more deaths had occurred in England and Wales alone in 2020 compared with the average of the previous five years, a measure known as excess deaths.
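As described above, excess deaths compare observed deaths with the average of the previous five years. A minimal sketch of that calculation follows. The function name `excess_deaths` and all the numbers are hypothetical and deliberately small; they are not ONS figures, and the ONS’s own method may differ in detail.

```python
from statistics import mean


def excess_deaths(observed, previous_years):
    """Observed deaths minus the mean of the comparison years."""
    return observed - mean(previous_years)


# Invented counts for the five comparison years and the year of interest.
baseline_years = [1000, 1020, 980, 1010, 990]
observed = 1150

excess = excess_deaths(observed, baseline_years)
print(excess)
```

The measure does not depend on how any individual death was certified or whether the person was tested, which is why it is often used alongside counts based on positive tests or death certificates.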

The government’s other main measure – positive tests for cases of coronavirus – has bigger problems. On 21 September, the official number of new daily cases exceeded April’s record and has continued to climb. But while the rise is a cause for concern, it should be seen in the context of daily test numbers having quadrupled since April. “The supply side has changed,” says Paton.
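One common way to put case counts in the context of testing volume is the share of tests that come back positive. The sketch below is an illustration only: the function name `positivity_rate` and the numbers are hypothetical, and it assumes one result per test, which real testing data does not guarantee.

```python
def positivity_rate(positive_tests, tests_processed):
    """Fraction of processed tests that were positive."""
    if tests_processed <= 0:
        raise ValueError("tests_processed must be positive")
    return positive_tests / tests_processed


# Invented figures: the same number of positives, but four times as many tests.
earlier = positivity_rate(5_000, 20_000)
later = positivity_rate(5_000, 80_000)

print(earlier, later)
```

With testing quadrupled, an identical daily case count corresponds to a quarter of the positivity rate, so raw case numbers from different periods are not directly comparable.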

Case data on positive tests is a side-product of the testing system. It took time for the government to set up large-scale public testing; testing centres are likely to be established in areas where cases are growing; and in recent weeks demand has outstripped supply. The ONS is carrying out weekly research by sending tests to a random sample of the population, but this is smaller-scale, slower and lacks local detail.


