Can big data help tackle the next pandemic?

From tracking deaths to lockdown adherence, data analysis has been a vital weapon in the fight against Covid-19. Experts discuss where it could be heading.

Data analysis has become a crucial tool in healthcare. From helping doctors spot cancer and sepsis early to calculating bed capacity in hospitals, interrogating large data sets can enable clinicians to provide more tailored patient care and make headway in understanding diseases overall.

Big data has also played a remarkable role in the fight against Covid-19. It helped scientists understand the virus’s protein structure, enabling them to develop treatments and vaccinations at pace; it allowed clinicians to better manage resources in intensive care units; it even helped researchers discover the optimum temperature for viral transmission.

But many of these applications were reactive rather than preventative, and localised rather than global. In future, could data collection be used to tackle and stop a pandemic before it takes hold?

Use of big data during Covid-19

Some tech in this area already exists. Canada-based AI start-up BlueDot, which scans social media for health trends, spotted an unusual cluster of pneumonia cases in Wuhan and predicted a new virus days before the World Health Organization (WHO) released its first statement. Physical tech has also been implemented: water management company Suez deployed a tracking system in Spain that scans waste water for virus traces, predicting where Covid-19 clusters would appear.

But the most prevalent uses of data science throughout the pandemic have been in understanding the virus, in terms of infection, recovery, diagnosis and treatment. The NHS’s National Covid-19 Chest Imaging Database (NCCID) brought together more than 40,000 CT scans, MRIs and X-rays from 10,000 UK patients, enabling clinicians to spot lung patterns, diagnose patients quickly and predict whether they would end up in a critical condition.

On a larger scale, UK Biobank, a database of health data from 500,000 UK participants, has facilitated several studies, from immunity testing to predicting people’s risk of severe illness. Finnish company Nightingale Health used this database to analyse more than 100,000 blood samples, and discovered that people with a particular set of genes – a “molecular signature” – were five to ten times more likely to be hospitalised. The company developed a take-at-home blood test, which Finnish businesses gave to their employees to help them assess their risk and working patterns. The database was also used to analyse which pre-existing health conditions most increased the risk of death from Covid-19 – a study of nearly 2,500 individuals found that the most common comorbidity was hypertension, followed by cardiovascular disease and cancer. These findings have been used in hospitals to ascertain which patients might need more care.

“By understanding which preconditions facilitate rapid deterioration from Covid-19, we were able to simplify this into a score card, which could help staff on the ground understand who needed more attention,” says Aldo Faisal, director of the UK Research and Innovation (UKRI) Centre for Doctoral Training in AI for Healthcare at Imperial College London. “For instance, who might suffer from kidney failure and would need dialysis. This is a predictive application of AI.”
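A score card of this kind can be illustrated with a short sketch. The conditions, weights and bands below are hypothetical examples, not the actual card the researchers developed.

```python
# Illustrative risk score card for Covid-19 deterioration.
# Conditions and weights are invented for illustration only.
RISK_WEIGHTS = {
    "hypertension": 2,
    "cardiovascular_disease": 3,
    "cancer": 2,
    "chronic_kidney_disease": 3,
}

def risk_score(comorbidities):
    """Sum the weights of a patient's pre-existing conditions."""
    return sum(RISK_WEIGHTS.get(c, 0) for c in comorbidities)

def triage_band(score):
    """Map a raw score to a simple attention band for ward staff."""
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

patient = ["hypertension", "chronic_kidney_disease"]
print(triage_band(risk_score(patient)))  # -> high (score 5)
```

The point of such a card is that ward staff can apply it at a glance, without running the underlying model themselves.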

Shaping economic decisions

Data analysis has also been a useful tool in shaping economic decisions and assessing the public’s compliance with restrictions, says Olivier Thereaux, head of research and development at the Open Data Institute (ODI). “This is not necessarily AI, just solid data science,” he says. “Pretty simple data sets can be used to provide really helpful statistics for policymakers.”

One example is the City Intelligence Unit at the Greater London Authority (GLA), which worked with big tech companies including Google and Apple to collate 6,000 sets of mobility data. These were analysed to assess movement and adherence to lockdown. This analysis also helped researchers spot mass gatherings to predict where clusters might appear, and to understand which sectors had seen the biggest drop in footfall, informing decisions around economic re-opening.

Similar examples exist on a global scale. Google Trends, which tracks patterns in people’s internet searches, collated a Covid-19 trends page in 2020, demonstrating the most common queries in different countries, from symptoms to food banks. This provided valuable insight into health and economic impacts, and such tools could serve as a warning sign in future, says Simon Rogers, data editor at Google. “This whole area is ripe for proper research but we could certainly see a rise in symptom searches before a rise in cases – it could provide an incredibly powerful signal.”

The future in healthcare

Some experts feel that more targeted data, rather than such mass consumer-generated data, could help policymakers make crucial decisions. “If we can gather data from healthcare, transport and financial companies, we could then run AI algorithms, which could find the best policy to minimise deaths,” says Faisal. “This could help policymakers better balance health and economy.”

Others feel global collaboration is the key to tackling the next crisis. Samira Asma, assistant director-general for data, analytics and delivery at the WHO, says accurate reporting has been an issue throughout Covid-19. The WHO has tracked death rates since the start of the pandemic, but has found that regions such as Africa significantly under-report. The organisation estimates there were more than three million Covid-19-related deaths in 2020, compared with just 1.8 million reported. There needs to be a renewed focus on health, like there is for climate, Asma says. “Like the World Meteorological Organisation (WMO), we need a global enterprise – a shared resource for data collection,” she says. “Similarly to how we can forecast weather or natural disasters such as tsunamis via satellite tracking, we should have a monitoring system for pandemics and epidemics.”

The WHO, meanwhile, is undertaking a project that looks to improve international data collection. It has developed the World Health Data Hub, a digital portal where countries can securely deposit anonymised health data about their citizens. This will eventually become a searchable database with country-by-country infographics and reports. The organisation is also pooling wider socio-economic data into what it calls a “data lake”, such as information on the healthcare workforce and access to essential services and medicines. Due to launch in 2022, this hub could be vital in highlighting health disparities and enabling a faster, fairer response to future emergencies.

“The aim is to make timely, reliable data accessible for countries in a disaggregated fashion,” Asma says. “Having data on a national level is not enough – equity is so important.” Additionally, the WHO is developing a Health Data Partnership, comprising 60 global organisations, which will train local workforces on effective data capture.

Instilling data literacy will be essential, says Thereaux, regardless of how intelligent tech becomes. There is no use having a truly predictive system without people who can interpret its findings. “You need medical, epidemiological and data science experts working alongside each other to run these predictive systems,” he says. “You can’t just build a system and expect it to detect a future pandemic. It’s about training data scientists, but also decision-makers, in what data can do.”

Global collaboration

Another priority is developing global standards, to ensure data is treated ethically and is fully anonymised. Thereaux suggests that the creation of data institutions would help to ensure it is used for the public good. “Institutionalising the stewardship of data would give companies the will to share data for research and development of new medicines,” he says. “It would mean no one loses out commercially and smaller organisations do not risk their work being absorbed by big tech, creating a much healthier ecosystem of collaboration and innovation.”

Faisal adds that “federated learning” could be explored, in which an AI “learns” from someone’s data without the need to collect or store it in the cloud. “The model travels rather than the data,” he says. “A predictive model trains locally on an individual’s phone, then travels onwards with what it has learned about the disease. This travelling model can visit millions of phones, constantly improving – you never see individuals’ private information.”
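The travelling model Faisal describes can be simulated in miniature: a one-parameter linear model visits each simulated phone in turn, trains on that device’s private data, and only the learned weight moves on. The datasets below are made up for illustration.

```python
# Minimal federated-learning sketch: the model travels, the data stays put.
# A tiny linear model learns y ≈ w * x by gradient descent on each "phone";
# only the weight w ever leaves a device, never the raw (x, y) pairs.
local_datasets = [            # each list stays on one simulated phone
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2), (4.0, 7.8)],
    [(5.0, 10.1)],
]

def train_locally(w, data, lr=0.01, epochs=50):
    """Run gradient descent on one device's private (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
    return w

w = 0.0                       # the travelling model
for data in local_datasets:   # visit each phone in turn
    w = train_locally(w, data)

print(round(w, 2))            # close to 2.0: every dataset encodes y ≈ 2x
```

Production systems such as those used for mobile keyboard prediction are far more elaborate (they average updates from many devices and add privacy noise), but the core property is the same: the server only ever sees model parameters.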

Tools like this could be used to learn about a virus as it develops – for instance, into variants – and could be coupled with mobility data and physical tools like sewage scanners to create targeted interventions. An area’s waterways might show an increase in virus particles; this combined with footfall data could indicate that specific streets need to be locked down rather than a whole city.
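As a hypothetical sketch, combining the two signals might look like the snippet below; the street names, readings and thresholds are all invented, and do not reflect any real deployment.

```python
# Hypothetical street-level alert combining two signals:
# wastewater viral load and footfall relative to baseline.
wastewater_viral_load = {"high_st": 8.5, "mill_rd": 1.2, "park_ln": 7.9}  # arbitrary units
footfall_index = {"high_st": 140, "mill_rd": 95, "park_ln": 30}           # % of baseline

def streets_to_target(load_threshold=5.0, footfall_threshold=100):
    """Flag streets where viral traces AND crowding are both elevated."""
    return [
        street
        for street, load in wastewater_viral_load.items()
        if load > load_threshold and footfall_index[street] > footfall_threshold
    ]

print(streets_to_target())  # -> ['high_st']: virus traces and crowds together
```

In this toy example, park_ln shows virus traces but is nearly empty, so only high_st would be flagged for a targeted intervention rather than locking down the whole city.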

For many experts, predicting a pandemic is not the be-all and end-all – rather, it’s about using technological advancement to better track a pandemic’s path and tailor a response that reduces both loss of life and loss of livelihood. “The question is – is it all about prediction or is it about better disease control?” says Faisal. “Being able to predict the next pandemic would be great but I’m not sure what the level of intervention needed to stop it would be. I think smarter sensing of disease that enables us to manage the response would be better and would allow society to be much more flexible in future.”

Sarah is a Special Projects Writer at the New Statesman.
