NHS Digital Data Release Register - reformatted

Wellcome Sanger Institute projects

4 data files in total were disseminated unsafely (information about files used safely is missing for TRE/"system access" projects).

A request for Pillar 2 testing data — DARS-NIC-411813-H0T2W

Opt outs honoured: Identifiable (Statutory exemption to flow confidential data without consent)

Legal basis: CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002

Purposes: (Research)

Sensitive: Sensitive

When: DSA runs 2020-11-02 to 2023-11-01

Access method: One-Off

Data-controller type: GENOME RESEARCH LIMITED

Sublicensing allowed: No


  1. Covid-19 UK Non-hospital Antigen Testing Results (pillar 2)


Genome Research Limited (also known as the Wellcome Sanger Institute) is a wholly owned subsidiary of the Wellcome Trust, funded by them to carry out genome sequencing to advance understanding of the biology of humans and pathogens in order to improve human health.

Genome Research Limited is a charity focused on doing research for societal benefit, and the Covid surveillance project falls within that mission. It will therefore not seek to benefit commercially from the project. Although Genome Research Limited's reputation may be enhanced by its participation, for a charity that does not translate into commercial benefit.

Genome Research Limited is working to fully understand the transmission and evolution of the virus, which requires sequencing and analysing viral genomes at scale and speed. The number of samples calls for a rapid and robust increase in the UK’s pathogen genome sequencing capacity.

To provide this increased capacity to collect, sequence and analyse the whole genomes of virus samples in the UK, the COVID-19 Genomics UK Consortium (COG-UK) is pooling the genomics knowledge and expertise of the four UK Public Health Agencies, multiple regional University hubs, and large sequencing centres such as the Wellcome Sanger Institute.

By sequencing thousands of viral genomes from around the UK each week, Genome Research Limited will help public health authorities detect and respond faster to super-spreading events, and monitor for new mutations associated with resistance to vaccines once they are deployed. Because the utility of these sequences is directly tied to how fast they can be produced, it is important to have immediate access to the basic data fields needed to pick them for sequencing from the hundreds of thousands of tests conducted each week.

By linking the viral genome sequences to a subset of the Pillar 2 data, Genome Research Limited will enable Public Health Agencies to highlight geographic hotspots of various strains of the coronavirus.

Expected Benefits:

The viral genome sequencing and analysis will help in the post-winter peak phase of the pandemic to maintain as much control as possible for as long as possible by:
1. Detecting super-spreading events via their genomic footprint, which differs markedly from that of diffuse community transmission. Genome Research Limited are already working with the four Public Health Agencies and intend to provide the data feed to Test & Trace so they can find these events fast and respond, for example, via enhanced contact tracing.
2. Monitoring for instances where vaccines are not effective. Vaccines will create selective pressure on the virus to mutate and evade them. Detecting these changes as early as possible will empower decision making on which vaccines to deploy and when.
3. Supporting rapid outbreak response by answering specific queries from Public Health Agencies where sequences are deemed essential and informative to the epidemic response.


The viral sequences, sample dates and outer postcode are uploaded to the MRC-CLIMB database, which has separate agreements with the Public Health Agencies.
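
For readers unfamiliar with the term, the outer postcode is the outward part of a UK postcode, i.e. everything before the final three characters. The record does not say who derives it or how, so the following is only a minimal sketch of the idea; the function name `outer_postcode` is hypothetical and is not taken from any system described here.

```python
def outer_postcode(postcode: str) -> str:
    """Return the outward code of a UK postcode, e.g. 'CB10 1SA' -> 'CB10'.

    Hypothetical helper for illustration only.
    """
    # Remove all whitespace and normalise case.
    compact = "".join(postcode.split()).upper()
    # A full postcode is a 2-4 character outward code plus a
    # 3 character inward code, so 5-7 characters in total.
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"unexpected postcode length: {postcode!r}")
    # The inward code is always the final three characters.
    return compact[:-3]
```

Keeping only the outward code coarsens location to a postal district, which is why it can accompany sequences and sample dates without pinpointing an address.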

Wellcome Sanger Institute produces reports and a database, which are available directly to the Public Health Agencies and Test and Trace.


As part of an agreement with Public Health England, Genome Research Limited receive the viral samples for sequencing and, under this agreement, will receive the relevant anonymised Pillar 2 testing data hourly.

Sanger will receive a data set from NHS Digital that contains a list of the positive COVID sample IDs and a subset of associated metadata. This will be de-identified by NHS Digital to an agreed level before Sanger receives it. Sanger will download the dataset multiple times a day and apply algorithms that rank the samples and identify those of interest for sequencing. A list of sample IDs will then be distributed to the required Sanger systems in order to enable the proposed use-cases.

The data will then be used to select a geographically representative set of samples (identified by the specimen ID) with sufficient viral material for sequencing each week. The sequences, dates and outer postcodes are linked and deposited in the MRC-CLIMB database and returned in reports to the Public Health Agencies.
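
The record does not describe the ranking system or how geographic representativeness is achieved. As a rough illustration of the selection step described above, the sketch below filters out samples with too little viral material, ranks the rest within each outer postcode, and then takes samples round-robin across areas until a weekly capacity is reached. The field names (`specimen_id`, `outer_postcode`, `viral_material`), the threshold, and the round-robin strategy are all assumptions made for the example, not Sanger's actual method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    specimen_id: str       # identifier used to pick the physical sample
    outer_postcode: str    # e.g. "CB10" (hypothetical field name)
    viral_material: float  # hypothetical score; higher means more material


def select_for_sequencing(samples, capacity, min_material):
    """Pick up to `capacity` specimen IDs, spread across outer postcodes."""
    # Keep only samples with enough viral material, grouped by area.
    by_area = defaultdict(list)
    for s in samples:
        if s.viral_material >= min_material:
            by_area[s.outer_postcode].append(s)

    # Within each area, rank the strongest candidates first
    # (specimen ID breaks ties so the result is deterministic).
    for area_samples in by_area.values():
        area_samples.sort(key=lambda s: (-s.viral_material, s.specimen_id))

    # Round-robin over areas so no single area dominates the selection.
    selected = []
    areas = sorted(by_area)
    rank = 0
    while len(selected) < capacity and areas:
        remaining = []
        for area in areas:
            if rank < len(by_area[area]):
                if len(selected) < capacity:
                    selected.append(by_area[area][rank].specimen_id)
                remaining.append(area)
        areas = remaining
        rank += 1
    return selected
```

Round-robin gives every area an equal share; a selection weighted by each area's case count would be an equally plausible reading of "geographically representative", and the source does not say which applies.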