NHS Digital Data Release Register - reformatted

Genomics England projects

Genomics England: Generations Study cohort data for use in the National Genomics Research Library (NGRL)
Curated NCRAS data for the National Genomics Research Library (ODR1617_131)
Genomics England (MR1418) - Amendment and Updated Request for tranche of data across multiple data sets.
R26 - GENOMICS ENGLAND: GenOMICC COVID-19 Study

2552 data files in total were disseminated unsafely (information about files used safely is missing for TRE/"system access" projects).

🚩 Genomics England was sent multiple files from the same dataset, in the same month, both with opt-outs respected and with opt-outs ignored. Genomics England may not have compared the two files, but the identifiers are consistent between datasets, and outside of a good TRE NHS Digital cannot know what recipients actually do.

Genomics England: Generations Study cohort data for use in the National Genomics Research Library (NGRL) — DARS-NIC-733503-V0X9Q

Opt outs honoured: No (Excuses: Consent (Reasonable Expectation))

Legal basis: Consent (Reasonable Expectation); Health and Social Care Act 2012 s261(2)(c)

Purposes: Yes (Research)

Sensitive: Sensitive, and Non-Sensitive

When: DSA runs 2024-04 – 2027-04; 2024.10 – 2025.11.

Access method: Ongoing

Data-controller type: GENOMICS ENGLAND

Sublicensing allowed: Yes

AGD/predecessor discussions: AGD minutes - 11th January 2024 final.pdf

Datasets:

Civil Registrations of Death
Community Services Data Set (CSDS)
Diagnostic Imaging Data Set (DID)
Emergency Care Data Set (ECDS)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Critical Care (HES Critical Care)
Hospital Episode Statistics Outpatients (HES OP)
Maternity Services Data Set (MSDS) v2

Type of data: Identifiable

Objectives:

Genomics England requires access to NHS England Data for use in their National Genomics Research Library (NGRL).

The NGRL is a secure national resource of genomic, health and sample data managed by Genomics England, which builds on the research environment created by Genomics England for the 100,000 Genomes Project that completed recruitment in 2018 and involved the sequencing of approximately 100,000 genomes. It contains cohorts of patients/participants recruited via programmes set up for the NHS Genomic Medicine Service (GMS), the 100,000 Genomes Project, and other studies/programmes where patients'/participants' genomes have been sequenced, and provides a national standardised genomic research resource. Being able to compare all patient data in one place gives researchers an opportunity to better understand diseases and develop new treatments, and can lead to new discoveries.

Under this Data Sharing Agreement (DSA), Genomics England will receive NHS England Data linked to the Generations Study cohort and that Data will be accessed via the NGRL.

On average, nine babies in the UK are born each day with a rare genetic condition that could be treated, prevented, or even cured if only it had been diagnosed when those babies were newborns. The Generation Study is aiming to find out if this situation can be improved by recruiting 100,000 newborn babies at NHS Trusts throughout England and conducting whole genome sequencing (WGS) to screen for over 200 rare genetic conditions that are treatable in early childhood. It is hoped that babies affected by these conditions will be identified more quickly, treated earlier and therefore have improved clinical outcomes.

The 3 research questions that Genomics England hope to answer are:
1. Can we better diagnose, and therefore care for, children with rare diseases?
2. Can we give researchers opportunities to improve their understanding of rare disease, to develop new treatments, and diagnoses, and better understand how our genes affect our health?
3. Should we, and if so how should we, use a baby's genome throughout their lifetime as a resource they and their doctors can use if, for example, they become ill as they get older?

Processing for the Generation Study involves but is not limited to:
> Identifying discrepancies between WGS interpretation results and Bloodspot data results from the CSDS Dataset. This will be conducted on AWS cloud via an automated system developed by Genomics England. Discrepancies will be reported in the cases interpretation portal to be made available to the treating clinician.
> Evaluation of the cost effectiveness, health related outcomes, demographics monitoring and impact on the NHS. Further details of the outputs are detailed below.

The following NHS England Data will be accessed:

> Hospital Episode Statistics (HES) - these Datasets provide the core clinical data for participants and are vital to the provision of a detailed longitudinal medical history for participants. Specifically, the following HES Datasets are required:
- Admitted Patient Care (APC)
- Critical Care (CC)
- Outpatients (OP)
> Emergency Care Data Set (ECDS) necessary to understand which patients attend emergency departments and what treatment they receive in order to assess if there are associations with genetic markers.
> Diagnostic Imaging Dataset (DID) necessary to provide invaluable, detailed information to build on participants' phenotypes (observable characteristics), e.g., tumour size and spread in cancer, adding to the understanding of patients' histories on individual and cohort level and their relationship with genomic alterations.
> Civil Registration Mortality necessary because deaths Data is essential for performing survival analyses; this is crucial information for research in combination with other medical history. Knowledge of participant death is also vital for the correct analysis of medical timeline data and for the management of participant cohorts.
> Community Services Data Set (CSDS) necessary to evaluate the cost effectiveness of the Generation Study, to estimate the impact of WGS in newborns, and identify discrepancies in diseases identified between WGS interpretation results and the CSDS Dataset, which will be reported in the interpretation portal to be made available to the treating clinician.
> Maternity Services Data Set (MSDS) necessary for discovery research purposes.

NHS England Data is matched to the consented cohorts in the NGRL and therefore provides a more comprehensive medical history, and going forward, a more comprehensive patient journey. NHS England is the richest source of the data required.

The evaluation of WGS data in the context of rich and extended phenotypes derived from electronic health records, such as blood pressure, cholesterol, glucose, and pharmacogenomics (a field of research that studies how a person's genes affect how he or she responds to medications), adds significant value. The richness of the NGRL datasets will allow Genomics England to move beyond the primary phenotype of the rare disease, cancer or infectious disease that led to the patient's initial WGS in the context of other continuous traits, diseases and response to therapy including harm.

The level of the Data will be:
> Identifiable
For the following Datasets: HES APC; HES OP; ECDS; Civil Registrations of Death, many indirectly identifiable Data items are required because they provide valuable Data that can help researchers make new scientific and medical discoveries. All directly identifiable Data items will either be removed or transformed according to best practice agreed with NHS England. To ensure patients will not be identified, the researchers and their projects are examined before access is granted to the Data, and an agreement to not re-identify patients is signed. Further, Genomics England monitors all Data leaving the NGRL and will not allow patient records to be exported.

The Data will be minimised as follows:
> Limited to a study cohort identified by Genomics England, comprising patients recruited via the Generations Project, which follows newborn babies (expected ~50,000 new additions per year - recruitment is due to begin in December 2023 and will continue until 100,000 participants have been recruited in 2025).

Genomics England is the controller as the organisation responsible for ensuring that the Data will only be processed for the purpose described above.

The lawful basis for processing personal data under the UK GDPR is:
> Article 6(1)(f) - processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party.

Genomics England has determined the processing is necessary for its legitimate interests in carrying out medical research on the causes, diagnosis and treatment of rare diseases and cancers.

The lawful basis for processing special category data under the UK GDPR is:
> Article 9(2)(j) - processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.

It is necessary for Genomics England to process special category participant data for carrying out medical research on the causes, diagnosis and treatment of rare diseases and cancers, which is expected to benefit patients.

The funding is provided by the Department of Health and Social Care. The funding is specifically for the projects described. Funding is in place until March 2025, with the intention to renew this funding periodically.

Lifebit provides IT support to Genomics England.

Amazon Web Services (AWS) provides IT back up services to Genomics England and will store copies of the Data as contracted by Genomics England.

Representatives from patient and public bodies have an important role to play in Genomics England commercial initiatives. These representatives ensure transparency is upheld, and the interest of those whose data is being used is always being respected.

In the early stages of the Library, Genomics England undertook a range of work to ensure that potential participants' views were included in the formulation of the ethical policies submitted for research ethics approval and in the development of patient information. The views of different groups of potential participants (those affected by cancer, rare disease, and those from BAME communities) in relation to ethical issues raised by the 100,000 Genomes Project were sought and findings were published on the Genomics England website (See all reports under patient and public involvement - https://www.genomicsengland.co.uk/library-and-resources/ and the Genomics England Engagement Strategy). Genomics England will continue to engage with these stakeholders. Further to this, each of the 13 currently recruiting NHS Genomic Medicine Centres had dedicated Patient and Public Involvement (PPI) leads who are responsible for engaging with and involving local potential participant groups from diverse backgrounds. It is expected that the future NHS GMS will continue these local PPI activities to shape and inform the service.

A Participant Panel has also been established. This 30-strong group has provided invaluable advice on a range of topics, for instance, in shaping how analysis is monitored, how results are returned, and how advice and support should be framed. Participant Panel members have either donated samples to the Library themselves or are carers of participants. They take part in a wide variety of consultative groups, such as the Genomics England Ethics Advisory Committee but most importantly are guardians of the dataset, with representatives on the Access Review Committee. Participants play an important part in every decision made about access to data.

SUB-LICENCING:
Genomics Clinical Interpretation Partners (GeCIP) members (Academic research organisations), and members of the Discovery Forum (Commercial organisations) will also have access to the pseudonymised Data within the NGRL, subject to internal approval by Genomics England. NHS England Data is combined with the genomic and sample data within the NGRL, providing a more comprehensive medical history, and going forward, a more comprehensive patient journey which will be a valuable resource for medical research. All applications have to provide health and social care benefits and are reviewed by a panel (the Access Review Committee (ARC)) before access is granted.

It is anticipated that the volume of sub-licences will be 150-200 per year. The GeCIP sub licence agreement is indefinite, until it is terminated by either the GeCIP member or Genomics England.
The Data Access Agreement for Discovery Forum Members has a specified term, normally 12 months, at which point the company and Genomics England can choose to renew or not.

All requests for data access will be subject to the following considerations:
Protection of data subjects (honouring commitments made to them, acting within the scope of consent and according to conditions of Research Ethics Committee approval).
Compliance with legal and regulatory requirements: General Data Protection Regulation 2018, Data Protection Bill 2017, Freedom of Information Act 2000, NHS Act 2006, Health and Social Care Act 2012, the Common Law Duty of Confidentiality, Human Tissue Act 2004 and applicable requirements from organisations affiliated with the Health Research Authority, including Research Ethics Committees and the Confidentiality Advisory Group (CAG).
Provision of a signed Genomics England data access agreement to the Access Review Committee.
Prioritisation of access according to resource availability.
Facilitation of high-quality health research.

Commercial partnerships are crucial to achieving the aims of the NGRL and are achieved through the Discovery Forum. As with the non-commercial academic research led by GeCIP, commercial research aims to bring benefit to the patients and, through the use of the Data, inform development of platforms and tools for future diagnostic discovery. Commercial research can be broadly categorised into four themes that answer different questions along the typical Research and Discovery Biopharmaceutical Pipeline. At a high level they are divided into:
Diagnostic discovery
Pre-clinical research
Clinical Trials Referral
Real World Evidence / Market Access

Approval process for Commercial organisations for access to the NGRL:
Discovery Forum applications from a commercial organisation would be reviewed for suitability by the Partnership Development (PD) Team. The PD Team considers the credentials of the applying organisation, including consideration of adverse public perception and reputational risk from approving data access for that organisation. If the PD Team considers an application appropriate, it is then passed on to be scrutinised by the independent Access Review Committee (ARC). ARC is constituted of Participant Panel members and senior individuals from various scientific and medical backgrounds. ARC assesses the company's research proposal, including patient/participant involvement, potential future value to patients/the NHS and the ethics of the proposal.

Approval process for GeCIP users (academic) of the NGRL:
Researcher visits the Genomics England website to enrol as a GeCIP member.
Completion of the onboarding process: verification by their institution (the institution will be required to sign a Genomics participation agreement and appoint a membership secretary); verification of their self-stated qualifications and areas of research interest by the GEL Scientific Manager to join their domain of choice; and completion of the IG and GeCIP rules training course, passing the test with at least 80%. They are then able to access the NGRL and the Research Portal (the area where prospective GeCIP applicants can apply and register their research project).
Within 3 months of gaining access, they need to either submit a research proposal for Genomics England approval, which currently has to fit with the Detailed Research Plan for their domain, or join another registered project. Otherwise they will lose access.
On an annual basis, complete a survey sent out by Genomics England giving details of their research progress and any outputs, to aid reporting to ARC.
Any data they wish to either import or export to/from the NGRL has to be approved by Airlock (Airlock policy is described below) as not being personally identifiable.
If a researcher has not accessed the NGRL or the Research Portal, or logged into their GEL account to gain access to either, for 6 months, their account will be deactivated.
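
The two time limits in the steps above (a registered project within 3 months of gaining access, and deactivation after 6 months without activity) can be sketched as simple date checks. This is an illustrative sketch only: the function names are hypothetical, and counting the periods in calendar months is an assumption, since the agreement does not say how they are measured.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the end of the target month,
    e.g. 31 August + 6 months gives 28 (or 29) February.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def proposal_overdue(access_granted: date, has_project: bool, today: date) -> bool:
    """True if no proposal was submitted, and no project joined, within 3 months."""
    return not has_project and today > add_months(access_granted, 3)


def should_deactivate(last_activity: date, today: date) -> bool:
    """True once 6 calendar months have passed with no recorded activity."""
    return today >= add_months(last_activity, 6)
```

Here `last_activity` stands for the most recent access to the NGRL, the Research Portal or the GEL account, whichever is latest.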

All research activities undertaken in the NGRL aim to enrich the existing dataset via one or multiple routes:
Identification of diagnoses originally missed by the standardised pipeline
Feedback of new diagnoses to patients
Mobilising samples which can help to identify diagnoses that were missed through analyses of WGS alone

Researchers can access pseudonymised Data through NGRL under sub licence. The only Data allowed to be exported are summary results. An airlock policy has been established which enables material (data, files, tools etc) to be moved in or out of the NGRL in a controlled and supervised manner; facilitating research and discovery, while maintaining control of security and access.

Access to Data under sub licence is only granted to named individuals identified to Genomics England who agree to comply with the Airlock policy, Information Governance and IT Security Policy. Before being provided with credentials necessary to access the NGRL a Company Researcher must complete information governance training which shall be provided by Genomics England.

AIRLOCK POLICY:

The following rules are applied to all airlock requests:
1. All relevant details of the summary results to be transferred must be provided with every request.
2. All summary results transferred must be checked by Genomics England to ensure compliance with the relevant policies. Users will be notified of any summary results rejected along with the reason for the rejection.
3. All imports will be checked for viruses and malware and those failing this test will be rejected. It is the responsibility of the requestors to resolve such issues before re-submitting the file for transfer.
4. Summary results requested for transfer are assessed using the following criteria:
a. whether the request aligns with the user's ARC approval in full;
b. whether the request can clearly be demonstrated to be aligned with a registered project in the NGRL;
c. any data security implications;
d. any disclosure risks;
e. the technical feasibility and associated cost of the request;
f. when importing data, its scientific value to the community of researchers within the NGRL, and when and how it will be shared;
g. when importing data, checks will be performed to ensure that the data importer owns the data and holds the correct consents and approvals.

The Airlock Manager has formal delegated approval to approve requests where there is precedent from previous Airlock Review Committees. More complicated requests, or those where no precedent has been set, go to the airlock committee for review and a decision. The airlock committee is a delegation of the Genomics England Chief Scientist, who is responsible for oversight of all airlock requests in accordance with the airlock policy. The committee comprises:
Technical Lead
User Community Representative
Bioinformatics Director
Caldicott Guardian
Chief Scientist representative
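
The mechanical parts of the airlock rules and the escalation route described above can be sketched as a triage function. This is a hypothetical illustration, not Genomics England's system: the class, field and outcome names are assumptions, and criteria (a) to (g) still need human assessment by whoever receives the request.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REJECT = "reject"
    AIRLOCK_MANAGER = "airlock_manager"      # delegated approval, precedent exists
    AIRLOCK_COMMITTEE = "airlock_committee"  # complex or no precedent


@dataclass
class AirlockRequest:
    is_import: bool            # True for imports, False for exports
    details_provided: bool     # rule 1: all relevant details supplied
    malware_scan_passed: bool  # rule 3: only relevant for imports
    has_precedent: bool        # a previous committee decision covers this case
    is_complex: bool = False   # flagged as a more complicated request


def triage(request: AirlockRequest) -> Outcome:
    """Decide who reviews a request; being routed is not the same as approval."""
    if not request.details_provided:
        return Outcome.REJECT
    if request.is_import and not request.malware_scan_passed:
        return Outcome.REJECT
    if request.has_precedent and not request.is_complex:
        return Outcome.AIRLOCK_MANAGER
    return Outcome.AIRLOCK_COMMITTEE
```

A rejected import can be resubmitted once the requestor has resolved the malware finding, as rule 3 describes.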

The Data will be processed worldwide.

Access is restricted to substantive employees of Genomics England, Genomics Clinical Interpretation Partners (GeCIP) members, and members of the Discovery Forum, who have authorisation from the Principal Investigator.

GeCIP membership is open to any individual, student or member of staff who is affiliated with a host institution. Host institutions include the following:
UK academic research institutions (e.g., universities, research institutions etc.)
NHS trusts or authorities
UK and foreign charitable organisations directly related to the focus of the 100,000 Genomes Project
Foreign universities and research institutions that carry out significant research activity
UK and foreign governmental departments that carry out significant research activity (e.g., Medical Research Council (MRC), National Institute of Health (NIH), Public Health England (PHE))
Foreign healthcare organisations (private or public) that undertake significant research activity

To be eligible for data access as a GeCIP member, applicants must meet these requirements:
Their host institution has signed a GeCIP Participation Agreement, which outlines the key principles that members of each institution must adhere to, including the Intellectual Property and Publication Policy.
Their host institution has verified that they are affiliated with that institution.
The applicant's GeCIP domain has submitted a detailed research plan and it has been approved by the Genomics England Access Review Committee (see below).
The GeCIP domain lead has approved the application.
Following approval, GeCIP researchers must sign a specific agreement (GeCIP rules) covering their behaviour and working practice within the data infrastructure.
Data access will not then be granted until a researcher has successfully passed mandatory information governance training.

All applications have to provide health and social care benefits in England and are reviewed by a panel (the Access Review Committee (ARC)).

The ARC provides an independent examination of requests for data access. The ARC comprises external scientific experts, patient representatives and members of Genomics England's Participant Panel.

GeCIP users will be granted access to all data and knowledge held within the NGRL. Each GeCIP domain will have access to its own private shared area of the NGRL for data storage and collaboration. The secure virtual desktop infrastructure will provide the workspace for clinical teams, research groups and trainees to undertake their work.

All personnel accessing the Data have been appropriately trained in data protection and confidentiality.

The Data will be linked at person record level with the patients' genetic data within the NGRL. This includes the following data:
> National Cancer Registration and Analysis Service (NCRAS) and uncurated NCRAS data
> Secure Anonymised Information Linkage (SAIL) data; Welsh data
> Patient samples (e.g., blood, saliva, tissue, RNA, plasma and serum)
> NHS Trust's data

The Data will not be linked with any other data.

The identifying details will be stored in a separate database to the linked dataset used for analysis. All analyses will use the pseudonymised Dataset. There will be no requirement and no attempt to reidentify individuals when using the pseudonymised Dataset.
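
One common way to keep identifying details apart from the analysis dataset is to derive a keyed pseudonym and split each record in two. The sketch below illustrates that general pattern only; the agreement does not describe Genomics England's actual method, and the field names, the use of HMAC-SHA256 and the key handling are all assumptions.

```python
import hashlib
import hmac

# Hypothetical field names for the direct identifiers used in linkage.
DIRECT_IDENTIFIERS = ("nhs_number", "surname", "forename", "date_of_birth", "postcode")


def pseudonym(nhs_number: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym; without the key it cannot be recomputed."""
    return hmac.new(secret_key, nhs_number.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record: dict, secret_key: bytes) -> tuple[dict, dict]:
    """Return (identity_row, analysis_row) sharing only the pseudonym."""
    pseudo_id = pseudonym(record["nhs_number"], secret_key)
    identity_row = {"pseudo_id": pseudo_id}
    identity_row.update({k: record[k] for k in DIRECT_IDENTIFIERS if k in record})
    analysis_row = {"pseudo_id": pseudo_id}
    analysis_row.update({k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS})
    return identity_row, analysis_row
```

In this arrangement the identity rows and the key would sit in the separate database, and only the analysis rows would be loaded for research use.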

To protect patient confidentiality, access to the NGRL will be granted only for specific, approved purposes in accordance with informed consent. Any attempted use beyond the specified purpose may lead to exclusion and possible legal action, where appropriate.

Data accessed under sub licence will not be re-identified.

Data shared through the Airlock process is aggregate data only.

A release register detailing any sub licences and onward sharing can be found here: https://research.genomicsengland.co.uk/research-registry/browse

Genomics England will take responsibility for the actions and omissions of all sub-licensees, and breach of a sub licence will automatically be regarded as breach of the Data Sharing Framework Contract.

In the event of termination or expiry of the Data Sharing Framework Contract between NHS England and Genomics England, Data from NHS England will be removed from the NGRL, preventing access to the Data for all users.

Expected Benefits:

Gene discovery in the NGRL will create significant opportunities for scientific innovation through routine service, the focus on residual unmet need, and emphasis upon national and international collaborations. The library is expected to enable genomically-driven reclassification of rare diseases leading to opportunities to recall patients for deeper phenotyping through Rare Diseases Translational Research Collaboration (RD-TRC). RD-TRC has been set up by the National Institute for Health and Care Research and its aim is to provide research infrastructure that harnesses the strength of the NHS to support discoveries and translational research on rare diseases. These data are expected to pave the way for functional characterisation of findings, thereby adding further value to datasets, improving diagnostic utility and possibly identifying new targets and therapies.

The use of the Data could help to achieve the following benefits:

> Through the international coalition of research intellects known as the Genomics England Clinical Interpretation Partnership (GeCIP) and the Discovery Forum, the framework for Genomics England to work with Industry:
Create a mechanism for research to continually improve the accuracy and reliability of information fed back to patients
Add to knowledge of the genetic basis of disease
Increase opportunities for clinical trials
Build the evidence base to accelerate the introduction of new technologies into healthcare
> Stimulate and enhance UK industry and investment
> Provide access to this unique research data resource to industry for the purpose of developing new knowledge, methods of analysis, medicines, diagnostics and devices
> Attract inward investment from life science companies, with an aim of increasing opportunities of access to medicines that would otherwise be unavailable to UK patients
> Result in new scientific insights and discoveries
> Information linked to, and continually updated with, long-term patient health and personal information to aid analysis by researchers.
> Increase public knowledge and support for genomic medicine by delivering an ethical and transparent programme, retaining patient and public trust and confidence. This is aided by work with a range of partners to increase knowledge of genomics.

Specific Example: Generations Study:

Within the Generations Study, Genomics England intend to use NHS England Data to provide evidence to answer the questions set out above for evaluation purposes. Specifically, HES and CSDS will be used for the following:
Cost effectiveness
To approximate the true costs associated with additional / fewer healthcare encounters of WGS in newborns, to include A&E attendances, outpatient appointments, admissions, allied health professional appointments, procedures, medication and treatment for all participants that screen positive through the Generation Study.
Health related outcomes
To estimate the impact of WGS in newborns on the following, for all participants that screen positive through the Generation Study:
o Diagnostic Odyssey to include i) time from first clinical contact to diagnosis, ii) age at diagnosis, iii) frequency and duration of health encounters during the diagnostic period.
o Health encounters such as A&E attendances, outpatient appointments, admissions, allied health professional appointments over a defined time period.
o Interventions for example procedures, medication and treatment over a defined time period.
o Mortality to include i) age at death, and ii) cause of death.
Demographics monitoring
To ensure enrolled participants are representative of the wider English population based on a variety of demographic variables.
Impacts on the NHS
To identify if there has been any impact on the uptake of existing NHS newborn screening amongst participants of the Generation Study

Genomics England also hope to link NHS England Data to genomic data for discovery research purposes. HES, CSDS and MSDS are expected to prove useful for researchers seeking opportunities to improve their understanding of rare disease, to develop new treatments, and diagnoses, and better understand how genes affect health.

Outputs:

Researchers will have their own dissemination and communication strategies; however, a full list of scientific publications and conferences/posters will be made available on the Genomics England website on an ongoing basis. The expected outputs of the processing will be:
> Submissions to peer reviewed journals
> Presentations at conferences
> Posters
> Creation of a database of all genomic data, including all genomic and omics tests.

The outputs will not contain NHS England Data and will only contain aggregated information with small numbers suppressed as appropriate in line with the relevant disclosure rules for the Dataset(s) from which the information was derived.
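
Small-number suppression of aggregated outputs can be sketched as follows. The threshold of 5 is a placeholder assumption, not a figure from this agreement; the real value depends on the disclosure rules for each Dataset. The function name is hypothetical, and leaving true zeros unsuppressed is also an assumption.

```python
from typing import Optional


def suppress_small_numbers(
    counts: dict[str, int], threshold: int = 5
) -> dict[str, Optional[int]]:
    """Replace counts between 1 and threshold - 1 with None (suppressed).

    Zeros and counts at or above the threshold are left unchanged.
    """
    return {
        category: (None if 0 < count < threshold else count)
        for category, count in counts.items()
    }
```

This covers primary suppression only. A suppressed cell can sometimes be recovered from row or column totals, so real disclosure control usually needs further steps such as rounding or secondary suppression.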

The outputs will be communicated to relevant recipients through the following dissemination channels:
> Journals
> Posters
> Website: A list of publications is kept up-to-date on the Genomics website: https://www.genomicsengland.co.uk/research/publications?
> Presentations at appropriate conferences
> Upload of findings onto the Discovery Forum: Genomics England works with industry partners through the Discovery Forum. All members of the Forum are obliged to publish all findings and research at the point at which intellectual property for any product is protected. Additionally, it allows the NGRL users to report back to Genomics England on what aspects of the data are proving to be most useful to their research studies, what data is missing and how the data should be collected and developed. These partners act as a critical friend and have already made many helpful suggestions to increase the likelihood of successful research in the future for all those using Genomics England's NGRL.

Processing:

Genomics England will transfer data to NHS England. The data will consist of identifying details (specifically study ID, NHS Number, Date of Birth, Surname, Forename, Gender, Postcode and Other Given Name) which are required for the cohort to be linked with NHS England Data. This is the minimum set of identifiers required for linkage to guarantee complete matching.

NHS England will provide the relevant records from the HES APC, HES CC, HES OP, ECDS, Civil Registrations of Death, DID, CSDS and MSDS Datasets to Genomics England's Amazon Web Services (AWS) cloud storage. The Data will:
> Contain directly identifying Data items including but not limited to: Names, Postcode, Cause of Death, Place of Birth, Cancer Registration Number, which are required to provide maximum insight, and therefore maximum value to the researchers accessing the Data.

The NHS England Data is pseudonymised within the AWS cloud and is then loaded into the NGRL. Raw, identifiable files are kept in a secure location on AWS.

The Data will not be transferred to any other location.

The Data will be stored on the NGRL and the AWS Cloud at Genomics England.

Genomics England stores NGRL data on the Cloud provided by Amazon Web Services (AWS).

The Data will be accessed by authorised personnel via remote access.
The Controller(s) must confirm and provide evidence upon audit by NHS England that access via any remote device complies with the data security obligations within this DSA and the Data Sharing Framework Contract.
For remote access:
- Remote access will only be from secure locations situated within the territory of use (as further restricted elsewhere within the DSA if so done) stated within this DSA;
- Access controls granting users the minimum level of access required are in place;
- Remote access is only via secure connections (e.g., VPNs or secure protocols) to protect data;
- Multifactor authentication (MFA) is required for remote access;
- Device security, including up-to-date software and operating systems, antivirus software, and enabled firewalls are utilised for remote access;
- All remote access is undertaken within the scope of the organisation's DSPT (or other security arrangements as per this DSA) and complies with the organisation's remote access policy.

The above applies in addition to any condition set out elsewhere within the DSA (e.g. who may carry out processing, and for what purpose).

Data is physically stored in England.

Remote access is permitted from the following specified countries: UK, EEA Countries, United States, Canada, Australia, Qatar, Republic of Korea, Japan, Switzerland, Brazil, India, New Zealand, Argentina.

Should any country on the permitted list above become a high risk country through the duration of this agreement, the Recipient will cease disseminating data to researchers/organisations based in that country and request that data already disseminated be destroyed.

Should the Recipient wish to share data with any countries not listed above, it will require an update to this agreement.

Should Genomics England wish to facilitate remote access from a country that is not listed above, prior written agreement from NHS England must be obtained.

Genomics England upholds the following safeguards and controls:
1. Compliance with National Cyber Security Centre (NCSC) guidance, leading to the implementation of geo-blocking measures for IP addresses originating from Iran, Russia, North Korea and Belarus.
2. Collaboration with the NCSC and other security partners to identify and block potentially risky IP addresses, irrespective of their country of origin.
3. Implementation of two email authentication methods, namely Domain-based Message Authentication Reporting and Conformance (DMARC) and Sender Policy Framework (SPF), to detect and respond to spoofing and spam. This is crucial, as these activities often target our firewalls from international IP addresses.
4. Introduction of additional assurance activities related to international access within the Office 365 estate.
5. Conducting due diligence on companies associated with BGI Genomics.
6. Responsibilities of the Access Review Committee (ARC) include the thorough review of applications and applicants.
7. Continuous improvement of Information Governance training and cybersecurity awareness at Genomics England.

Access to confidential patient identifiable Data is restricted to an extremely limited number of employees of Genomics England, accessible on AWS.

Substantive employees of Genomics England and researchers who are members of the GeCIP and Discovery Forum will process the Data for the purposes described above.

Curated NCRAS data for the The National Genomics Research Library (ODR1617_131) — DARS-NIC-656890-V4L0D

Opt outs honoured: (Excuses: Consent (Reasonable Expectation))

Legal basis: Health and Social Care Act 2012 s261(2)(c)

Purposes: Yes (Research)

Sensitive: Non-Sensitive

When:DSA runs 2024-04 – 2027-04

Access method: One-Off

Data-controller type: GENOMICS ENGLAND

Sublicensing allowed: Yes

AGD/predecessor discussions: AGD minutes - 25 April 2024 final.pdf

Datasets:

NDRS Cancer Registrations
NDRS Linked Cancer Waiting Times (Treatments only)
NDRS Linked DIDs
NDRS National Cancer Patient Experience Survey (CPES)
NDRS National Radiotherapy Dataset (RTDS)
NDRS Systemic Anti-Cancer Therapy Dataset (SACT)

Type of data: Anonymised - ICO Code Compliant

Objectives:

Genomics England requires access to NHS England data for use in their National Genomics Research Library (NGRL), which operates as a Trusted Research Environment (TRE).

National Genomics Research Library (NGRL)
The NGRL is a secure national resource of genomic, health and sample data managed by Genomics England, which builds on the research environment created by Genomics England for the 100,000 Genomes Project that completed recruitment in 2018 and involved the sequencing of approximately 100,000 genomes. It contains cohorts of patients/participants recruited via programmes set up for the NHS Genomic Medicine Service (GMS), the 100,000 Genomes Project, and other studies/programmes where patients'/participants' genomes have been sequenced, and provides a national standardised genomic research resource. Being able to compare all patient data in one place provides researchers with an opportunity to better understand diseases and develop new treatments, and can lead to new discoveries.

The NGRL contains NHS England Data linked at record-level with data on the 100,000 Genomes Project cohort. Genomics England will also store NHS England Data linked at record-level with data on other Genomics England programmes.

This Data Sharing Agreement (DSA) covers Data provided for the 100,000 Genomes Project cohort and the NHS Genomic Medicine Service (NHS GMS) cohorts.

NHS Genomic Medicine Service (NHS GMS)
Following the successful delivery of the 100,000 Genomes Project, Genomics England began work to deliver the NHS GMS. This service will offer patients the dual opportunity of routine clinical care alongside a choice to participate in research spanning all genomic tests within the NHS, starting with whole genomes. Development of the NHS GMS builds on the evidence generated by the 100,000 Genomes Project, but also extends to genomic testing other than Whole Genome Sequencing (WGS).

The NHS GMS is led and commissioned by NHS England and will consist of:
> NHS Genomic Laboratory Hubs (GLHs) that will work as part of a National Genomic testing service. The provisions in this service will be determined by a national genomic test directory that outlines the testing strategies and technology to be employed for rare and inherited disease, cancer and other defined conditions/ applications.
> Clinical Genetic Services
> Cancer services using genomic analysis to guide treatment

Within the GMS, two particular cohorts are defined:

(a) The Rare Diseases cohort
The NGRL rare diseases cohort will consist of families with rare diseases, based on family structures appropriate to the provision of the GMS. This will harness the strength of current UK rare disease programmes and will advance current understanding of rare disease mechanisms. It may also impact on common diseases that share similar phenotypes. It will also offer opportunities for biomarker, clinical, and interventional studies through industrial partnerships. Genomics England will actively support the implementation of the UK Rare Disease Strategy. Building on work already undertaken by the 100,000 Genomes Project, this will facilitate the generation of a national data resource of all genomic data, with a focus on WGS but including all genomic testing.

The goals of processing data of patients with rare disease are:
To increase discovery of pathogenic variants (the gene variant responsible for causing disease) for rare disease.
To add value with additional biological insights that build confidence in commonly accepted pathogenic variants.
To enhance the clinical interpretation of WGS in rare disease.
To develop a programme of functional pathways for genomic tests other than whole genome sequencing, specifically, transcriptomics (the study of all the ribonucleic acid (RNA) molecules within a cell, otherwise known as the transcriptome), epigenetics (the study of how cells control gene activity without changing the DNA sequence), micro RNAs and biomarkers.
To return findings to the NHS for feedback to patients.
To create a unique dataset for rare diseases that may enable therapeutic innovation.

(b) The Cancer cohort
The NGRL will continue to learn from and collaborate with other projects who are producing an inventory of genomic, transcriptomic and epigenomic changes in a wide range of different tumour types. Researchers from these projects access the data via an approved sublicensing agreement with Genomics England.

The goals of processing data for patients with cancer are to:
Use WGS to identify novel driver mutations for cancer and to understand its evolutionary genetic architecture through primary and secondary malignant disease (by multiple biopsy and WGS).
Partner stratified healthcare programmes and outcome studies with patients from the NHS in England, to enable understanding of WGS benefits in defining predictors of therapeutic response to cancer therapies.
Use other genomic testing approaches to offer additional biological insights into cancer.
Utilise WGS to identify new pathways for cancer therapies and improved diagnostic characterisation.

Other forms of collaboration include direct partnerships to develop tools to improve systems and services. For example, partnering with Lifebit to take advantage of their technical genomic data tooling.

The following NHS England Data will be accessed:
> NDRS National Radiotherapy Dataset (RTDS)
> NDRS Linked DIDs
> NDRS Systemic Anti-Cancer Therapy Dataset (SACT)
> NDRS Cancer Registrations
> NDRS Linked Cancer Waiting Times (Treatments only)
> NDRS National Cancer Patient Experience Survey (CPES)

The data will be minimised as follows:
> Limited to cohorts who consented to participate in (1) the 100,000 Genomes Project and who also consented to longitudinal research (~80,000 participants) or (2) the Genomic Medicine Service (expected ~100,000 new additions per year).

Genomics England will request full history of patient Data to provide maximum insight, and therefore maximum value to the researchers accessing the Data. Because of the wide scope of the proposal, there are no other alternative or less intrusive ways of achieving the purpose described.

Genomics England is the controller as the organisation responsible for ensuring that the Data will only be processed for the purpose described above.

NHS England has commissioned Genomics England to undertake the work. NHS England does not specify what data are required to deliver the work nor how the data shall be processed to achieve that purpose. Such decisions are taken by Genomics England.

The lawful basis for processing personal data under the UK GDPR is:
> Article 6(1)(f) - processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party.

Genomics England has determined the processing is necessary for its legitimate interests in carrying out medical research on the causes, diagnosis and treatment of cancers.

The lawful basis for processing special category data under the UK GDPR is:
> Article 9(2)(j) - processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.

It is necessary for Genomics England to process special category participant data for carrying out medical research on the causes, diagnosis and treatment of cancers, which is expected to benefit patients.

The funding is provided by the Department of Health and Social Care. The funding is specifically for the projects described. Funding is in place until March 2025, with the intention to renew this funding periodically.

The funder will have no ability to suppress or otherwise limit the publication of findings.

Lifebit Biotech Limited provides IT support to Genomics England.

Amazon Web Services (AWS) provides IT back up services to Genomics England and will store copies of the data as contracted by Genomics England.

Representatives from patient and public bodies have an important role to play in Genomics England commercial initiatives. These representatives ensure transparency is upheld, and the interest of those whose data is being used is always being respected.

In the early stages of the Library, Genomics England undertook a range of work to ensure that potential participants' views were included in the formulation of the ethical policies submitted for research ethics approval and in the development of patient information. The views of different groups of potential participants (those affected by cancer, rare disease, and those from BAME communities) in relation to ethical issues raised by the 100,000 Genomes Project were sought and findings were published on the Genomics England website (See all reports under patient and public involvement - https://www.genomicsengland.co.uk/library-and-resources/ and the Genomics England Engagement Strategy). Genomics England will continue to engage with these stakeholders. Further to this, each of the 13 currently recruiting NHS Genomic Medicine Centres has dedicated Patient and Public Involvement (PPI) leads who are responsible for engaging with and involving local potential participant groups from diverse backgrounds. It is expected that the future NHS GMS will continue these local PPI activities to shape and inform the service.

A Participant Panel has also been established. This 30-strong group has provided invaluable advice on a range of topics, for instance, in shaping how analysis is monitored, how results are returned, and how advice and support should be framed. Participant Panel members have either donated samples to the Library themselves or are carers of participants. They take part in a wide variety of consultative groups, such as the Genomics England Ethics Advisory Committee but most importantly are guardians of the dataset, with representatives on the Access Review Committee. Participants play an important part in every decision made about access to data.

SUB-LICENSING:

Genomics Clinical Interpretation Partners (GeCIP) members (Academic research organisations), and members of the Discovery Forum (Commercial organisations) will also have access to the pseudonymised Data within the NGRL, subject to internal approval by Genomics England. NHS England Data is combined with the genomic and sample data within the NGRL, providing a more comprehensive medical history, and going forward, a more comprehensive patient journey which will be a valuable resource for medical research. All applications have to provide health and social care benefits and are reviewed by a panel (the Access Review Committee (ARC)) before access is granted.

It is anticipated that the volume of sub-licences will be 150-200 per year. The GeCIP sub licence agreement is indefinite, until it is terminated by either the GeCIP member or Genomics England.
The Data Access Agreement for Discovery Forum Members has a specified term, normally 12 months, at which point the company and Genomics England can choose to renew or not.

All requests for data access will be subject to the following considerations:
Protection of data subjects (honouring commitments made to them, acting within the scope of consent and according to conditions of Research Ethics Committee approval).
Compliance with legal and regulatory requirements: General Data Protection Regulation 2018, Data Protection Bill 2017, Freedom of Information Act 2000, NHS Act 2006, Health and Social Care Act 2012, the Common Law Duty of Confidentiality, Human Tissue Act 2004 and applicable requirements from organisations affiliated with the Health Research Authority, including Research Ethics Committees and the Confidentiality Advisory Group (CAG).
Provision of a signed Genomics England data access agreement to the Access Review Committee.
Prioritisation of access according to resource availability.
Facilitation of high-quality health research.

Commercial partnerships are crucial to achieving the aims of the NGRL and are achieved through the Discovery Forum. As with the non-commercial academic research led by GeCIP, commercial research aims to bring benefit to the patients and, through the use of the Data, inform development of platforms and tools for future diagnostic discovery. Commercial research can be broadly categorised into four themes that answer different questions along the typical Research and Discovery Biopharmaceutical Pipeline. At a high level they are divided into:
Diagnostic discovery
Pre-clinical research
Clinical Trials Referral
Real World Evidence / Market Access

Approval process for Commercial organisations for access to the NGRL:

Discovery Forum applications from a commercial organisation would be reviewed for suitability by the Partnership Development (PD) Team. The PD Team consider the credentials of the applying organisation, including consideration of adverse public perception and reputational risk from approving data access for that organisation. If the PD Team feel the applications are appropriate, they are then passed on to be scrutinised by the independent Access Review Committee (ARC). ARC is constituted of Participant Panel members and senior individuals from various scientific and medical backgrounds. ARC assess the company's research proposal, including patient/participant involvement, potential future value to patients/the NHS and the ethics of the proposal.

The ARC will assess whether there has been any Patient and Public Involvement and Engagement (PPIE) informing the research questions and design. For many commercial applications that are exploring early-stage research and development (R&D), for example, target identification and validation, there will not have been any PPIE because the research may be tied to exploring fundamental biological mechanisms and pathways rather than particular conditions or phenotypes. If there has been PPIE, the ARC will determine whether it has adequately informed the research questions and design, and whether there is a commitment to ongoing PPIE and transparency following the outcomes of the research. Although PPIE is not a requirement of applications, ARC encourage applicants to consider at what stage in their R&D process it would be appropriate to consult with patient advocacy and participation groups.

Genomics England will only work with companies that are aligned with its strategy and mission to bring the benefits of genomic medicine to everyone. The Partnerships Development team will assess whether a company seeking access to NGRL data is working in the cancer or rare disease diagnostics and therapeutics space, or supporting UK Government strategic scientific initiatives; if not, Genomics England would not permit an application to ARC in the first place. All applications must conform to the acceptable uses set out in the REC-approved NGRL protocol. If the research proposal is for later stage research that has a clear pathway to intended patient or health system benefit, the ARC would expect to see this articulated as part of the rationale for seeking access to NGRL data. Given the early stage of much commercial genomics research, not all accepted applications will be able to demonstrate a clear explanation of the expected healthcare benefits.

Approval process for GeCIP users (academic) of the NGRL:

Researcher visits the Genomics England website to enrol as a GeCIP member.
Completion of the onboarding process: verification by their institution (the institution will be required to sign a Genomics participation agreement and appoint a membership secretary); verification of their self-stated qualifications and areas of research interest by the GEL Scientific Manager to join their domain of choice; and completion of the IG and GeCIP rules training course, passing the test with at least 80%. They are then able to access the NGRL and the Research Portal (the area where prospective GeCIP applicants can apply and register their research project).
Within 3 months of gaining access they need to either submit a research proposal for Genomics England approval, which currently has to fit with the Detailed Research Plan for their domain, or join another registered project. Otherwise they will lose access.
On an annual basis, complete a survey sent out by Genomics England giving details of their research progress and any outputs, to aid reporting to ARC.
Any data they wish to either import or export to/from the NGRL has to be approved by Airlock (Airlock policy is described below) as not being personally identifiable.
If a researcher has not accessed the NGRL or the Research Portal, or logged into their GEL account to gain access to either, for 6 months, their account will be deactivated.
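
The thresholds in the steps above (an 80% pass mark, a 3-month window to register a project, and deactivation after 6 months of inactivity) can be summarised in a short sketch. It is illustrative only and does not describe Genomics England's actual systems: the `Researcher` class, its field names, the status strings, and the approximation of 3 and 6 months as 90 and 180 days are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed approximations of the periods named in the text.
PROPOSAL_WINDOW = timedelta(days=90)    # "within 3 months of gaining access"
INACTIVITY_LIMIT = timedelta(days=180)  # "for 6 months"
PASS_MARK = 80                          # "at least 80%"


@dataclass
class Researcher:
    """Hypothetical record of a GeCIP member's onboarding and activity."""
    institution_verified: bool
    qualifications_verified: bool
    training_score: int                     # percentage on the IG / GeCIP rules test
    access_granted: Optional[date] = None
    joined_project: Optional[date] = None   # proposal submitted or project joined
    last_activity: Optional[date] = None    # last NGRL / Research Portal / account login


def may_be_onboarded(r: Researcher) -> bool:
    """All three onboarding checks must pass before access is given."""
    return (r.institution_verified
            and r.qualifications_verified
            and r.training_score >= PASS_MARK)


def access_status(r: Researcher, today: date) -> str:
    """Apply the project-registration and inactivity rules in order."""
    if r.access_granted is None:
        return "no access"
    if r.joined_project is None and today - r.access_granted > PROPOSAL_WINDOW:
        return "access lost: no project"
    last_seen = r.last_activity or r.access_granted
    if today - last_seen >= INACTIVITY_LIMIT:
        return "deactivated: inactive"
    return "active"
```

For example, a researcher granted access on 1 January 2024 who has not joined a project is still "active" on 1 February 2024 but would be reported as "access lost: no project" on 1 June 2024 under these assumed windows.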

All research activities undertaken in the NGRL aim to enrich the existing dataset via one or multiple routes:
Identification of diagnoses originally missed by the standardised pipeline
Feedback of new diagnoses to patients
Mobilising samples which can help to identify diagnoses that were missed through analyses of WGS alone

Researchers can access pseudonymised Data through NGRL under sub licence. The only Data allowed to be exported are summary results. An airlock policy has been established which enables material (data, files, tools etc) to be moved in or out of the NGRL in a controlled and supervised manner; facilitating research and discovery, while maintaining control of security and access.

Data accessed under sub licence is only granted to named individuals identified to Genomics England who agree to comply with the Airlock policy, Information Governance and IT Security Policy. Before being provided with credentials necessary to access the NGRL a Company Researcher must complete information governance training which shall be provided by Genomics England.

AIRLOCK POLICY:

The following rules are applied to all airlock requests:
1. All relevant details of the summary results to be transferred must be provided with every request.
2. All summary results transferred must be checked by Genomics England to ensure compliance with the relevant policies. Users will be notified of any summary results rejected along with the reason for the rejection.
3. All imports will be checked for viruses and malware and those failing this test will be rejected. It is the responsibility of the requestors to resolve such issues before re-submitting the file for transfer.
4. Summary results requested for transfer are assessed using the following criteria:
a. whether the request aligns with the user's ARC approval in full;
b. whether the request can clearly be demonstrated to be aligned with a registered project in the NGRL;
c. any data security implications;
d. any disclosure risks;
e. the technical feasibility and associated cost of the request;
f. when importing data, its scientific value to the community of researchers within the NGRL, and when and how it will be shared;
g. when importing data, checks will be performed to ensure that the data importer owns the data and holds the correct consents and approvals.

The Airlock Manager has formal delegated approval to approve requests where there is precedent from previous Airlock Review Committees. For more complicated requests or where no precedent has been set these will go to the airlock committee for review and a decision. The airlock committee is a delegation of the Genomics England Chief Scientist who is responsible for oversight of all airlock requests in accordance with the airlock policy. The committee comprises:
Technical Lead
User Community Representative
Bioinformatics Director
Caldicott Guardian
Chief Scientist representative
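
The routing described in the airlock policy above can be sketched as a small decision function. This is a hedged illustration, not the actual airlock tooling: the `AirlockRequest` fields, the `Decision` values, and the choice to reject a request outright when details are missing are assumptions introduced for the example. The assessment criteria (a) to (g) are applied by human reviewers and are not modelled here.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REJECT = "reject"
    MANAGER_REVIEW = "airlock manager"
    COMMITTEE_REVIEW = "airlock committee"


@dataclass
class AirlockRequest:
    """Hypothetical summary of a single airlock request."""
    details_provided: bool     # rule 1: all relevant details supplied
    is_import: bool            # True for imports, False for exports
    malware_scan_passed: bool  # rule 3: only relevant to imports
    has_precedent: bool        # a comparable request was decided before
    is_complex: bool           # flagged as a "more complicated" request


def route(req: AirlockRequest) -> Decision:
    """Decide who reviews a request, or reject it before review."""
    if not req.details_provided:
        return Decision.REJECT  # assumption: incomplete requests are bounced
    if req.is_import and not req.malware_scan_passed:
        return Decision.REJECT  # rule 3: failed imports are rejected
    if req.has_precedent and not req.is_complex:
        return Decision.MANAGER_REVIEW
    return Decision.COMMITTEE_REVIEW
```

An export with precedent and no complications would go to the Airlock Manager, while a request with no precedent would go to the committee.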

The data will be processed worldwide.

Access is restricted to substantive employees of Genomics England, Genomics Clinical Interpretation Partners (GeCIP) members, and members of the Discovery Forum, who have authorisation from the Principal Investigator.

GeCIP membership is open to any individual, student or member of staff, who is affiliated with a host institution which include the following:
UK academic research institutions (e.g., universities, research institutions etc.)
NHS trusts or authorities
UK and foreign charitable organisations directly related to the focus of the 100,000 Genomes Project
Foreign universities and research institutions that carry out significant research activity
UK and foreign governmental departments that carry out significant research activity (e.g., MRC, NIH, PHE)
Foreign healthcare organisations (private or public) that undertake significant research activity

To be eligible for data access as a GeCIP member, applicants must meet these requirements:
Their host institution has signed a GeCIP Participation Agreement, which outlines the key principles that members of each institution must adhere to, including the Intellectual Property and Publication Policy.
Their host institution has verified that they are affiliated with that institution.
The applicant's GeCIP domain has submitted a detailed research plan and it has been approved by the Genomics England Access Review Committee (see below).
The GeCIP domain lead has approved the application.
Following approval, GeCIP researchers must sign a specific agreement (GeCIP rules) covering their behaviour and working practice within the data infrastructure.
Data access will not then be granted until a researcher has successfully passed mandatory information governance training.

All applications have to provide health and social care benefits and are reviewed by a panel (the Access Review Committee (ARC)).

The ARC provides an independent examination of requests for data access. The ARC comprises external scientific experts, patient representatives and members of Genomics England's Participant Panel.

GeCIP users will be granted access to all data and knowledge held within the NGRL. Each GeCIP domain will have access to its own private shared area of the NGRL for data storage and collaboration. The secure virtual desktop infrastructure will provide the workspace for clinical teams, research groups and trainees to undertake their work.

All personnel accessing the data have been appropriately trained in data protection and confidentiality.

The data will be linked at person record level with the patients' genetic data within the NGRL. This includes the following data:
> NHS England hospital, deaths, cancer registration, mental health and diagnostic imaging data obtained from the DARS-NIC-12784-R8W7V Agreement
> Secure Anonymised Information Linkage (SAIL) data; Welsh data
> Patient samples (e.g., blood, saliva, tissue, RNA, plasma and serum)

The Data will not be linked with any other data.

The identifying details will be stored in a separate database to the linked dataset used for analysis. All analyses will use the pseudonymised dataset. There will be no requirement and no attempt to reidentify individuals when using the pseudonymised dataset.
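
The separation described above can be illustrated with a minimal sketch. The agreement does not state how pseudonymisation is implemented, so everything here is an assumption: the field names in `IDENTIFIER_FIELDS`, the use of a random token as the pseudonym, and the `split_record` helper are hypothetical and chosen only to show identifiers and analysis data being held apart.

```python
import secrets
from typing import Dict, Tuple

# Hypothetical field names standing in for the identifying details.
IDENTIFIER_FIELDS = {"nhs_number", "date_of_birth", "surname", "forename", "postcode"}


def split_record(record: Dict[str, str]) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Split one record into an identifier part and an analysis part.

    Returns (pseudonym, identifiers, analysis). The identifiers would be
    held in a separate store keyed by the pseudonym. The analysis part
    carries only the pseudonym and the non-identifying fields.
    """
    pseudonym = secrets.token_hex(16)  # random, so not derivable from identifiers
    identifiers = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
    analysis = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    analysis["pseudonym"] = pseudonym
    return pseudonym, identifiers, analysis
```

Because the pseudonym in this sketch is random rather than derived from the identifiers, re-identification would require access to the separate identifier store.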

To protect patient confidentiality, access to the NGRL will be granted only for specific, approved purposes in accordance with informed consent. Any attempted use beyond the specified purpose may lead to exclusion and possible legal action, where appropriate.

Data accessed under sub licence will not be re-identified.

Genomics England relies on UK GDPR Article 6(1)(f) for the personal data and Article 9(2)(j) for the special category data shared within the NGRL.

Data shared through the Airlock process is aggregate data only and is therefore not personal data so does not require a legal basis under the UK GDPR.

A release register detailing any sub licences and onward sharing can be found here: https://research.genomicsengland.co.uk/research-registry/browse

Genomics England will take responsibility for the actions and omissions of all sub licensees, and a breach of a sub licence will automatically be regarded as a breach of the Data Sharing Framework Contract.

In the event of termination or expiry of the Data Sharing Framework Contract between NHS England and the applicant, data from NHS England will be removed from the NGRL, preventing access to the data for all users.

NHS England will require the ability to audit the sub licensee.

Yielded Benefits:

> New scientific insights and discovery: with the consent of patients, creating a database of 100,000 whole genome sequences linked to continually updated long term patient health and personal information for analysis by researchers. This has enhanced genomic healthcare research by creating the largest genomic healthcare data resource in the world, which in turn will uncover answers for participants both now and in the future through genomic-level analysis of conditions.
> Accelerating the uptake of genomic medicine in the NHS: working with NHSE and other partners to deliver a scalable WGS and informatics platform to enable these services to be made widely available for NHS patients. In addition, through the Genomics England Clinical Interpretation Partnership (GeCIP), creating a mechanism to both continually improve the accuracy and reliability of information fed back to patients and add to knowledge of the genetic basis of disease. This acceleration significantly contributed to delivering the Genomic Medicine Service (GMS) for the NHS, which makes whole genome sequencing part of routine healthcare.
> Stimulating and enhancing UK industry and investment: by providing access to this unique data resource by industry for the purpose of developing new knowledge, methods of analysis, medicines, diagnostics and devices. The creation of the Discovery Forum provides a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.
> Increasing public knowledge and support for genomic medicine: delivering an ethical and transparent programme which has public trust and confidence and working with a range of partners to increase knowledge of genomics.
Participants were involved in all stages of the pioneering 100,000 Genomes Project and a trusted system was put in place. This contributed to the findings of a major public dialogue in 2019 (after completion of the 100,000 Genomes Project), led by Ipsos MORI, commissioned by Genomics England and co-funded by UK Research and Innovation's Sciencewise programme, which found that the public are enthusiastic and optimistic about the potential for genomic medicine.

Expected Benefits:

Gene discovery in the NGRL will create significant opportunities for scientific innovation through routine service, the focus on residual unmet need, and emphasis upon national and international collaborations.

The use of the data could help to achieve the following benefits:

> Through the international coalition of research intellects known as the Genomics England Clinical Interpretation Partnership (GeCIP) and the Discovery Forum, the framework for Genomics England to work with Industry:
Create a mechanism for research to continually improve the accuracy and reliability of information fed back to patients
Add to knowledge of the genetic basis of disease
Increase opportunities for clinical trials
Build the evidence base to accelerate the introduction of new technologies into healthcare
> Stimulate and enhance UK industry and investment
> Provide access to this unique research data resource to industry for the purpose of developing new knowledge, methods of analysis, medicines, diagnostics and devices
> Attract inward investment from life science companies, with an aim of increasing opportunities of access to medicines that would otherwise be unavailable to UK patients
> Result in new scientific insights and discoveries
> Information linked to continually updated long-term patient health and personal information to aid analysis by researchers.
> Increase public knowledge and support for genomic medicine by delivering an ethical and transparent programme, retaining patient and public trust and confidence. This is aided by work with a range of partners to increase knowledge of genomics.

> Use WGS to identify novel driver mutations for cancer and to understand its evolutionary genetic architecture through primary and secondary malignant disease
> Partner stratified healthcare programmes and outcome studies with patients from the NHS in England, to enable understanding of WGS benefits in defining predictors of therapeutic response to cancer therapies
> To use other genomic testing approaches to offer additional biological insights into cancer
> To utilise WGS to identify new pathways for cancer therapies and improved diagnostic characterisation.

The expected patient benefit is to provide clinical diagnosis, and in time, new or more effective treatments for NHS patients. The discovery of new causes of disease, the offer of tailored therapies to create the best outcomes, and the priming of new or more effective treatments for NHS patients, are other expected patient benefits.

Outputs:

Researchers will have their own dissemination and communication strategies; however, a full list of scientific publications and conferences/posters will be made available on the Genomics England website on an ongoing basis. The expected outputs of the processing will be:
> Submissions to peer reviewed journals
> Presentations at conferences
> Posters
> Creation of a database of all genomic data, including all genomic and omics tests

The outputs will not contain NHS England data and will only contain aggregated information with small numbers suppressed as appropriate in line with the relevant disclosure rules for the dataset(s) from which the information was derived.
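The small-number suppression described above can be illustrated with a minimal sketch. The threshold of 5, the `*` marker and the decision to leave zero counts visible are all assumptions made for illustration; the actual rules are those of the dataset(s) from which the information was derived and are not stated here. Secondary suppression (preventing a suppressed cell from being recovered from totals) is not covered.

```python
from typing import Dict, Union

# Hypothetical values: the real threshold and marker come from the
# disclosure rules of the source dataset, which this entry does not state.
SUPPRESSION_THRESHOLD = 5
SUPPRESSED_MARKER = "*"


def suppress_small_numbers(
    counts: Dict[str, int],
    threshold: int = SUPPRESSION_THRESHOLD,
) -> Dict[str, Union[int, str]]:
    """Return a copy of an aggregate table with small counts masked.

    Any non-zero count below `threshold` is replaced by a marker.
    Zero counts are left as-is (an assumption of this sketch).
    """
    return {
        category: (SUPPRESSED_MARKER if 0 < n < threshold else n)
        for category, n in counts.items()
    }


if __name__ == "__main__":
    table = {"group_a": 3, "group_b": 12, "group_c": 0}
    print(suppress_small_numbers(table))
```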

The outputs will be communicated to relevant recipients through the following dissemination channels:
> Journals
> Posters
> Website: A list of publications is kept up-to-date on the Genomics England website: https://www.genomicsengland.co.uk/research/publications?
> Presentations at appropriate conferences
> Upload of findings onto the Discovery Forum: Genomics England works with industry partners through the Discovery Forum. All members of the Forum are obliged to publish all findings and research at the point at which intellectual property for any product is protected. Additionally, it allows the NGRL users to report back to Genomics England on what aspects of the data are proving to be most useful to their research studies, what data is missing and how the data should be collected and developed. These partners act as a critical friend and have already made many helpful suggestions to increase the likelihood of successful research in the future for all those using Genomics England's NGRL.

Processing:

Genomics England will transfer data to NHS England. The data will consist of identifying details (specifically study ID, NHS Number, Date of Birth, Surname, Forename, Gender, Postcode and Other Given Name) which are required for the cohort to be linked with NHS England data. This is the minimum set of identifiers required for linkage to guarantee complete matching. Datasets will be transferred from NHSE to Genomics England's Amazon Web Services (AWS) cloud storage. The data will:
> Contain directly identifying data items including but not limited to: Names, Postcode, Cause of Death, Place of Birth, Cancer Registration Number, which are required to provide maximum insight, and therefore maximum value to the researchers accessing the data.

The NHS England data is pseudonymised within the AWS cloud and is then loaded into the NGRL. Raw, identifiable files are kept in a secure location on AWS.
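This entry does not say how the pseudonymisation is performed. As a rough sketch only, one common approach is a keyed hash (HMAC-SHA256) of a stable identifier, with the direct identifiers dropped before the record is loaded into the research environment. The function names, field names and choice of HMAC below are assumptions for illustration, not Genomics England's actual method.

```python
import hashlib
import hmac
from typing import Dict, Iterable

# Hypothetical field names, loosely based on the identifiers listed above.
DIRECT_IDENTIFIERS = ("nhs_number", "surname", "forename", "date_of_birth", "postcode")


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash."""
    normalised = identifier.replace(" ", "").upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(
    record: Dict[str, str],
    secret_key: bytes,
    id_field: str = "nhs_number",
    direct_identifiers: Iterable[str] = DIRECT_IDENTIFIERS,
) -> Dict[str, str]:
    """Return a copy of the record with direct identifiers replaced by a pseudonym."""
    drop = set(direct_identifiers)
    cleaned = {k: v for k, v in record.items() if k not in drop}
    cleaned["pseudo_id"] = pseudonymise(record[id_field], secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-research-environment"
    raw = {"nhs_number": "000 000 0000", "surname": "EXAMPLE", "diagnosis_code": "C50"}
    print(pseudonymise_record(raw, key))
```

Because the same key always yields the same pseudonym, records for one participant remain linkable across datasets, while the raw identifiable files and the key stay in the separate secure location.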

The data will not be transferred to any other location.

The data will be stored on the NGRL and the AWS Cloud at Genomics England.

Genomics England stores NGRL data on the Cloud provided by Amazon Web Services (AWS).

The Data will be accessed by authorised personnel via remote access.

The Controller(s) must confirm and provide evidence upon audit by NHS England that access via any remote device complies with the data security obligations within this DSA and the Data Sharing Framework Contract.

For remote access:
- Remote access will only be from secure locations situated within the territory of use (as further restricted elsewhere within the DSA if so done) stated within this DSA;
- Access controls granting users the minimum level of access required are in place;
- Remote access is only via secure connections (e.g., VPNs or secure protocols) to protect data;
- Multifactor authentication (MFA) is required for remote access;
- Device security, including up-to-date software and operating systems, antivirus software, and enabled firewalls are utilised for the remote access;
- All remote access is undertaken within the scope of the organisation's DSPT (or other security arrangements as per this DSA) and complies with the organisation's remote access policy.

The above applies in addition to any condition set out elsewhere within the DSA (e.g. who may carry out processing, and for what purpose).

The data will be processed worldwide.

Data is physically stored in England.

Remote access is permitted from the following specified countries: UK, EEA Countries, United States, Canada, Australia, Qatar, Republic of Korea, Japan, Switzerland, Brazil, India, New Zealand, Argentina

Should any country on the permitted list above become a high risk country through the duration of this DSA, the Recipient will cease disseminating data to researchers/organisations based in that country and request that data already disseminated be destroyed.

Should the Recipient wish to share data with any countries not listed above, it will require an update to this DSA.

Should Genomics England wish to facilitate remote access from a country that is not listed above, prior written agreement from NHS England must be obtained.

Genomics England upholds the following safeguards and controls:
1. Compliance with National Cyber Security Centre (NCSC) guidance, leading to the implementation of geo-blocking measures for IP addresses originating from Iran, Russia, North Korea and Belarus
2. Collaboration with the NCSC and other security partners to identify and block potentially risky IP addresses, irrespective of their country of origin.
3. Implementation of two email authentication methods, namely Domain-based Message Authentication Reporting and Conformance (DMARC) and Sender Policy Framework (SPF), to detect and respond to spoofing and spam. This is crucial, as these activities often target our firewalls from international IP addresses.
4. Introduction of additional assurance activities related to international access within the Office 365 estate.
5. Conducting due diligence on companies associated with BGI Genomics.
6. Responsibilities of the Access Review Committee (ARC) include the thorough review of applications and applicants.
7. Continuous improvement of Information Governance training and cybersecurity awareness at Genomics England.
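The combination of the permitted-countries list and the geo-blocking in safeguard 1 can be sketched as a simple decision function. This is illustrative only: mapping the named countries to ISO 3166-1 alpha-2 codes, expanding "EEA Countries" to the EU member states plus Iceland, Liechtenstein and Norway, and the function name are all assumptions, and resolving an IP address to a country is out of scope.

```python
from typing import Optional

# Assumed expansion of "EEA Countries": EU member states plus IS, LI, NO.
EEA = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "IS", "LI", "NO",
})

# UK, United States, Canada, Australia, Qatar, Republic of Korea, Japan,
# Switzerland, Brazil, India, New Zealand, Argentina (as listed in this DSA).
PERMITTED_COUNTRIES = EEA | frozenset({
    "GB", "US", "CA", "AU", "QA", "KR", "JP", "CH", "BR", "IN", "NZ", "AR",
})

# Iran, Russia, North Korea, Belarus (safeguard 1).
GEO_BLOCKED = frozenset({"IR", "RU", "KP", "BY"})


def remote_access_decision(country_code: Optional[str]) -> str:
    """Decide whether remote access from a country of origin is allowed."""
    if country_code is None:
        return "deny: unknown origin"
    code = country_code.strip().upper()
    if code in GEO_BLOCKED:
        return "deny: geo-blocked"
    if code in PERMITTED_COUNTRIES:
        return "allow"
    return "deny: not on permitted list"


if __name__ == "__main__":
    for origin in ("GB", "RU", "CN", None):
        print(origin, "->", remote_access_decision(origin))
```

Checking the block list first means a blocked country stays blocked even if it were added to the permitted list by mistake.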

Access to confidential patient identifiable data is restricted to an extremely limited number of employees of Genomics England, accessible on AWS.

Substantive employees of Genomics England and researchers who are members of the GeCIP and Discovery Forum will process the data for the purposes described above.

Genomics England (MR1418) - Amendment and Updated Request for tranche of data across multiple data sets. — DARS-NIC-12784-R8W7V

Opt outs honoured: No - consent provided by participants of research study, No - data flow is not identifiable, No (Excuses: Reasonable Expectation, Consent (Reasonable Expectation))

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Health and Social Care Act 2012 – s261(2)(c), Health and Social Care Act 2012 s261(2)(c), Health and Social Care Act 2012 s261(2)(c); Informed Patient consent to permit the receipt, processing and release of data by NHS Digital

Purposes: Yes (Research)

Sensitive: Sensitive, and Non Sensitive, and Non-Sensitive

When:DSA runs 2019-02 – 2020-03 2017.06 — 2025.11.

Access method: Ongoing, One-Off

Data-controller type: GENOMICS ENGLAND

Sublicensing allowed: Yes

Datasets:

Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Outpatients
Hospital Episode Statistics Critical Care
MRIS - Flagging Current Status Report
MRIS - Cause of Death Report
Mental Health and Learning Disabilities Data Set
Mental Health Minimum Data Set
Bridge file: Hospital Episode Statistics to Diagnostic Imaging Dataset
Diagnostic Imaging Dataset
Bridge file: Hospital Episode Statistics to Mental Health Minimum Data Set
Patient Reported Outcome Measures (Linkable to HES)
MRIS - Cohort Event Notification Report
MRIS - Members and Postings Report
MRIS - List Cleaning Report
Mental Health Services Data Set
Emergency Care Data Set (ECDS)
Demographics
Civil Registration - Deaths
Cancer Registration Data
HES-ID to MPS-ID HES Accident and Emergency
HES-ID to MPS-ID HES Admitted Patient Care
HES-ID to MPS-ID HES Outpatients
Civil Registrations of Death
Diagnostic Imaging Data Set (DID)
Hospital Episode Statistics Accident and Emergency (HES A and E)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Critical Care (HES Critical Care)
Hospital Episode Statistics Outpatients (HES OP)
Mental Health and Learning Disabilities Data Set (MHLDDS)
Mental Health Minimum Data Set (MHMDS)
Mental Health Services Data Set (MHSDS)

Type of data: Anonymised - ICO Code Compliant, Identifiable

Objectives:

List clean only -

Genomics England plan to issue a newsletter to participants in the 100,000 Genomes Project. This Project is recruiting families affected by rare genetic disease or cancer, who are under the care of the NHS in England.

13 NHS Genomic Medicine Centres, incorporating 80+ NHS Trusts, are recruiting participants. The aim of the Project is to provide a diagnosis for rare disease participants where there wasn’t one before. In some cases a specific treatment may be recommended. Participants also agree to share their genome and medical data for medical and scientific research.

Genomics England will issue a regular newsletter on the progress of the Project, including when participants might expect results. Genomics England are extremely keen to ensure that the newsletters are issued to participants’ latest known address and, as some participants may have been recruited up to 3 years ago, it is entirely possible that the information held by Genomics England is outdated. It is also possible that recruited participants could have passed away since recruitment and Genomics England would like to ensure that newsletters are not issued to deceased participants, where possible.

************************************************************************************************
The aim is to create a new genomic medicine service for the NHS – transforming the way people are cared for. Patients may be offered a diagnosis where there wasn’t one before. In time, there is the potential of new and more effective treatments.
The project will also enable new medical research. Combining genomic sequence data with medical records is a ground-breaking resource. Researchers will study how best to use genomics in healthcare and how best to interpret the data to help patients. The causes, diagnosis and treatment of disease will also be investigated. We also aim to kick-start a UK genomics industry. This is currently the largest national sequencing project of its kind in the world.
Genomics England is seeking to obtain information from participants’ medical records that spans their entire lifetime. The DNA sequence, information from patients’ health records and any other information given to the Project will be collected and stored securely by the Project as a resource for use by approved researchers for future scientific and medical purposes during the life and after the death of participants.
Diagnoses arising from the sequencing and analysis of the participants’ DNA are already being fed back to participants, many of whom are receiving a diagnosis for the first time.
Genomics England’s legacy will be a genomics service ready for adoption by the NHS, high ethical standards and public support for genomics, new medicines, treatments and diagnostics and a country which hosts the world’s leading genomic companies. It is a bold ambition with benefits for all.

Yielded Benefits:

Over 41,000 genomes sequenced as of December 2017. Participant stories can be found at: https://www.genomicsengland.co.uk/alexs-story/

Genomics England has built upon its commitment to lead on the Government’s technology and innovation agenda by forging partnerships with industry. Examples of this include a new industry collaboration with leading life sciences companies Inivata and Thermo Fisher Scientific to improve understanding of cancer.

Public Health England has announced that Whole Genome Sequencing (WGS) is now being used to identify different strains of tuberculosis (TB). This is the first time that WGS has been used as a diagnostic solution for managing a disease on this scale anywhere in the world. The technique, developed in conjunction with the University of Oxford, allows faster and more accurate diagnoses, meaning patients can be treated with precisely the right medication more quickly.

Genomics England has now engaged devolved nations and is recruiting participants from Scotland and Wales.

Update May 2018:

Over 60,000 genomes have now been sequenced and over 12,000 clinical reports have been issued to NHS Genomic Medicine Centres. Thirty disease and cross-cutting research domains have had their plans approved and now have access to 100,000 Genomes Project data. The number of users with access to the Genomics England Research Environment is now over 1,300.

Twelve publications have arisen from or refer to the 100,000 Genomes Project during the last year, including:
• The 100,000 Genomes Project: bringing whole genome sequencing to the NHS. Clare Turnbull et al. BMJ 2018; doi: https://doi.org/10.1136/bmj.k1687 (24 April 2018)
• Identification of rare sequence variation underlying heritable pulmonary arterial hypertension. Nicholas W. Morrell et al. Nature Communications 2018;9; doi:10.1038/s41467-018-03672-4 (12 April 2018)
• Introducing genomics into cancer care. Sue Hill. Br J Surg 2018;105(2):e14–e15 (17 January 2018)
• Missense variants in the X-linked gene PRPS1 cause retinal degeneration in females. Alessia Fiorentino, Kaoru Fujinami, Gavin Arno et al. Hum Mutat 2017; doi:10.1002/humu.23349 (17 October 2017)

See https://www.genomicsengland.co.uk/category/updates/ and https://www.genomicsengland.co.uk/about-gecip/publications/ for details of news and publications.

Genomics England created the Discovery Forum in July 2017 to build on the work of the GENE Consortium. The Discovery Forum provides a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.

Expected Benefits:

The list cleaning element -

Genomics England are committed to ensuring the study participants are kept fully up to date. Regular newsletters will provide detail on the progress of the study along with any other relevant updates.

Benefits of this request being approved include:-
• Newsletters are delivered to participants’ latest known addresses
• Where possible, correspondence is not issued to deceased participants, which would cause distress to relatives
• Genomics England can ensure the information held is current, whilst taking account of relevant updates from the Genomic Medicine Centres

************************************************************************************************
The overall benefits realisation for the project is established by the Department of Health (DoH). Each individual research study will have its own specific aims and benefits that underpin the DoH benefits. The 10 key benefits have been drafted as:

1. It is anticipated that many of the circa 20,000 patients with rare diseases who provide their genomes for sequencing as part of the Project will receive a formal diagnosis for the first time.
2. The speed of processing the data from Whole Genome Sequences should be greatly increased, with an associated acceleration of diagnosis: something that previously has taken years to identify should, under the Project, be possible in a few months.
3. It is hoped that Genomic diagnosis as a result of the Project will enable clinicians to make cancer treatment more personalised by determining how effective treatments like Herceptin or radiotherapy are likely to be. This will improve the effectiveness of treatments and may provide financial savings.
4. Although not all patients involved in the Project will benefit from a significant improvement in their own condition, for most the benefit will be in knowing that they will be helping people like them in the future.
5. The Project has already identified issues with the current approach for collecting DNA from cancer tumours. A current study within the Project is looking at identifying optimum methods for collecting DNA from cancer tumours. This is something that has previously been incredibly difficult to do at scale and which is essential for high quality Whole Genome Sequencing.
6. As a result of the high standards of ethical practice and transparency underpinning the Project, the case will be made for collecting genomic data, linking it to the phenotypic data and sharing it in a controlled way with academics, researchers and industry.
7. The creation of NHS Genomic Medicine Centres will allow engagement and feedback to patients with rare diseases and cancer from the Project and will provide the infrastructure to bring about transformational change in the NHS so that it continues to deliver world-leading healthcare in the future.
8. As a result of the Project, the NHS and Public Health workforce will benefit from additional education in genomic medicine, including 550 places for an MSc in Genomics Medicine over the next 3 years, increased capacity in the scientific workforce, and a legacy of education and training in genomics for the future workforce.
9. The secure dataset of genomic and clinical data which is created as a result of the Project will enable clinicians, researchers and industry to discover new variants with a view to creating new diagnostics and treatments.
10. The Project will kick-start the development of the UK industry in Whole Genome Sequencing. The global genomics market was valued at an estimated £7.6 billion in 2013 and is expected to reach over £13 billion by 2018.

Outputs:

The list cleaning element -

Genomics England plan to issue a regular newsletter to participants. The first newsletter will be sent out to participants once the list clean is complete and the database updated.
Subsequent newsletters will be sent up to 4 times a year.
***************************************************************************************************
All outputs from research environments will be anonymised. The outputs will relate to the purposes described above for each of the research areas. Proof of concept outputs will be produced during the summer of 2015, with a move to researcher-created outputs from the autumn of 2015 onwards. The specific outputs are defined by the research groups and then verified as anonymous when an extract is requested.

Processing:

List clean only -

Once returned data has been validated within Genomics England, a check will be made with each GMC Clinical Director to ensure that any “local” knowledge, such as participants who have requested no further contact, is taken into account and those participants are removed from the database. Additionally, there may be participants who have passed away but whose family still wish to receive information; these cases will only be issued correspondence if notified by the GMC.

On receipt of information from GMC Clinical Directors, Genomics England will make any amendments required to the cohort database prior to the newsletter being issued to participants.

Genomics England are planning to re-consent participants once updated consent materials have been drafted and approved.

**************************************************************************************************
Amendment - Genomics England has engaged the Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), at the University of Oxford to act as a Data Processor. The Data Processor will provide data handling services related to the acquisition and cleaning of registry based data provided by the HSCIC for the consented Genomics England participants. The University of Oxford will access identifiable record level data in the performance of this function.

The scope of data processing activities is limited. All data processing activities will be performed in the Genomics England Data Centre and no data will leave these servers. Genomics England will remain the Data Controller and remain responsible for all aspects of system security and access control.
Oxford will not access data remotely or take any data away from the Genomics England data centre.
There are three principal stages of processing:
1. Data acquisition, cleansing, quality verification, linkage and de-identification
2. Identification of participant cohorts that meet research scope parameters
3. Data analysis for research using de-identified data

The first stage focuses on the acquisition of data and quality verification to ensure it is complete, accurate and complies with the NHS data dictionary and other data standards that apply. The data is provided over a period of time (related to the treatment of participants) and associated with their longitudinal data from other NHS sources.
The intention over the course of the Project is to link this data with other data, such as primary, secondary, social and participant provided data. For this application the request is limited to HES Data.
The richness of the high quality data sets is crucial to the success of the 100,000 Genomes Project in delivering value to the NHS. The evaluation of whole genome sequencing (WGS) data in the context of rich and extended phenotypes derived from electronic health records, such as blood pressure, cholesterol, glucose, and pharmacogenomics, adds significant value. The richness of the Project dataset will allow us to move beyond the primary phenotype of the rare disease, cancer or infectious disease that led to the patient’s enrolment to evaluate the WGS in the context of other continuous traits, diseases and response to therapy.
As soon as the data completeness and quality have been confirmed, the data is de-identified, as all subsequent processing can be performed without direct identifiers. This de-identification is a key facet of the 100,000 Genomes Project.

The second stage is focused on the confirmation and approval of valid research scope and selecting a de-identified cohort of participants that fulfil the focus of the research request. The researchers will be members of a Genomics England Clinical Interpretation Partnership (GeCIP) or the GENE Consortium.
GeCIP. The overall aim of the Genomics England Clinical Interpretation Partnership (GeCIP) is to create a thriving, sustainable environment for researchers and clinical (NHS) disease experts. The activities of GeCIP will inform NHS feedback to clinicians and the multidisciplinary teams by providing enhanced data interpretation, additional information on pathogenicity of variants, and functional characterisation.
GENE Consortia. Genomics England are running an Industry trial during the calendar year 2015.

12 pharma, biotech and diagnostics companies have committed to invest monetary and FTE resources to understand how best to realise the value from working with Genomics England, our Bioinformatics Platform Partners and the wider NHS.
Across the 100,000 Genomes Project, Genomics England will be at the forefront of life science programmes in the UK and worldwide. For example, gene discovery in the 100,000 Genomes Project will create significant opportunities for scientific innovation and place particular emphasis upon national and international collaborations. Where possible, we will work with key international programmes including Development Disorders (DDD) and Orphanet, and complement the work of the International Rare Diseases Consortium (IRDC).

All research requests will be assessed to ensure they are included in the approved use purposes set out in the Genomics England Protocol and that they comply with the boundaries of the research group (Genomics England Clinical Interpretation Partnership or GENE consortia). Each research request will be for a sub-set of the de-identified data, with the specific data requirements specified in the request. The researchers also declare any data they wish to bring into the environment and any tools they wish to use for analysis.

The third stage is the research analysis of the de-identified approved data sets in the virtual data centre environments. Researchers perform all the analysis and processing within the environments hosted by Genomics England; they do not extract de-identified data. Researchers will use pre-declared data and tools to perform their analysis. If researchers want to extract any anonymised results data, they must first put any such results in a secure folder for anonymisation verification before they can be extracted.

A simplified view of the Genomics England Data Flow is shown below. Note the de-identified export boundaries into the Genomics England Core Research Repository

Genomics England provide the HSCIC with a cohort for linkage and they receive HES data from the HSCIC on a monthly basis. Every quarter Genomics England provide an updated cohort to the HSCIC and the HSCIC provide the historical data for the extra cohort members. The cohort is already flagged with the HSCIC so Genomics England will only receive the historical data for the extra cohort members each quarter.
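The quarterly update described above amounts to finding which members of the updated cohort are not already flagged, since only those need historical data. A minimal sketch follows; the function name and the use of study IDs as the comparison key are assumptions for illustration, and the entry does not describe the actual matching process.

```python
from typing import Iterable, List


def extra_cohort_members(
    flagged_ids: Iterable[str],
    updated_cohort_ids: Iterable[str],
) -> List[str]:
    """Return IDs in the updated cohort that are not already flagged.

    Order of first appearance is kept and duplicates are dropped, so the
    result can be used directly as the list needing historical data.
    """
    already_flagged = set(flagged_ids)
    seen = set()
    extra = []
    for study_id in updated_cohort_ids:
        if study_id not in already_flagged and study_id not in seen:
            seen.add(study_id)
            extra.append(study_id)
    return extra


if __name__ == "__main__":
    flagged = ["S1", "S2"]
    quarterly_update = ["S1", "S2", "S3", "S4", "S3"]
    print(extra_cohort_members(flagged, quarterly_update))
```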

R26 - GENOMICS ENGLAND: GenOMICC COVID-19 Study — DARS-NIC-374190-D0N1M

Opt outs honoured: Yes - patient objections upheld, No - Statutory exemption to flow confidential data without consent, No (Excuses: Statutory exemption to flow confidential data without consent, Consent (Reasonable Expectation))

Legal basis: COPI Regs 2020, CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002; Health and Social Care Act 2012 - s261(5)(c), CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002; Health and Social Care Act 2012 - s261(5)(d), Consent (Reasonable Expectation); Health and Social Care Act 2012 s261(2)(c),

Purposes: Yes (Research)

Sensitive: Sensitive, and Non Sensitive, and Non-Sensitive

When:DSA runs 2020-07 – 2020-09 2020.07 — 2024.08.

Access method: Ongoing, One-Off

Data-controller type: GENOMICS ENGLAND

Sublicensing allowed: Yes

Datasets:

Secondary Uses Service Payment By Results Spells
Bridge file: Hospital Episode Statistics to Mental Health Minimum Data Set
Hospital Episode Statistics Critical Care
Emergency Care Data Set (ECDS)
Civil Registration - Deaths
Hospital Episode Statistics Admitted Patient Care
Demographics
Hospital Episode Statistics Accident and Emergency
Diagnostic Imaging Dataset
Bridge file: Hospital Episode Statistics to Diagnostic Imaging Dataset
Mental Health Services Data Set
Hospital Episode Statistics Outpatients
COVID-19 Second Generation Surveillance System (Beta version)
COVID-19 Hospitalization in England Surveillance System
Community Services Data Set
Cancer Registration Data
COVID-19 Second Generation Surveillance System
GPES Data for Pandemic Planning and Research (COVID-19)
HES-ID to MPS-ID HES Accident and Emergency
HES-ID to MPS-ID HES Admitted Patient Care
HES-ID to MPS-ID HES Outpatients
COVID-19 Vaccination Status
Civil Registrations of Death
Community Services Data Set (CSDS)
COVID-19 Second Generation Surveillance System (SGSS)
Diagnostic Imaging Data Set (DID)
Hospital Episode Statistics Accident and Emergency (HES A and E)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Critical Care (HES Critical Care)
Hospital Episode Statistics Outpatients (HES OP)
Mental Health Services Data Set (MHSDS)
COVID-19 General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)
COVID-19 SGSS First Positives (Second Generation Surveillance System)

Type of data: Identifiable, Anonymised - ICO Code Compliant

Objectives:

This agreement is seeking approval to request data to support the GenOMICC - COVID Genomics UK (CoG-UK) partnership in researching whole genome sequencing of patients severely affected by COVID-19. The work programme has sign off and prioritisation from the Chief Medical Officer for England.

The goals of this work programme are set out below:

1. To harness world-leading UK healthcare and genomic infrastructure and systems to undertake prospective host whole genome sequencing at scale. This will elucidate the genetic architecture of host response to SARS-CoV-2 and identify opportunities to improve outcomes in the current pandemic, via international collaboration.

2. To identify rare and common variants that may affect susceptibility to response, identify novel opportunities for intervention and accelerate recovery.

3. To collect longitudinal life course datasets from primary care, hospital episodes, intensive care registries and outcomes via an extant partnership with NHS Digital and Health Data Research UK. We will include deep immune “omic” datasets on a subset of patients. This will allow case-control studies that capitalise upon unique UK assets, such as the 100,000 Genomes Project (97,000 people) and UK Biobank data sets (500,000 people) with WGS where more cases may be identified, including those with milder disease and unaffected people.

4. To use these rich data sets to understand the premorbid, concurrent and consequent sequelae of COVID-19 infection.

5. In partnership with the CoG-UK Viral Programme to evaluate the combination of viral and host genomics on outcomes to give pre-emptive insights into subsequent outbreaks and potentially future pandemics.

6. To provide access to these data sets via the Genomics England Trusted Research Environment to international and national academia and industry and facilitate international collaboration on COVID-19.

7. To link this to national COVID-19 clinical trials infrastructure offering potential for genomics to add value with insights into precision medicine and building a global-leading knowledgebase to enable better UK-wide and international capacity for future pandemic preparedness.

8. To engage and involve public and patients in setting strategy and priorities that shape the programme and its outputs. This will initially be based upon the 100,000 Genomes Project Participant Panel.

The prospective GenOMICC CoG-UK study

The variable response to COVID-19 suggests that, as with susceptibility to other infections, critical illness and mortality from COVID-19 may be determined by host genetic factors. From the 100,000 Genomes Project and the NIHR BioResource for Rare Disease it is known that rare variants cause immunodeficiency. By undertaking a prospective study design that leverages existing recruitment infrastructure in critical care, together with Genomics England, NHS England, Devolved Nations and PHE infrastructure this research will be able to apply the most advanced genomic testing to those most severely affected people admitted to hospital or intensive care.

The retrospective GenOMICC CoG-UK study:

The retrospective cohorts offer control arms for the study but also new case finding potential, particularly for people who have a milder clinical case. The study proposes to harness the potential of two key national data assets. Firstly, analysis of the 100,000 Genomes Project data set, which provides the genome sequences of 97,000 participants, where the study can use their longitudinal life course datasets to identify those affected by COVID-19, as well as providing appropriate unaffected controls. This dataset includes 627 families with rare immunodeficiency syndromes, which may allow insights to be accelerated because of co-existence of rare variants. Secondly, the UK Biobank cohort will provide 120,000 WGS this year, building to 500,000 WGS over the next 18 months from people currently aged circa 55-85 years old, which is skewed towards the at-risk age groups for COVID-19 but will provide additional cases and controls, including mildly affected individuals.

Over the next five years these datasets will be enriched further by consented individuals from the NHS Genomic Medicine Service, and the new Genomics England programmes proposed in their strategic plan under consideration by Government. Furthermore, the Accelerating Detection of Disease Cohort will enrol up to 5 million people with genome-wide variants available with longitudinal life course data sets that could particularly add value to host response-variant associations and polygenic risk scores.

Genomics England Background:
Genomics England was established by the Department of Health to deliver the 100,000 Genomes Project. This followed the announcement in December 2012 by the Prime Minister of a programme of whole genome sequencing (WGS) as part of the UK Government’s Life Sciences Strategy. The principal objective of the 100,000 Genomes Project was to sequence 100,000 genomes from participants with cancer and rare disorders, and to link the sequence data to a standardised, extensible account of diagnosis, treatment, and outcomes gathered at recruitment, but primarily through the ongoing collection of medical records.

Data will be released into the Genomics England Trusted Research Environment where it will be linked with associated clinical data as well as additional data sources, which will include NHS Digital data requested through this agreement, COVID-19 testing feeds and viral genomics from Public Health England and data feeds from the Intensive Care National Audit Registry (ICNARC).

The Department of Health and Social Care has granted approval for Genomics England to procure a new, rapidly deployable Research Environment for the COVID-19 programme from existing core funding, which will provide a secure and collaborative workspace that enables researchers to perform COVID-19 genomic data analysis. The new Research Environment (COVID-RE) will provide an intuitive, integrated and collaborative user experience that enables effective COVID-19 research outcomes across a wide range of academic and Biotech/Pharma researchers with varying levels of technical competency. This offers a major upgrade to the current Genomics England environment and will comprise user-centric, contemporary bioinformatic workflows, support open-source tooling, and enable shared workspaces between Genomics England and Partners. COVID-RE must serve the immediate COVID-19 research effort and may also advance the transformation of Genomics England’s platform infrastructure, which is a key enabler to the research community.

There are two key value streams provided by the COVID-RE. Firstly, the ‘raw data to analytics ready data’ stream, which must permit the collection of data from multiple locations in unstructured, semi-structured and fully structured forms, and transform them into the appropriate data model and data store based on their ongoing use and availability. Secondly, the ‘discovery to insight’ stream, which supports researchers by providing the capability and the framework for their User Journey, from understanding the data available to them through to providing the necessary analytics and publishing tools.

The newly established COVID-RE will provide the following:
• Seamless interface with cloud storage and compute capabilities.
• A unified data platform containing datastores appropriate for all the required clinical and genomic data types.
• Data integration capability to deliver analytics-ready datasets into domain-specific data stores across batch and streaming integration patterns.
• Standards-based access services providing secure, fast, flexible, robust and auditable access to TRE data assets.
• Applications to support the research user journey from an exploration of data sets, through cohort building, analysis and publication - providing tools appropriate to a variety of user requirements (ways of working) and programming competency.
• Applications and workflows can be delivered natively within the platform; however, COVID-RE must enable access to container-based applications and, via a set of hardened APIs, to a relevant external service (as long as security is maintained).

Details of the sublicence model via the Genomics England Trusted Research Environment are supplied within the processing activities section of this agreement.

The additional clinical data is key for researchers to be able to understand and infer clinically relevant and actionable findings. The aim is to provide a high quality, diverse clinical dataset, detailing each participant’s journey and to understand pre-existing conditions and also the early behaviours in the disease course. Genomics England currently have an agreement to receive NHS Digital data for the extant 100,000 Genomes Project participants (DARS-NIC-12784) and have seen first-hand the depth and quality of the data and how it has aided researchers.

To that aim, the significant gap in the current data collection for the COVID-19 project can be addressed by the non-standard NHS Digital data feed.

The SUS APC feed would provide researchers with data as close to real time as possible, which in the current climate is vital to avoid delays.

The associated COVID-19 data sets (in particular NHS111 and CV19 Testing Data, which will be requested under a future version of this agreement) will add to the detail of the patient journey, flagging for instance how participants were monitoring their symptoms and whether there was an associated poor prognosis.

The group of survivors eligible for recruitment for this study are generally healthy individuals who have suffered critical illness. It is anticipated this cohort will grow to approx 36,000 participants.

The data would be available for analysis alongside the extant Genomics England data set of 100,000 Genomes Project participants and would be made available to approved researchers worldwide as per existing governance procedures. This 100,000 Genomes Project data set will act as an appropriate matched control group. Data on the 100,000 Genomes Cohort (as they will act as a control cohort), as well as new additions for the GenOMICC Study, will be provided to NHS Digital for linkage.

The detail below pertains to the background of the study, and the account of susceptibility to infection sets out why Genomics England is investigating. The GenOMICC study was originally focused on smaller cohorts of participants. However, in collaboration with the COG-UK group, this is now expanded to whole genome sequencing of approx. 36,000 affected individuals as set out above.

GenOMICC Study Background:
Susceptibility to infection is profoundly heritable (Sorensen et al. 1988). Patients who develop life-threatening illness following infection with usually innocuous pathogens, such as influenza (Miller et al. 2010), are genetically different from the rest of the population (Albright et al. 2008). Understanding the genetic mechanisms of susceptibility may yield new therapeutic targets (Baillie 2014) that can be used to make susceptible patients more similar to individuals who are resistant to, or tolerant of, specific pathogens.

The genetic mechanisms of susceptibility to infection are likely to be highly pathogen-specific and may even have opposing roles in different infections (as for CCR5 variants in HIV (Huang et al. 1996) and WNV (Glass et al. 2006) infection). Pathogen-specific interventions (e.g. small molecules to inhibit an enzyme or receptor that is dysfunctional in resistant individuals) would therefore be protective to the host in a similar way to antibiotics, with the advantage that it is conceptually more difficult for any one pathogen to evolve resistance to such a therapy.

A second, more challenging problem arises in patients who become critically ill following infection. The patterns of immune-mediated organ dysfunction, immunoparesis, and death are very similar in severe infections and sterile systemic injuries (such as burns, haemorrhage, pancreatitis and trauma). Ultimately, death is a consequence of the host response to injury (Angus and Poll 2013), through final common pathways of organ failure that are clinically and biochemically evident, and unrelated to the original precipitant.

Broadly, the severity of critical illness follows directly from the severity and duration of the initial insult. In bacterial sepsis, early antibiotics are the mainstay of therapy; in influenza, early antivirals; in haemorrhage, early resuscitation; in trauma, urgent action to prevent secondary injury. There are no therapies with which to modulate the host response to systemic injury.

There is a lack of direct evidence of heritability for outcomes of critical illness, due in part to difficulties in defining and quantifying the heterogeneous multi-organ dysfunction syndrome (MODS), and in part due to the rapid pace of change in critical care medicine, making it impossible to tackle this question in long term outcome studies. However, clinical and biological evidence support the hypothesis that the pathogenesis of MODS is immune in origin (Angus and Poll 2013). Hence, predictions can be made from the extensive knowledge of other immune conditions. Whether or not MODS is considered to be an autoimmune or infectious condition is moot: these conditions share a great deal of similarity in genetic predispositions, cell types and mechanisms of pathogenesis. It is therefore very likely that propensity to survive MODS has a heritable component, and there is some direct evidence in support of this hypothesis (Rautanen et al. 2015). If this is the case, then the identity of the specific variants that contribute to outcome could potentially be utilised to design therapies to promote survival after the onset of MODS.

This study aims to identify genetic predisposition to specific syndromes of critical illness. Specifically, susceptibility to life-threatening infections caused by an identified pathogen, and susceptibility to death following the onset of organ failure due to sepsis or sterile injury. In order to maximise the probability of identifying host genetic loci associated with susceptibility, Genomics England will restrict some analyses to younger individuals in good general health and lacking in known predisposing factors.

The same principle was used to determine an upper age limit for inclusion for some analyses. With advancing age, there is an increase in undiagnosed co-morbidity, frailty, and susceptibility to serious complications of infection or critical injury. There is therefore an increase in the probability of susceptibility to, and mortality from, critical illness that is consequent upon non-genetic factors.

Withdrawal:
Participants are freely able to decline participation in this study or to withdraw from participation at any point without suffering any implied or explicit disadvantage. All patients will be treated according to standard practice regardless of whether they participate.
The following options of withdrawal will be made available to participants:
1. Partial withdrawal. Data WILL continue to be updated and used for research, but no further contact will be made with the participant
2. Full withdrawal.
• no further contact will be made with the participant;
• data will not be updated from health records;
• data will not be removed from research that is underway or has already been done, and an audit record will be maintained to confirm participation.

Consent version and lifecourse follow-up
The first COVID positive patient was recruited to the GenOMICC study in March 2020. All patients (c.1500) currently recruited to the GenOMICC study are on consent and Protocol version 1.08 (which allows for COVID research but not longitudinal life course follow-up). Protocol v2.1 (submitted as part of this DARS request) was REC approved on 23rd March 2020; the IRAS IDs are 269326 & 189676 (https://www.hra.nhs.uk/covid-19-research/approved-covid-19-research/269326/). Genomics England will attempt to reconsent all patients recruited under version 1.08 onto the newly amended materials (version 2.1, submitted as part of this DARS application), which allow for longitudinal life course follow-up. For any patients on the v1.08 Protocol from whom Genomics England is not able to obtain consent, Genomics England will not be requesting their data for linkage. For patients prospectively recruited onto the v2.1 Protocol and consent materials, Genomics England will be requesting their data for linkage and follow-up.

Datasets requested from NHS Digital:
Consideration has been given to assessing and ensuring that access to the data sets requested is in line with a COVID-19 related purpose, in terms of the restrictions set out in Reg 3(1) COPI. Below are details of what each of the data sets will, at a high level, provide insight into.

Hospital Episode Statistics: Outpatients, Admitted Patient Care, Critical Care and Accident & Emergency/ECDS. These datasets provide the core clinical data for participants and are vital to the provision of a detailed medical history for participants.

Diagnostic Imaging Dataset. This provides invaluable, detailed information to build on participants' phenotypes, e.g. tumour size and spread in cancer, adding to the understanding of patients' histories on individual and cohort level and their relationship with genomic alterations.

Secondary Uses Service datasets. The minimal latency in availability of these datasets is highly desirable for the research objectives set out in this project.

Mental Health Data sets: The 100,000 Genomes Project includes recruitment of psychiatric diseases and others with mental health phenotypes: intellectual disability and seizures are some of the most prevalent conditions within the Project. To-date nearly 10% of project participants have a mental health record. Mental health data are therefore vital in ensuring that a complete and relevant medical history is available for all participants.

Cancer Registration Data sets: To see the incidence of cancer within the cohort.

Mortality data are essential for performing survival analyses and as a metric for success of medical care: this is crucial information for research in combination with other medical history. Cause of death information is vital in order to determine if mortality is related to the primary disease of a participant or to highlight unforeseen trends. Knowledge of participant death is also vital for the correct analysis of medical timeline data and for the management of participant cohorts.

COVID datasets: These datasets, including SGSS (Second Generation Surveillance System Data Set) and CHESS (COVID-19 Hospitalization in England Surveillance System), will be crucial to identify early prognostic features in those affected with Coronavirus.

Assessment of the datasets has been undertaken and NHS Digital are satisfied that they are necessary for the COVID-19 work being undertaken. Genomics England has confirmed that all approved research from the GenOMICC study using the data for the COVID-19 specific purposes will be published here: https://www.genomicsengland.co.uk/about-gecip/research-2/

Genomics England Industry access:
Genomics England works with industry through its Discovery Forum. The Forum provides a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.

Industry partners comprise pharmaceutical, biotech and diagnostic companies, and those specialising in laboratory and data analysis. These companies have joined the Forum to work in a pre-competitive environment with access to a selection of genomic and associated clinical data. Ultimately, the Discovery Forum aims to help turn research findings into treatments, diagnostics and benefits for patients as soon as possible.

As the Discovery Forum is a collaborative venture, no fees are levied on participating organisations to access the COVID data; however, they are charged for storage and compute, such as running their own bioinformatics pipeline. All members of the Forum are obliged to publish all findings and research at the point at which intellectual property for any product is protected. Participants in the 100,000 Genomes Project have been asked explicitly to give consent for commercial companies to access their de-identified genome and health data.

The Forum was created in July 2017 and allows industrial partners to report back to Genomics England on what aspects of the data are proving to be most useful to their research studies, what data is missing and how the data should be collected and developed further so that it captures what industry needs, in a format that is compatible with their research and data systems. These partners act as a 'critical friend' and have made many helpful suggestions to increase the likelihood of successful research in the future for all those using Genomics England's landmark data set.

The lawful basis for processing Participant Data under the General Data Protection Regulation (GDPR) used by Genomics England is legitimate interests as set out under Article 6(1)(f) of the GDPR. It is necessary for Genomics England to process Participant Data for its legitimate interests in carrying out medical research and in providing reports used by clinicians in their care of Participants.
The processing is necessary to support and enable Genomics England's legitimate interests in enabling new medical research on using genomics in health care, and on the causes, diagnosis and treatment of COVID-19.

Patients and the public will be at the heart of this programme. Initially the researchers will involve the extant 35-strong Genomics England Participant Panel and will then add others who have been affected by COVID-19 at a later point. These participants and members of the public will be represented on all committees and working groups and will also meet separately.

The beneficiaries are:
o Participants - the work Genomics England does will ultimately influence their care;
o researchers and industry - by giving them access to a unique ground-breaking resource of genomic data combined with life-course clinical data;
o and the wider public - by accelerating the uptake of genomic medicine making it available to patients in the UK.

The lawful basis for the release and use of the confidential data being shared under this version of the agreement is Regulation 3(4) of the Health Service (Control of Patient Information) Regulations 2002 (COPI) to require NHS Digital to share confidential patient information with organisations entitled to process this under COPI for COVID-19 purposes. The application of this has been based on the information provided in the Whole genome sequencing of patients severely affected by COVID-19 funding proposal from The GenOMICC - COVID Genomics UK (CoG-UK) partnership which was supported by the CMO of England and the CFO of the DHSC.

Yielded Benefits:

The 100,000 Genomes Project has been hugely successful and provided numerous academic and clinical publications and discoveries. The success of the project has been based on the strength of the clinical data provided by NHS Digital. Understanding this significant value is why Genomics England are so keen to add NHS Digital data to its clinical data source for the GenOMICC study. The GenOMICC study is very much in its infancy and whole genome sequencing has only begun in the last month. It is thus too early to demonstrate any significant outcomes. These outcomes will only gain power and relevance with more prospectively recruited patients and a breadth and depth of clinical data. By providing this to researchers, Genomics England can provide them with the necessary tools to explore the genomic data.

Expected Benefits:

Access to the data will enable the research to discover new rare and common variants alongside new multi-omic biomarkers that underpin host response to infection, allow investigation of the impact of viral genomic features on outcomes and allow creation of a polygenic risk score, which may detect risk of severe response to similar viruses. The prospective component could allow nested clinical trials or case-control resources to add value to this study by detecting variants, which stratify response or predict outcomes. Although the 100,000 Genomes Project and the Genomic Medicine Service may include participants biased to specific disease ascertainment, the scale of these resources and the presence of parents helps compensate for this problem.

Specifically, the anticipated short-term (6 months) and medium-term benefits and outcomes from this programme of research are:

• Variants enable Polygenic Risk Score to predict greatest risk and avoid ITU
• Pre-morbid clinical conditions or biomarkers of risk and rapid NHS uptake to avoid ITU
• Identify novel therapies or precision interventions for rapid national trials
• Longitudinal life course sequelae of COVID-19 for pandemic planning

• Patient benefit:
o Providing improved clinical understanding of disease progression in COVID-19
o Correlation to disease progression and pre-morbid status
o Identification of susceptibility genes
o Develop a biomarker test(s) to predict an individual’s response to SARS-CoV-2 exposure, considering both COVID-19 severity and vulnerability to infection.
o Identify targets that can be used to inform development of new treatments

• New scientific insights and discovery:
o with the consent of patients, creating a database of 35,000 whole genome sequences linked to continually updated long term patient health and personal information for analysis by researchers.
o Correlation of host and viral genomic data
o Potential to provide improved testing for future pandemics
o Aid researchers to identify novel targets for vaccines and therapy
o Identification of highly penetrant rare variants in genes and pathways relating to viral susceptibility or immunodeficiency.
o Genome wide association studies (GWAS) using common variants to identify genes and pathways associated with viral response. These analyses will be aligned with other COVID-19 research consortia.
o Rare variant burden analysis to identify genes and pathways enriched in rare variants associated with viral response

• Accelerating the uptake of genomic medicine in the NHS: working with NHSE and other partners to deliver a scalable WGS and informatics platform to enable these services to be made widely available for NHS patients. WGS could potentially provide the most accurate diagnostic test for COVID-19.

• Stimulating and enhancing UK industry and investment: by providing access to this unique data resource by industry for the purpose of developing new knowledge, methods of analysis, medicines, diagnostics and devices.

• Increasing public knowledge and support for genomic medicine: delivering an ethical and transparent programme which has public trust and confidence and working with a range of partners to increase knowledge of genomics.

Outputs:

Genomics England completed sequencing 100,000 genomes at the end of 2018 (https://www.newscientist.com/article/2187499-uk-dna-project-hits-major-milestone-with-100000-genomes-sequenced/). During 2018, the Genomics England Research Environment was established to allow research access to de-identified genomic and clinical data received from NHS Digital. Thirty disease and cross-cutting GeCIP research domains were requested and approved, with over 3000 GeCIP members now given access to the Research Environment. Genomics England has also created the industry Discovery Forum to provide a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.

Although the 100,000 genomes project has completed recruitment, Genomics England is committed to continue gathering life-long clinical data from the participants and making these available in the Research Environment.

Genomics England will be responsible for the onward workflow, in partnership with Illumina for the delivery of 30X whole genome sequences, subject to passing appropriate sequence QC, into the Genomics England data centre. Alignment and variant calling will be performed alongside the potential application of bespoke immunodeficiency panels as part of the Genomics England bioinformatics pipeline analysis.

Genomic data will be released into the Genomics England Trusted Research Environment where it will be linked with associated clinical data.

The GenOMICC study is backed by £28 million from Genomics England, UK Research and Innovation, the Department of Health and Social Care and the National Institute for Health Research. Illumina will sequence all 35,000 genomes and share some of the cost via an in-kind contribution.

A press release on 13/05/20 included a comment from Health and Social Care Secretary Matt Hancock: “As each day passes, we are learning more about this virus, and understanding how genetic makeup may influence how people react to it is a critical piece of the jigsaw.
“This is a ground-breaking and far-reaching study which will harness the UK’s world-leading genomics science to improve treatments and ultimately save lives across the world.” To date, nearly 3000 patients have been recruited into the project.

As of March 2020, the Genomics England Research Environment contained 107,694 genomes, of which, 33,461 were cancer and 74,233 were rare diseases.

The Research Environment also contained clinical data on 89,157 participants (fewer than the number of genomes because cancer participants have two genomes submitted). The clinical data for 17,246 cancer participants includes clinical data from NHS Digital (HES OP/APC/CC and AE) but also cancer specific data from the Public Health England Cancer Registry (NCRAS). The combination of clinical data for all 100,000 participants totals about 5m records.

As the GenOMICC study prospectively recruits participants, the aim will be to use the existing 100,000 participants and age and match-ranked controls for those entered into the study. As Genomics England prospectively enrols more participants into the study, Genomics England plans further releases of genomic and clinical data, including clinical data received from NHS Digital and viral and host genomic data, into the Research Environment on the following dates, in order to continue support for, and to further develop, this ground-breaking resource:

• 3rd August 2020
• 7th September 2020
• 5th October 2020
• 2nd November 2020

Specific outputs over the period of this agreement are therefore to release updated genomic and clinical data for the 100,000 genomes participants and GenOMICC participants into the Research Environment on the dates shown above.

Processing:

All organisations party to this agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by "Personnel" (as defined within the Data Sharing Framework Contract i.e.: employees, agents and contractors of the Data Recipient who may have access to that data).

Genomics England provide NHS Digital with a cohort for linkage and they receive data from NHS Digital on a monthly basis. Every month, Genomics England provide an updated cohort to NHS Digital who provide the historical data for the extra cohort members. The cohort is already flagged with NHS Digital so Genomics England will only receive the historical data for the extra cohort members each month.
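
The monthly exchange described above amounts to identifying which cohort members are new since the previous submission, since historical data is only supplied for those. The following is a minimal sketch of that comparison, not the process actually used by Genomics England or NHS Digital; the function name and the example identifiers are hypothetical.

```python
def new_cohort_members(previous_cohort, updated_cohort):
    """Return identifiers in the updated cohort that were not in the previous one.

    Order of first appearance in the update is preserved and repeats are dropped.
    Identifiers here are hypothetical placeholders, not real participant IDs.
    """
    previous = set(previous_cohort)
    seen = set()
    added = []
    for participant_id in updated_cohort:
        if participant_id not in previous and participant_id not in seen:
            seen.add(participant_id)
            added.append(participant_id)
    return added


# Example: two members were added in this month's update.
last_month = ["P001", "P002"]
this_month = ["P001", "P002", "P003", "P003", "P004"]
extra_members = new_cohort_members(last_month, this_month)
```

Only the members in `extra_members` would need historical data in that month; existing members are assumed to be flagged already.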

The first stage of processing focuses on quality verification. This ensures that the data set is complete, accurate and complies with the NHS data dictionary or relevant specification. Participant identifiers in the dataset are verified against Genomics England's participant details and any updates required to identifiable data fields, e.g. dates of birth, are highlighted. Finally, the data set is reviewed against recent participant withdrawals so that any withdrawals notified after the data application was made can be removed from the data sets.
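
As an illustration only, the identifier check and withdrawal review in this first stage could be sketched as below. The field names (`participant_id`, `date_of_birth`), the record layout and the flag labels are assumptions made for the example and are not taken from the agreement or from any Genomics England system.

```python
def verify_and_filter(records, participant_details, withdrawn_ids):
    """Drop records for withdrawn participants and flag identifier problems.

    records: list of dicts from the received data set (hypothetical layout).
    participant_details: dict mapping participant_id -> dict of held details.
    withdrawn_ids: identifiers of participants who have recently withdrawn.
    Returns (retained_records, flags), where flags is a list of
    (participant_id, reason) tuples for manual review.
    """
    withdrawn = set(withdrawn_ids)
    retained = []
    flags = []
    for record in records:
        participant_id = record["participant_id"]
        if participant_id in withdrawn:
            # Withdrawal notified after the application: remove from the data set.
            continue
        held = participant_details.get(participant_id)
        if held is None:
            flags.append((participant_id, "unknown participant"))
        elif held["date_of_birth"] != record["date_of_birth"]:
            flags.append((participant_id, "date_of_birth differs"))
        retained.append(record)
    return retained, flags


# Example with made-up values.
received = [
    {"participant_id": "P001", "date_of_birth": "1980-01-01"},
    {"participant_id": "P002", "date_of_birth": "1975-06-30"},
    {"participant_id": "P003", "date_of_birth": "1990-12-12"},
]
held_details = {
    "P001": {"date_of_birth": "1980-01-01"},
    "P002": {"date_of_birth": "1975-07-30"},
    "P003": {"date_of_birth": "1990-12-12"},
}
kept, review_flags = verify_and_filter(received, held_details, withdrawn_ids=["P003"])
```

In this sketch, flagged records are retained and only reported, on the assumption that updates to identifiable fields are resolved by a separate review step.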

Following this the data are de-identified, as all subsequent processing can be performed without direct identifiers. Genomics England has compiled lists of identifiable and sensitive fields for each data set in line with details provided by NHS Digital and following internal review of data sets. De-identification is a key facet of the Genomics England resource. De-identified data are uploaded to a secure research environment hosted by Genomics England on a monthly basis, where they are linked to participant genomes and primary clinical data.
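
One simple way to apply a per-data-set list of identifiable and sensitive fields is to drop those fields from every record before upload. The sketch below assumes this approach, and also assumes that a study participant identifier is retained so that records can still be linked to genomes; the field names and lists are hypothetical and do not reproduce the lists compiled by Genomics England.

```python
# Hypothetical per-data-set lists of fields to remove (illustrative only).
FIELDS_TO_REMOVE = {
    "hes_apc": {"nhs_number", "date_of_birth", "postcode"},
    "civil_registration_deaths": {"nhs_number", "date_of_birth", "address"},
}


def deidentify(record, dataset_name, fields_to_remove=FIELDS_TO_REMOVE):
    """Return a copy of the record without the listed identifiable fields.

    Raises KeyError if no field list exists for the data set, so that a
    data set cannot be uploaded without having been reviewed.
    """
    blocked = fields_to_remove[dataset_name]
    return {key: value for key, value in record.items() if key not in blocked}


# Example with made-up values.
raw = {
    "participant_id": "P001",
    "nhs_number": "0000000000",
    "date_of_birth": "1980-01-01",
    "postcode": "ZZ99 9ZZ",
    "diagnosis_code": "J12.8",
}
clean = deidentify(raw, "hes_apc")
```

Failing loudly on an unlisted data set is a deliberate choice in this sketch; a real pipeline may handle that case differently.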

The second stage of processing involves the selection of a de-identified cohort of participants that fulfil a specific research request. Researchers are members of a Genomics England Clinical Interpretation Partnership (GeCIP) or the Discovery Forum. Research requests are assessed to ensure that they are included in the approved use purposes set out in the Genomics England Protocol, and fall within the scope of the relevant GeCIP or the Discovery Forum. Researchers declare any data they wish to bring into the research environment and any tools they wish to use for analysis.

The third stage of processing is the analysis of the de-identified data sets within the research environment. Researchers perform all the analysis and processing within the environment: they do not extract de-identified data. Results data are placed in a secure folder for anonymisation verification before extraction.

There will be no data linkage undertaken with NHS Digital data provided under this agreement that is not already noted in the agreement.

The Research Environment (TRE):
All research analysis on the Genomics England dataset will be carried out only via a secure analysis environment hosted within the Genomics England data centre - the Genomics England Research Environment. Analytical tools and applications are available within the Research Environment. No sequencing or clinical data are made available for download, users cannot copy or paste out of the Research Environment, and there is limited internet access within it (i.e. whitelisted sites). Movement of files into and out of the Research Environment is governed via an 'Airlock' Policy.

Academic researcher access to the Research Environment:
Academic researchers access the Research Environment by applying to be a member of a Genomics England Clinical Interpretation Partnership (GeCIP) domain. GeCIP membership is open to any individual, student or member of staff, who is affiliated with a host institution which include the following:

o UK academic research institutions (e.g. universities, research institutions etc.)
o NHS trusts or authorities
o UK and foreign charitable organisations directly related to the focus of the 100,000 Genomes Project
o Foreign universities and research institutions that carry out significant research activity
o UK and foreign governmental departments that carry out significant research activity (e.g. MRC, NIH, PHE)
o Foreign healthcare organisations (private or public) that undertake significant research activity

Membership is not open to those who are self-employed or employed by:
o private UK healthcare institutions
o commercial companies.

To be eligible for data access as a GeCIP member, applicants must meet these requirements:
o Their host institution has signed a GeCIP Participation Agreement, which outlines the key principles that members of each institution must adhere to, including the Intellectual Property and Publication Policy.
o Their host institution has verified that they are affiliated with that institution.
o The applicant's GeCIP domain has submitted a detailed research plan and it has been approved by the Genomics England Access Review Committee (see below).
o The GeCIP domain lead has approved the application.

Following approval, GeCIP researchers must sign a specific agreement (the 'GeCIP rules', which are attached) covering their behavior and working practice within the data infrastructure. Data access will not be granted until a researcher has successfully passed mandatory information governance training.

Commercial researcher access to the research environment:
Genomics England operates a membership-based forum - the Discovery Forum - which is open to a range of companies world-wide and allows access to the Research Environment. It provides a platform for collaboration between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.

Each Discovery Forum member signs a Data Access Agreement with Genomics England. This states the research purposes which the company is authorised to carry out and stipulates the number of genome sequences that can be accessed. It covers the Company's behavior and working practices: in particular it binds users to Genomics England's Airlock Policy, Information Governance, IT Security and Data Protection Policies. Companies need to nominate named individuals to be their Researchers, who must complete information governance training before accessing data. Once the Data Access Agreement is in place, each research project undertaken by the Company within the Research Environment must receive prior ARC approval.

Discovery Forum members access the Research Environment in a similar manner to GeCIP Researchers: all research is carried out within the Research Environment, and any movement of results out of the environment occurs only through the Airlock Process.

The Access Review Committee:
The Access Review Committee (ARC) provides an independent examination of requests for data access, with regard to the acceptable uses of the Genomics England dataset which are outlined in the National Genomic Research Library Protocol, the 100,000 Genomes Project Protocol and the Data Access and Acceptable Uses Policy. The ARC comprises external scientific experts, patient representatives and members of Genomics England's Participant Panel, which is made up of participants and parents/carers involved in the 100,000 Genomes Project.

The Airlock Process:
The Genomics England Research Environment has been developed with the intention that all data analysis is carried out within it and that the only data to leave it are analytical results. An Airlock process has been established which enables material (data, files, tools etc.) to be moved in or out of the Research Environment in a controlled and supervised manner; facilitating research and discovery, while maintaining control of security and access. Removal of results therefore requires an Airlock request.

The following rules are applied to all Airlock requests:
o All relevant details of the files to be transferred must be provided with every request.
o All files transferred may be checked by Genomics England to ensure compliance with the relevant policies. Users will be notified of any files rejected along with the reason for the rejection.
o All files transferred will be checked for viruses and malware and those failing this test will be rejected. It is the responsibility of the requestors to resolve such issues before re-submitting the file for transfer.
o Files requested for transfer are assessed using the following criteria:
• whether the request aligns with the user's ARC approval
• whether the request can clearly be demonstrated to be aligned with a registered project in the Research Environment
• any data security implications
• any disclosure risks
• the technical feasibility and associated cost of the request
• when importing data, its scientific value to the community of researchers within the Research Environment, and when and how it will be shared
• when importing data, checks will be performed to ensure that the data importer owns the data and holds the correct consents and approvals.

The Airlock process is governed by the Airlock Policy (attached), which defines the process and governance of the Airlock process. A set of Airlock Policy Guidelines presents the rules-of-thumb/principles that will be referenced by both the researcher (during preparation of analysis results) and the output checker (during output-checking).

Analysed results are inspected to ensure they cannot be used to disclose the identity of the participant. Checking of statistical output by the Airlock Review Team is governed by a generalizable set of principles that guide individual decisions and ensure flexible evaluation of the Genomics England dataset. By using a principles-based approach where each case is assessed individually, the security of the dataset is maintained by exporting only 'safe' data. Transfer requests resulting in public sharing or publication of data will be checked more stringently. Any approved Airlock export can only be used for the specific use detailed in the original export.

The Research Environment contains External Data (for example Hospital Episodes Statistics [HES]) which is subject to data sharing framework contracts and data sharing agreements between Genomics England and other parties that dictate how the data may be used and what can be exported. Where an export contains External Data, Genomics England will always apply the requirements placed on them as conditions of having access to the data. In some cases, particularly concerning the export of individual-level data, these will be more conservative than those applied to 100,000 Genomes Data alone.

The Airlock Review Team is a delegation of the Genomics England Chief Scientist responsible for oversight of all airlock requests in accordance with the Airlock Policy and the group's Terms of Reference. It comprises:
o Senior Information Risk Owner (SIRO)
o Technical Lead
o User Community Representative
o Bioinformatics Director
o Caldicott Guardian
o Chief Scientist

Sub-licencing:
Genomics England has developed the Research Environment to allow registered third parties to access pseudonymised versions of the data that it holds, for the purposes of approved research. The Research Environment contains External Data (for example Hospital Episodes Statistics [HES]) which is subject to data sharing framework contracts and data sharing agreements between Genomics England and other parties that dictate how the data may be accessed. Genomics England will always apply the requirements placed on them as holders of External Data to users of the Research Environment as a condition of having access to the data. The data is NOT for onward sharing outside of the Research Environment.

Data control:
For clarity, the University of Edinburgh is responsible for acquisition of primary clinical data. That relates to data acquired at patient registration.
Genomics England, in its provision of whole genome sequencing, is applying to NHS Digital for secondary clinical data to link to the genomic data. With regard to data provided by NHS Digital, Genomics England is the sole data controller. To this end, staff and academics from the University of Edinburgh are required to join a GeCIP to access the secondary clinical data from NHS Digital within the Research Environment.

Genomics England provides NHS Digital with linking data in order to receive longitudinal data sets. These data sets are delivered to Genomics England by NHS Digital on a monthly basis, having been approved by the NHS Digital IGARD. Genomics England identifies the linking data and agrees with NHS Digital the scope of the longitudinal data being provided. Genomics England determines the method of de-identification and storage within the research environment and secures this data for use by approved researchers only. Genomics England determines who these researchers are.

Genomics England is the Data Controller for longitudinal data sets processed in the Genomics England Research Library.

Researchers in academic, educational or commercial organisations:
Researchers have access to deidentified data in the research environment, which will include longitudinal data sets (HES etc.) provided by NHS Digital. Access to the Research Environment is only allowed under an access agreement.
The individual researchers are Data Controllers when carrying out research within the research environment. Data disseminated under this agreement for COVID-19 purposes will be restricted to the GEL Covid research environment. Only approved COVID-19 research studies will be granted access to the data. All researchers who are granted access for COVID-19 purposes must be employed or engaged for the purposes of the health service, as the request for data is to support research that has been set as a priority by the CMO. Research which is approved to use the data for the COVID-19 specific purposes will be published here: https://www.genomicsengland.co.uk/about-gecip/research-2/

Data Processors:
o Only summary level data can be removed from the environment.
o Approved researchers will only be able to access Lifebit’s PaaS CloudOS through a virtual desktop.
o Secondary data will be ingested into CloudOS.
o CloudOS will be hosted within GEL’s London AWS (Amazon Web Services) environment - All data is encrypted in transit and at rest.
o CloudOS controls access to the secondary data.
o A security and DPIA assessment will be conducted prior to loading live

Lifebit:
Lifebit has been selected as platform partner to deliver the Research Environment after reviewing several proposals. The UK-based SME offered a proven and innovative technology solution offering a blend of robustness and ease of use. Lifebit CloudOS provides a secure and collaborative workspace to enable researchers to easily perform COVID-19 genomic data analysis. The platform will deliver an intuitive, integrated and collaborative user experience and enables fast, effective COVID-19 research outcomes across a wide range of academic and biotech/pharma researchers with varying levels of technical competency.

Data Minimisation:
This will be limited to the selected cohorts and additions and deletions will be updated regularly.

Cohort Size:
Briefly, there will effectively be two cohorts. The first is the 100K Project (CONTROL COHORT), which comprises about 92,000 participants; this will be a nearly static list.
The second cohort comprises the participants recruited for COVID-19.