NHS Digital Data Release Register - reformatted

Barts And The London School Of Medicine And Dentistry projects

MR1250 - Survival of babies with trisomy 13 or trisomy 18 born in England and Wales since 2004
Genes and Health

156 data files in total were disseminated unsafely (information about files used safely is missing for TRE/"system access" projects).

MR1250 - Survival of babies with trisomy 13 or trisomy 18 born in England and Wales since 2004 — DARS-NIC-148194-QYCWF

Opt outs honoured:

Legal basis: Approved researcher accreditation under section 39(4)(i) and 39(5) of the Statistical Registration Service Act 2007; National Health Service Act 2006 - s251 - 'Control of patient information'.

Purposes: No (Academic)

Sensitive: Sensitive, and Non-Sensitive

When: DSA runs 2011-07 – 2026-07

Access method: One-Off

Data-controller type: QUEEN MARY UNIVERSITY OF LONDON

Sublicensing allowed: No

Datasets:

MRIS - Cause of Death Report
MRIS - Flagging Current Status Report
MRIS - Personal Demographics Service

Type of data: Identifiable

Objectives:

To determine the current life expectancy of newborns with trisomy 13 or trisomy 18 in England and Wales.

Genes and Health — DARS-NIC-338864-B3Z3J

Opt outs honoured: No (Excuses: Consent (Reasonable Expectation))

Legal basis: Health and Social Care Act 2012 s261(2)(c)

Purposes: Yes (Academic)

Sensitive: Sensitive, and Non-Sensitive

When: DSA runs 2021-07 – 2022-07; 2021.10 – 2025.04.

Access method: One-Off

Data-controller type: QUEEN MARY UNIVERSITY OF LONDON

Sublicensing allowed: Yes

AGD/predecessor discussions: AGD minutes - 3 April 2025 final.pdf, AGD minutes - 18 May 2023 final.pdf, IGARD Minutes - 6th May 2021 final.pdf, igardminutes-29thoctober2020final.pdf, IGARD Minutes - 15 July 2021 FINAL.pdf

Datasets:

Bridge file: Hospital Episode Statistics to Mental Health Minimum Data Set
Civil Registration - Deaths
Emergency Care Data Set (ECDS)
Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Critical Care
Hospital Episode Statistics Outpatients
Mental Health and Learning Disabilities Data Set
Mental Health Minimum Data Set
Mental Health Services Data Set
MSDS (Maternity Services Data Set)
National Diabetes Audit
MSDS (Maternity Services Data Set) v1.5
Civil Registrations of Death
Hospital Episode Statistics Accident and Emergency (HES A and E)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Critical Care (HES Critical Care)
Hospital Episode Statistics Outpatients (HES OP)
Maternity Services Data Set (MSDS) v1.5
Mental Health and Learning Disabilities Data Set (MHLDDS)
Mental Health Minimum Data Set (MHMDS)
Mental Health Services Data Set (MHSDS)
Cancer Registration Data
Community Services Data Set (CSDS)
Demographics
Improving Access to Psychological Therapies (IAPT) v1.5
Improving Access to Psychological Therapies (IAPT) v2
Patient Reported Outcome Measures (Linkable to HES)

Type of data: Anonymised - ICO Code Compliant, Identifiable

Expected Benefits:

It is anticipated that the potential impact of this research will be broad, and cover a range of domains, including academic, health policy and economic impact, as well as direct impact on the individuals contributing to the research through public engagement. Dissemination will largely bring benefit through the acquisition of new knowledge relating to health and disease in a previously under-represented and understudied population. New interdisciplinary knowledge will be derived from the combined analysis of health and genomic data on a population scale. Detailed health data from NHS Digital will allow Genes & Health users to generate this knowledge for an otherwise under-studied British South Asian population who experience high rates of, and premature, disease. This new knowledge will be highly complementary to other major bioresources, e.g. UK Biobank and the Clinical Practice Research Datalink, and the availability of linked data curated across both will support high quality replication and validation studies, thereby maximising outputs. The use of routine health data in this work will support direct and feasible translation of findings back to clinical care in the NHS. Additionally, the research will deliver new methodological insights related to the integration of health data science and genomics that could have a direct impact on the academic community, including through the training of junior academic researchers.

Dissemination of new knowledge arising from the Genes & Health bioresource is likely to benefit health and social care through the development of new treatments and risk prediction strategies, as well as finding new causes of disease or its complications. For example, Genes & Health has already generated new knowledge with the potential to benefit health care through its recent work combining information from a rare genetic variant (in the HAO1 gene) with health data (from local health data sources). Dissemination of study findings related to this gene function through academic collaboration has supported critical drug development for a rare, life-threatening metabolic disorder (primary hyperoxaluria). Identification of the effects of genetics on health (including rare gene variants, or gene differences associated with parental relatedness) could improve clinical care through better genetic counselling to families at risk and identification of risk early in the lifecourse.

Longer-term, Genes & Health anticipate that the research will support policy-level improvements in health care with potential economic impact, e.g. through targeting prevention strategies to those at the highest risk, delivering more effective treatments, and reducing health inequalities.

Sharing new knowledge arising from Genes & Health is likely to benefit the public through raised awareness and understanding of health and disease, and improved access to, and availability of, effective treatment. These benefits will relate to British Bangladeshis and British Pakistanis, and may be generalisable to other south Asian groups in the UK or globally. Genes & Health community partners and advisory group will have an active role in dissemination of results to ensure that they are effectively and sensitively disseminated to the communities the study represents.

All research projects using the Genes & Health bioresource go through an approval process to ensure that their scope is within the remit of the Genes & Health research programme and its objectives. Genes & Health will monitor the outputs of research involving the bioresource via annual reporting to the Executive group. This monitoring will include ensuring that research outputs meet their original objectives and do not extend beyond the scope of their original approval.

It is hoped that research outputs arising from use of the Genes & Health bioresource will have direct relevance to a UK population of nearly 2 million British Bangladeshis and British Pakistanis, who are otherwise underrepresented in medical research.

Specific outputs from the Genes & Health multipurpose bioresource will be wide-ranging in terms of their impact. For example, the study of the impact of a rare gene variant on health and disease may have a direct impact on only a small number of individuals and their families, but the magnitude of its impact could be large if, for example, it allows the development of a new treatment for a severe disease (as exemplified by previous Genes & Health work on the HAO1 gene variant). Conversely, the use of health and genetic data to build improved disease prediction models for conditions such as type 2 diabetes might have long-term benefits on a large proportion of the population as approximately 14% of British south Asians are thought to have the condition.

Genes & Health anticipate that short-term and medium- to long-term benefits will arise and be disseminated. It is hoped that short-term benefits will include outputs from research currently underway on disease prediction using polygenic risk scores, and they are likely to include new insights into disease associations with rare gene variants, related to type 2 diabetes, mental health and cardiovascular disease. Medium- to long-term dissemination will include similar work across an expanded range of disease areas, as well as a wider portfolio of collaborative research with partners. The current research funders will not derive exploitable benefits from the Genes & Health research outputs.

PhD students may be part of the research teams working with Genes & Health, both from within QMUL and from other academic organisations. All PhD students will have appropriate supervision, employment contracts with host research institutions, and have undergone relevant information governance training.

Outputs:

Many health research and genetic studies have focused more on other populations (for example, people of white European origin), even though South Asian communities see high rates of diabetes, heart disease, rare genetic diseases, and many other conditions. Genes & Health was set up to make sure that advances in genetics and healthcare are available to this underserved community, and to help provide insights into some of the health inequalities that exist within the UK. The ability to recall participants with a unique genetic makeup, or a unique combination of genetics and health status, also provides a powerful opportunity to better understand how genetics and health are linked, which may be of benefit to everyone.

The results of data processing will be new knowledge related to health and disease in people of British Bangladeshi and British Pakistani origin. The Genes & Health findings will be shared with the wider scientific community via scientific publications and via presentations at national and international conferences and professional meetings across a range of audiences, including genomics, clinical and health data communities. Genes & Health has already published several peer-reviewed publications, including a cohort profile (Finer et al, International Journal of Epidemiology, PMID 31504546), and empirical research demonstrating the potential of the bioresource to build new knowledge on health and disease in understudied ethnic groups and improve patient care (McGregor et al, eLife, PMID 26940866; Narasimhan et al, Science, PMID 32207686). Outputs arising from use of the Genes & Health bioresource are likely to generate a significant number of high impact publications over the next 5 years, and preprint servers and open access journals will be used in preference. Study research findings are also summarised on the Genes & Health website (www.genesandhealth.org) and via its Twitter accounts (@eastlondongenes and @bradfordgenes, and Manchester when they are ready to recruit).

Genes & Health has an established protocol for sharing data outputs, and will continue to use this for outputs arising from NHS Digital datasets. The protocol for sharing data outputs is as follows:

Level 1: Fully open data. Genes & Health distribute summary level analysed data on the study website and via Twitter, for example summary phenotype counts (e.g. numbers of volunteers with diabetes) or summary statistics of clinical observations such as blood lipid measurements. Small number suppression will be used when the number of volunteers is very low and there is a risk of being able to identify individuals or families from these data.

Level 2: Fully anonymised genetic data is available under a Data Access Agreement with the European Genome-phenome Archive (EGA, https://www.ebi.ac.uk/ega/home). This data will not be linked to any health data, including that obtained from NHS Digital.

Level 3: Data access and analysis within Data Safe Haven. Data security is central to ELGH (and written into the study Ethics and Governance): a data breach would irretrievably damage the study and the community trust that has been built. Applicants wishing to analyse phenotype data, e.g. NHS Digital health data, do this within an ISO27001 and NHS Information Governance compliant Data Safe Haven environment (currently UK SeRP, which provides a Virtual Desktop Infrastructure on the end user device, with export controls). Data outputs and publications will be summarised and presented in aggregate, and it will not be possible to identify individuals.

All collaborative research involving NHS Digital datasets will require a Data Sharing Agreement to ensure a rigorous approach to data safety and onward use.

The applicants will take the following approaches:

a) Academic dissemination: this has been described above and will include communication of results through peer-reviewed publication and conference presentations, as well as through other clinical and academic networks, relevant professional interest groups and social media.
b) Stakeholder dissemination: Genes & Health disseminates regular updates regarding study activities (e.g. recruitment, outputs) to a range of stakeholders, including funders, clinicians, policymakers, community representatives and volunteers. This dissemination takes place through its website (which is updated regularly) and by social media.
c) Open science: Genes & Health is committed to open science. It openly shares its methods, data dictionaries, codelists and manuscripts on open access platforms and via its website.
d) Public engagement: The study team are guided by NIHR INVOLVE in all dissemination activities with the public. Genes & Health is a community-embedded study that keeps engagement at its core, supported by its Community Advisory Group, 'Helix Champions' (community researchers) and third sector organisations, e.g. Social Action for Health. The study and QMUL support regular community engagement and health education activities sharing new knowledge amongst volunteers and their families. Genes & Health undertake innovative engagement activities with the award-winning Centre of the Cell, to deliver educational workshops and interactive web- and app- content based on the study objectives, and are supported by the QMUL Life Sciences Initiative, QMUL Centre for Public Engagement and a recent Wellcome Trust Public Engagement Award (PI D van Heel, ref 102627/B/13/A). The Helix Champions are also critical to ongoing and effective communication with volunteers to keep them engaged in the study and informed about its outcomes, and to engage with community leaders, religious organisations and schools. The research team have presented on local and national radio (Betar Bangla, BBC Asian Network and World Service) and BBC London TV. Genes & Health has an active Community Advisory Panel who support all ELGH activities through the entire research pathway, from prioritisation of topics for study, to dissemination of research findings.

The Genes & Health open access policy is described above and ensures that, where possible, there is no restrictive ownership of data or outputs in Genes & Health. It is possible that work on combined genetic and clinical phenotypes could derive significant commercial interest, e.g. to pharmaceutical companies for the development of new drugs, or genomics companies developing new disease risk algorithms. Genes & Health has a rigorous system for reviewing applications to work with its bioresource, involving its Executive team and Community Advisory Group. All collaborations and partnerships involving commercial organisations are pre-competitive and therefore not commercially exploitable.

Expected targets for outputs are: (a) short-term, e.g. publication of results within 1-2 years of receiving data for analysis, and (b) medium- to long-term, e.g. building and expanding the bioresource within 5 years, and maintaining an open access research resource to be used in global consortium-based work and replication studies (5-10 years).

Some examples of current Genes & Health supported research are summarised below to illustrate the scope of the data request applied for:

Type 2 diabetes: this condition disproportionately affects British south Asians, and studies are currently taking place or planned with Genes & Health to investigate the influence of common (polygenic risk scores) and rare genetic variants (changes) in type 2 diabetes, misclassification of diabetes types in British south Asians, and gestational diabetes. These studies require detailed longitudinal clinical data from HES, National Diabetes Audit, MHSDS, MSDS, cancer registration, and civil registration of deaths in order to describe the onset of diabetes (including progression from at-risk states such as gestational diabetes), acute diabetes emergencies, glucose control and uptake of diabetes care, its comorbidities (including mental health disorders and cancer), complications and outcomes. Association-based analysis will be used to quantify the relationship between genetic variation and clinical outcomes, using survival analysis to understand trends over time. Adjustment for confounding variables, such as socioeconomic status, will be included in analyses.

Familial hypercholesterolaemia (high cholesterol): this rare genetic condition is a cause of excess death from cardiovascular disease. The condition is under-diagnosed in British south Asians, and work within Genes & Health is identifying volunteers with the condition through their genetic sequence data, and correlating this to clinical data to improve diagnosis and treatment. Longitudinal data from HES and ECDS are required to capture diagnoses and relevant hospital admissions with cardiac emergencies, and long-term outcomes including death from civil registration data. The analysis of health data will be used to design translational Stage 2 recall studies based on genotype and identify volunteers who need specific clinical intervention (e.g. initiation of specific drugs).

Multimorbidity: a large programme of MRC-funded work is currently underway investigating the clustering of multimorbidity in British Bangladeshis and British Pakistanis, and their trajectories across the lifecourse. This research is taking a novel, data-driven approach to identify clusters of multimorbidity across multiple single conditions. The use of multi-source, linked medical record data will increase data quality (e.g. by validating diagnoses across datasets) and the ability to generate novel and meaningful multimorbidity clusters (e.g. encompassing both physical and mental health disorders by using HES and MHSDS/MHMDS/MHLDDS data). Historic data will allow analysis of a patient's risk from multimorbidity across their lifecourse, and data from HES and MSDS will be used to investigate the impact of specific lifecourse events, such as pregnancy, on the development of multimorbidity.

COVID-19: Genes & Health is contributing to international efforts to identify risk of disease, and its severity, in the host genome. The availability of HES data, including diagnoses from, and episodes in, emergency care (and ECDS), admitted patient care and critical care, and civil registration of deaths will support this work across all study volunteers. These data will be used in genome-wide association studies, with likely subgroup analysis according to disease severity/hospitalisation, and adjustment for confounding variables such as age and socioeconomic status (Index of Multiple Deprivation).

Mental health and dementia: work is underway to better understand the genetic influences on mental health conditions, e.g. depression and anxiety, and dementia. This work will include longitudinal analysis of risk factors for these diseases, their diagnosis and severity, and associated mortality, and therefore requires linked multisource data, including MHSDS, HES, and civil registration of deaths. Analyses will comprise genome-wide association studies and calculation of polygenic risk scores (including assessment of their performance in predicting disease onset).
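As a rough illustration of what "calculation of polygenic risk scores" usually means, the sketch below computes a score as a weighted sum of allele dosages. It is not taken from the Genes & Health application: the function name, the variant identifiers and the choice to treat missing variants as a dosage of zero are all assumptions made for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages for one participant.

    dosages: variant id -> number of effect alleles carried (0, 1 or 2,
             or a fractional imputed dosage).
    weights: variant id -> per-allele effect size from a prior association study.
    Variants missing from `dosages` contribute zero (an illustrative assumption).
    """
    return sum(weight * dosages.get(variant, 0.0)
               for variant, weight in weights.items())


# Hypothetical variant identifiers and effect sizes, for illustration only.
example_weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.1}
example_dosages = {"variant_a": 2, "variant_b": 1}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

In practice the score's predictive performance would then be assessed against observed disease onset, as the paragraph above describes.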

Discovery analyses of rare genetic variants (changes): one of the unique features of the Genes & Health study is the ability to investigate the impact of rare genetic variation on health and disease due to its large scale and focus on a population with high rates of parental relatedness. The impact of rare genetic variants on an individual's health requires careful study, particularly where the genetic variation is novel and its impact is unknown. A discovery-based approach is required to study such genetic variants, as they may have broad health consequences and novel disease associations. Conversely, rare genetic variants may offer protection from disease and the absence of diagnoses and hospital episodes would be highly informative. A proof-of-principle study of a rare genetic variant in the HAO1 gene has shown the importance of such studies using health data: an individual carrying a variant in this important metabolic regulator gene had no ill effects on their health (determined by medical record data) and this knowledge provided critical information to support the development of a drug that targets this gene in a rare metabolic illness. All diagnoses and details of episodes of care are required to inform this novel genetic discovery-based research and direct subsequent translational research and drug development.

The impact of inherited genetic variants on health: another unique feature of the Genes & Health study is the ability to investigate the impact of rare genetic variants arising from parental relatedness (called autozygous variants) on health. It is known that autozygosity can increase the risk of developmental disorders; the Mental Health Minimum Data Set, with its inclusion of diagnoses related to learning disability, is critical here. Additionally, there is a recent understanding that autozygosity may impact on a range of long-term conditions and reproductive health/fertility. It is particularly important to study these associations further in British south Asians due to the higher rates of parental relatedness. Discovery-based analyses are planned across a large range of traits and phenotypes to characterise these associations further, and data from HES, NDA, maternity and mental health datasets will be highly informative.

Pregnancy-based studies: QMUL has active research investigating rare and severe pregnancy-based conditions with a known genetic basis (e.g. intrahepatic cholestasis of pregnancy), which will require detailed health record information to identify cases and outcomes. Studies are underway within Genes & Health investigating common conditions in pregnancy (such as gestational diabetes or pre-eclampsia), their genetic basis (e.g. polygenic risk) and how this affects future risk of disease (such as progression to type 2 diabetes or cardiovascular disease). Additionally, QMUL have planned research investigating causal associations between maternal genetics and offspring traits such as birth weight. All pregnancy-based studies will require a core set of data from antenatal care, through the peripartum period, to immediate postnatal care to determine severity and outcome to the affected mother and her child. A limited set of data from offspring (e.g. birth weight, Apgar score, neonatal intensive care admission) has been requested to determine immediate pregnancy outcomes.

The above summary is not exhaustive and is to give an overview of the types of research currently funded and being undertaken in Genes & Health in order to justify the broad scope of data included in this application. It also reflects the need to be responsive to the ongoing development and use of Genes & Health as a bioresource that currently receives 2-4 new applications per month for new collaborative research studies. The data request has been designed to be comprehensive and support current and future research with the Genes & Health bioresource, but with the minimum of data fields required to meet its objectives.

Processing:

A list of individuals will be supplied to NHS Digital, and will include patient NHS numbers, gender, dates of birth and postcodes. A unique cohort study ID will be included. No other health data will be supplied to NHS Digital. The cohort size is approximately 50,000.

Participant level health data from a variety of datasets, along with the unique cohort study ID supplied by Genes & Health, will be transferred to Genes & Health. The requested data include details of recent and future use of mental health services, categorised as high risk data.

Participant-level data from NHS Digital will be stored in a data safe haven (currently UK SeRP, part of Swansea University) along with the study ID and genetic data generated by the Genes & Health study. Approved collaborating researchers will analyse genetic and health data inside UK SeRP. Researchers do not have access to identifiable participant information, and will only be able to export summary results (i.e. without participant-level data).
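The arrangement described above, in which health and genetic records are joined on the study ID while direct identifiers stay out of the researcher's view, can be sketched as follows. This is a minimal illustration under stated assumptions, not the UK SeRP or Genes & Health implementation: the field names, the set of fields treated as direct identifiers and both function names are hypothetical.

```python
# Hypothetical field names for direct identifiers; the real schema is not
# described in the source.
DIRECT_IDENTIFIERS = {"nhs_number", "date_of_birth", "postcode"}


def strip_identifiers(record):
    """Return a copy of the record without any direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


def link_by_study_id(health_rows, genetic_rows):
    """Join health and genetic records on the study ID only.

    Rows without a matching genetic record are skipped, and direct
    identifiers are removed from both sides before merging.
    """
    genetics_by_id = {row["study_id"]: row for row in genetic_rows}
    linked = []
    for row in health_rows:
        genetic = genetics_by_id.get(row["study_id"])
        if genetic is None:
            continue
        linked.append({**strip_identifiers(genetic), **strip_identifiers(row)})
    return linked
```

Keying the join on the study ID means a researcher-facing record never needs to carry an NHS number, date of birth or postcode.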

All export requests are reviewed by a delegated member of the Genes & Health executive committee. Access to each data source is restricted to researchers working on relevant projects.

Researchers may be based in the UK or EEA. To date applications to use Genes and Health data have been received from the EEA, the USA, and Australia. Regardless of a researcher's country, the participant level data is stored and processed inside the same Data Safe Haven located in England/Wales, and cannot be downloaded.
This agreement permits use in the EEA only.

This project does not require any subsequent flows of data.

QMUL employees (members of the core Genes & Health team) will organise and adapt NHS Digital data into formats for transfer to the UK SeRP (part of Swansea University).
Swansea University, who provide UK SeRP, will be data processors, responsible for storing the data, and maintaining the platform and tools that are used to analyse the data.

QMUL employees will maintain and update datasets, and consult and use the data for the purposes of analysis. Once their institution has signed a Data Access Agreement, employees of external institutions will consult and use the data for the purposes of analysis within the scope agreed with QMUL.

Following analyses, summary results (not including identifiable or participant-level data) will be exported for the purposes of publication and dissemination. QMUL employees (delegated Genes & Health team members) will review each export. Only aggregated results/outputs with small number suppression will be used in publications or dissemination.

Genotyping data (all participants), exome sequencing data (some participants), primary and secondary care health record data (all participants), and the requested NHS Digital data will be available for data linkage. Access to each data set is only granted as required, and researchers agree to limit linkage to that described in their project plan approved by the Genes & Health Executive Committee. Researchers may import additional datasets into the data safe haven for further linkage, once approved by the Genes & Health Executive Committee.

Project applications will specify the datasets to be used and linked, including imported datasets. The executive committee will not approve projects where the risk of re-identification through linkage is high.

The access agreement signed by researchers from external institutions will prohibit linkage outside the scope of their approved project proposal. This will be monitored by review of imported datasets, review of exported summary data on each export, and review of progress and publications arising from use of Genes & Health data.

On rare occasions projects may wish to export summary results containing a small number of participants possessing a rare genetic variant. On these occasions care will be taken to export the minimum amount of data to avoid identification of participants. For example, researchers will be expected to use an age range rather than age for each participant. All small numbers will be suppressed in line with the HES Analysis guide.
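The two disclosure controls mentioned above, reporting an age range instead of an exact age and suppressing small counts, can be sketched as below. The threshold of 5, the 10-year band width and the "*" marker are illustrative assumptions only; the actual rules are those set out in the HES Analysis guide, which this sketch does not attempt to reproduce.

```python
def age_band(age, width=10):
    """Map an exact age to a band such as '40-49' (band width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress_small_counts(counts, threshold=5, marker="*"):
    """Replace non-zero counts below `threshold` with a marker.

    The default threshold is illustrative and is not the HES Analysis guide rule.
    """
    return {
        label: marker if 0 < count < threshold else str(count)
        for label, count in counts.items()
    }


# Hypothetical summary table keyed by age band.
example_table = suppress_small_counts({"40-49": 3, "50-59": 12, "60-69": 0})
```

A reviewer checking an export request could apply functions like these to a summary table before release; zero counts are left visible here, which is itself a design choice that a real policy may treat differently.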

Any applications to link participant data will be assessed by the Executive Committee in its formal application review process. It is not expected that Genes & Health data will be linked to identifiable publicly available datasets (no applications to do so have been received).

Researchers may use a combination of genetic and health record data to flag participants for further follow-up. Examples might include a number of individuals with a rare genetic variant of interest, or whose genetic makeup indicates they may be at a high risk of high cholesterol. Researchers, who can see pseudonymised data, will provide study IDs to the Genes & Health team, who are based at QMUL and have access to identifiable details like consent forms and contact details questionnaires, and who will contact participants. This recall of volunteers is routine, all participants of Genes & Health have agreed to be contacted, and it will allow better characterisation of volunteers' health or genetics. Participants' identifiable details will not be made available to researchers. There will be no attempt to link the data provided by NHS Digital directly to identifiable details.

Data processing will only be carried out by substantive employees of Swansea University, QMUL or the Institutions of approved researchers via sublicensing. All employees with access will have been appropriately trained, at a minimum with the e-Learning for Health Data Security Awareness training (or equivalent for institutions that do not have access).

The UK Secure eResearch Platform (SeRP), run by Swansea University, will be used to store data and provide an analysis platform. UK SeRP hold a Data Security and Protection Toolkit, and are compliant with ISO27001.

Data will be stored at named institutions.