NHS Digital Data Release Register - reformatted

University Of Liverpool projects

The socioeconomic patterning of alcohol misuse and mental disorder comorbidity
Assessing the Relationship Between Minority Group Status and Psychosis; Exploring Causal Mechanisms
MR1025 - The Roy Castle Lung Cancer Research Programme, Liverpool Lung Project - University of Liverpool
PREMO CNS; The PREsentation, Management and Outcomes of patients with CNS disease secondary to Breast Cancer in England (ODR2021_030)
HES Extract - Integrated Longitudinal Research Resource - Developing neighbourhood resilience, reducing health inequalities
MR1298: UK Lung Cancer Screening Trial Lung Cancer Registry and Mortality data for consented individuals
HElping Alleviate the Longer-term consequences of COVID-19 (HEAL-COVID): a national platform trial
ISARIC4C Coronavirus Clinical Information Network (COCIN) GPES record linkage
A study looking at Emergency Department attendances at NHS hospitals by people with epilepsy in the SAFE Trial
Project 10
Project 11
Project 12

604 data files in total were disseminated unsafely (information about files used safely is missing for TRE/"system access" projects).

The socioeconomic patterning of alcohol misuse and mental disorder comorbidity — DARS-NIC-220105-B3Z3S

Opt outs honoured: (Excuses: Does not include the flow of confidential data)

Legal basis: Health and Social Care Act 2012 s261(1) and s261(2)(b)(ii), Health and Social Care Act 2012 s261(2)(b)(ii)

Purposes: No (Academic)

Sensitive: Non-Sensitive

When: DSA runs 2018-11 – 2021-10

Access method: One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

AGD/predecessor discussions: igardminutes-26thnovember2020final.pdf

Datasets:

Adult Psychiatric Morbidity Survey
Adult Psychiatric Morbidity Survey (APMS)

Type of data: Anonymised - ICO Code Compliant

Expected Benefits:

University of Liverpool aim to advance the understanding of alcohol misuse and mental disorder co-morbidity. Mental disorder has been found to be highly co-morbid with alcohol misuse; however, it is not known how this co-occurs across the spectrum of mental disorders. The direction of the association between alcohol misuse and mental disorders is also not well known; however, research suggests that worsening mental health is likely to lead to an increase in alcohol consumption, whereby alcohol is used as a maladaptive coping technique. This research will analyse data that is representative of England to identify mental disorders that are more likely to co-occur with alcohol misuse.

This study also aims to identify those most vulnerable to alcohol harms. Those from lower socioeconomic status (SES) groups have been found to experience more alcohol-attributable harms and to have a common mental disorder. There is currently a lack of data as to whether the co-morbidity of alcohol misuse with mental disorder is more common in lower SES groups, but it may be hypothesised that individuals of lower SES with a mental disorder may be more likely to self-medicate due to limited resources to facilitate more effective coping. This research will use data representative of England to identify groups at risk of experiencing alcohol-attributable harms and worsened symptoms of a mental disorder.

A project steering group will be established using existing links. This will be formed of representatives from NHS England, Department of Health and mental health charities in the North West of England. The group will be regularly briefed on the progress and asked to provide guidance to ensure that the approach is sensible and relevant.

Outputs:

All outputs will be aggregated with small numbers suppressed in line with the HES Analysis Guide.

An academic paper describing the socioeconomic patterning of the comorbidity of hazardous and harmful alcohol with mental disorders will be submitted to Addiction in 2019.

Dependent on the results of the research, it is anticipated that these findings will also be presented at an academic conference focusing on the use of big data for epidemiology.

Booklets providing a summary of the research and the implications will be distributed to University of Liverpool, NHS England, Department of Health and third sector organisations through existing University of Liverpool links.

The University of Liverpool will hold a public engagement event towards the end of the project to share findings with relevant charities and NHS services; a lay booklet presenting these findings will also be distributed at this event.

Every 12 months, a report of the project's progress and findings will be compiled and passed on to the funders, the Society for the Study of Addiction, including towards the end of the project, when findings from this study will form part of a PhD thesis (October 2019, October 2020, October 2021).

Processing:

The APMS dataset will be received in a pseudo-anonymised form so there will be no storage of identifiable data at any point. Data will be stored securely at the University of Liverpool, and accessed only by authorised personnel. Data will be backed up on the University of Liverpool's Active Data Store, which provides a centralised, secure, supported data storage facility (https://www.liverpool.ac.uk/library/research-data-management/storing-your-research-data/). No data will be passed on to other organisations. There will be no requirement nor attempt to reidentify individuals from the data.

Variables will be categorised where feasible to reduce any risk of identification. Cell sizes of less than five will not be reported in publications or presentations. After initial data processing, the original dataset received from NHS Digital will be securely archived and the reformatted dataset will be used for the main analyses.
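The small-number rule above can be illustrated with a minimal sketch. It assumes a simple table of category counts and uses the threshold of five stated in the text; the function name, the example categories and the choice to suppress zero counts as well are illustrative assumptions, not part of the agreement. It does not cover secondary suppression (hiding further cells so that a suppressed value cannot be recovered from totals).

```python
from typing import Dict, Optional

# Threshold taken from the text: cell sizes of less than five are not reported.
SUPPRESSION_THRESHOLD = 5


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of the table with counts below the threshold replaced by None.

    Assumption: every count below the threshold, including zero, is suppressed.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


# Hypothetical example table (made-up numbers, not APMS results).
example_table = {"group_a": 120, "group_b": 4, "group_c": 5}
published = suppress_small_cells(example_table)
print(published)
```

In a published table the `None` entries would typically be shown as a suppression marker rather than a number.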

The 2014 APMS dataset is held on behalf of NHS Digital by the UK Data Service (UKDS) (www.ukdataservice.ac.uk) and UKDS are responsible for dissemination under direction by NHS Digital. The University of Liverpool will get the whole dataset; there is no facility to select individual variables. They will be able to download the dataset from UKDS for the period specified within the Data Sharing Agreement (DSA) and they must securely destroy the original dataset when the DSA expires and notify DARS in line with standard procedures. The updated dataset will be kept securely for 10 years in accordance with the University of Liverpool's Research Data Management Policy.

This 2014 version of the dataset available via DARS has been redacted on Disclosure Control Procedure advice to minimise the likelihood of individuals being able to identify anyone taking part in the survey.

All organisations party to this agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by Personnel (as defined within the Data Sharing Framework Contract, i.e. employees, agents and contractors of the Data Recipient who may have access to that data).

No data will be shared with third parties.

The data will only be used for the purposes described in this agreement and will not be linked to any other dataset.

Assessing the Relationship Between Minority Group Status and Psychosis; Exploring Causal Mechanisms — DARS-NIC-175693-L5Y4K

Opt outs honoured: (Excuses: Does not include the flow of confidential data)

Legal basis: Health and Social Care Act 2012 s261(1) and s261(2)(b)(ii), Health and Social Care Act 2012 s261(2)(b)(ii)

Purposes: No (Academic)

Sensitive: Non-Sensitive

When: DSA runs 2019-11 – 2022-10

Access method: One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

Datasets:

Adult Psychiatric Morbidity Survey
Adult Psychiatric Morbidity Survey (APMS)

Type of data: Anonymised - ICO Code Compliant

Objectives:

The University of Liverpool is requesting the Adult Psychiatric Morbidity Survey (APMS) dataset for use in studies as part of a PhD research project being carried out by the applicant, aiming to answer the question: How does the experience of social adversity (including discrimination, bullying, reduced social support and trauma) explain the relationship between minority group status and psychosis, including specific psychotic symptoms such as paranoia?

The primary aim is to assess two different minority groups.

The first of these groups is ethnic minorities. The study team is aware that analyses of the prevalence of psychotic disorders within such groups are already carried out in the APMS report. However, the analyses do not look any deeper into factors including social adversities which may potentially elucidate explanatory mechanisms (Morgan, Charalambides, Hutchinson, & Murray, 2010), which will be the aim of the current project. There is also evidence that the experience of discrimination in ethnic minorities may be related to persecutory delusions (Janssen et al., 2003), as well as symptoms of paranoia generally (Qi et al., 2019), and thus it will be important to look at causal pathways for such specific symptoms.

The study team also aim to investigate sexual minority groups. Evidence suggests non-heterosexual groups have elevated rates of psychosis (Bolton & Sareen, 2011), and related symptoms which may be explained by social adversity (Gevonden et al., 2014). A past study by the applicant using the APMS2007 data found that sexual minority status was related to paranoia symptoms specifically through social adversity pathways (Qi et al., 2019). The aim will be to replicate the findings using the 2014 sample.

Justification for the processing of the data is under Article 6(1)(e) of the GDPR, for the performance of a task in the public interest; and Article 9(2)(j) of the GDPR, where processing of the data is necessary for archiving purposes in the public interest. In order to inform the treatment of severe mental health disorders in minority groups, and to develop an evidence-based approach to public mental health, it is necessary to identify how a number of different social factors are associated with such disorders. Unfortunately, research into such factors is currently very limited.

It is hoped that the research using the APMS data will inform the understanding and treatment of severe mental health in already disadvantaged groups. Thus, processing of the data also meets conditions for Part 2 of Schedule 1 of the Data Protection Act 2018, under Health and Social Care purposes. These purposes include (c) medical diagnosis, through improving understanding of aetiology and symptoms of psychosis, and also (d) the provision of healthcare or treatment, and (e) the provision of social care.

A number of aspects of the APMS data would allow the identified aims to be achieved. Psychosis and related symptoms are rare in the general population, especially within minority groups. A large nationally representative survey allows for adequate statistical power to analyse such groups, as well as understand the phenomena within an accurate UK context. The data set also contains the relevant variables relating to group identity, specific symptoms, and experiences of social adversities. Different analyses will be utilised. Logistic regression models, as well as mediation models appropriate for binary outcome data will primarily be used. New techniques may also be incorporated as they emerge.

The PhD project that the application is part of has been underway since September 2017 and will run until at least September 2021. The APMS data will be used for the identified studies, which will be part of the PhD thesis. The data requested will be pseudonymised and represents the least intrusive manner of achieving the aims of the application. Pseudonymised data mitigates the risk of re-identification. There is no requirement or intention to reidentify individuals.

The high level aggregate survey results were published as National Statistics in September 2016, but due to the sensitive nature of the underlying data, they have been assessed by the Disclosure Control Panel (DCP), resulting in significant reductions in the disclosure risk by removing key geographic variables and other household identifiable variables.

The University of Liverpool is the sole data controller, with sole autonomy for determining the purpose and manner for processing the APMS data. Data will only be processed by the applicant and their supervisory team who are substantive personnel of the University of Liverpool. No other persons or organisation will process the data.

Outputs:

Following processing of the data, a number of expected outputs will be produced, including: submissions to peer reviewed journals in psychology and psychiatry (e.g. Social Psychiatry and Epidemiology); presentation of the work in relevant university groups; and submissions for presentation at relevant international conferences. Information in multiple forms appropriate for the public will also be produced.

All outputs will only contain data that is aggregated in line with NHS Digital guidelines, and will contain no individual level data.

A number of dissemination channels will be utilised. Journal submissions and presentations will likely primarily reach the scientific research community. Beyond this, attention will be paid to the use of language so that findings are disseminated in an understandable form for wider audiences. Social media channels will also be used to reach both academic audiences and the general public. Engagement will also be sought using both new and established relationships, particularly with organisations focusing on LGBT and ethnic minority mental health, in order to reach relevant stakeholders. Engagement with policy makers will also be attempted more directly at relevant events and meetings. The aim will be to inform research, public understanding, policy and practice, with the intention of improving health and social care outcomes for the relevant groups.

Communication channels within the research group and department will also be used, which include methods such as an online website, internal newsletters and online blogs. The University of Liverpool communications team will also be consulted in terms of potential wider communication strategy.

The research will result in the development of new tools or technologies.

All outputs are expected to be produced by the completion of the PhD project (September 2021), though journal submissions will be attempted by September 2020.

In order to protect patient confidentiality in publications resulting from analysis of APMS data users must:
guarantee that any outputs made available to anyone other than those with whom this agreement is made, will meet required standards, including the guarantee, methods and standards contained in the Code of Practice for Official Statistics (http://www.statisticsauthority.gov.uk/assessment/code-of-practice/index.html) and the ONS Statistical Disclosure Control (https://gss.civilservice.gov.uk/statistics/methodology-2/statistical-disclosure-control/) for tables produced from surveys;
apply methods and standards specified in the Microdata Handling and Security Guide to Good Practice (http://www.data-archive.ac.uk/media/132701/UKDA171-SS-MicrodataHandling.pdf) for disclosure control for statistical outputs.

Processing:

The 2014 APMS data set is held on behalf of NHS Digital by the UK Data Service (UKDS) (www.ukdataservice.ac.uk) and UKDS are responsible for dissemination under direction by NHS Digital. The University of Liverpool will receive the pseudonymised APMS data set. There is no facility to select individual variables. The University of Liverpool will be able to download the data set from UKDS for the period specified within the Data Sharing Agreement and must securely destroy all local copies of the data set when the Agreement expires and notify NHS Digital in line with standard procedures. This 2014 version of the data set available has been redacted on Disclosure Control Procedure advice to minimise the likelihood of individuals being able to identify anyone taking part in the survey.

UKDS will transfer the pseudonymised APMS data to the University of Liverpool. No other organisations will be involved in the flow of data.

There will be no flow of data into NHS Digital identifying special categories of personal data such as health data, and no such flow of data out of NHS Digital.

Processing by the University of Liverpool will involve the analysis of the data using standard statistical packages such as STATA in order to achieve the stated purposes. Only aggregate data will be used in any resulting research reports. There will be no linkage of data, and data will not be matched to publicly available data. There will also be no requirement nor attempt to re-identify individuals from the data.

Processing of data will be carried out only by employees of the University of Liverpool who have been trained in data protection and confidentiality, in line with GDPR. NHS Digital restricts access to the datasets to further mitigate the risk so that it is remote. The APMS survey data will therefore only be available to a limited number of individuals for specific and limited purposes described in this Agreement.

Data will only be accessed in a secure environment, on systems within the premises of the University of Liverpool. Data will only be stored on password protected computers on University premises, which are only physically accessible to employees of the University. Only the authorised personnel will have access to the data.

All organisations party to this agreement must comply with the Data Sharing Framework requirements, including those regarding the use (and purposes of that use) by Personnel (as defined within the Data Sharing Framework Contract, i.e. employees, agents and contractors of the Data Recipient who may have access to that data).

In order to protect patient confidentiality in any publications resulting from analysis of APMS data, researchers will:
i. Guarantee that any outputs made available to anyone other than those with whom this agreement is made, will meet required standards, including the guarantee, methods and standard contained in the Code of Practice for Official Statistics and the ONS Statistical Disclosure Control for tables produced from surveys.
ii. Apply methods and standards specified in the Microdata Handling and Security Guide to Good Practice for disclosure control for statistical outputs.

MR1025 - The Roy Castle Lung Cancer Research Programme, Liverpool Lung Project - University of Liverpool — DARS-NIC-147982-J7KGV

Opt outs honoured: N, Yes - patient objections upheld, Yes (Excuses: Mixture of confidential data flow(s) with consent and flow(s) with support under section 251 NHS Act 2006)

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Other - For one subset of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(7) and National Health Service Act 2006 - s251 - 'Control of patient information'. For the remainder of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(2)(c)., Other-For one subset of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(7) and National Health Service Act 2006 - s251 - 'Control of patient information'. For the remainder of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(2)(c)., Health and Social Care Act 2012 s261(2)(c); Health and Social Care Act 2012 s261(7); National Health Service Act 2006 - s251 - 'Control of patient information'., Health and Social Care Act 2012 - s261(5)(d); Health and Social Care Act 2012 s261(2)(c); National Health Service Act 2006 - s251 - 'Control of patient information'.

Purposes: No, Yes (Academic)

Sensitive: Non Sensitive, and Sensitive, and Non-Sensitive

When:DSA runs 2019-03 – 2020-02 2016.04 — 2025.05.

Access method: Ongoing, One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

AGD/predecessor discussions: AGD minutes - 10 April 2025 - FINAL.pdf, AGD minutes - 19th September 2024 final.pdf, AGD minutes - 23 February 2023 final.pdf, igard-minutes---23-july-2020-final.pdf, igard-minutes-21-june-2018.pdf, igard-minutes-12-april-2018.pdf, IGARD_Minutes_20.07.17.pdf

Datasets:

MRIS - Flagging Current Status Report
MRIS - Cause of Death Report
MRIS - Cohort Event Notification Report
Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Outpatients
Hospital Episode Statistics Accident and Emergency
Cancer Registration Data
Civil Registration - Deaths
Demographics
Hospital Episode Statistics Accident and Emergency (HES A and E)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Outpatients (HES OP)
Civil Registrations of Death
NDRS Cancer registration (pre-1995)
NDRS Cancer Registrations
NDRS Somatic Molecular Dataset

Type of data: Identifiable, Anonymised - ICO Code Compliant

Objectives:

Over the years, the Roy Castle Lung Cancer Research Programme (RCLCRP) has been at the forefront of groundbreaking research in early detection of lung cancer.

The RCLCRP’s Liverpool Lung Project (LLP) is a large, prospective, population-based study of lung cancer, together with a case-control cohort collecting data and samples from diagnostic and surgical patients.

The University of Liverpool requires follow-up data for the Liverpool Lung Project, a longitudinal observational study aimed at identification of risk factors and biomarkers that will allow improved early detection and treatment of lung cancer and respiratory disease. Hence it is a task in the public interest, leading to improved understanding of disease and opportunities for improved treatment resulting in lives saved and improved health.

The University of Liverpool is the sole Data Controller and also processes the data for this study. No other organisations process the data for this purpose. The study protocol names core collaborators from the Wolfson Institute of Preventive Medicine, Liverpool Heart & Chest Hospital, Clatterbridge Cancer Centre and Aintree University Hospitals NHS Foundation Trust. These individuals do not have any bearing on the data under this Agreement. Their roles are limited to providing data and information to the LLP. If the data collected by the University of Liverpool indicates something of note, the University of Liverpool will request the appropriate clinical partner to undertake a case review. A collaborator from the Wolfson Institute of Preventive Medicine has additionally acted in an advisory role on statistical modelling but has not required access to the data under this Agreement to do this.

The LLP, which is based at the University of Liverpool, includes one of the largest collections of data and samples in the UK and Europe from a population cohort at high risk of lung cancer (approximately 9000 individuals from the Liverpool area, recruited whilst healthy 1996 - 2014). By determining which of these individuals get lung cancer, the LLP can perform scientific studies to integrate risk factors into its ongoing risk prediction modelling research, building on the LLP Lung Cancer Risk Model.

The LLP also includes patients referred for “query lung cancer” investigations and continues to recruit these patients at different times in their diagnostic pathway. In addition, the LLP recruits patients undergoing treatment for lung cancer, predominantly surgical patients. To date the LLP has recruited over 4000 of these diagnostic and surgical patients into its “hospital cohort”, the majority at the Liverpool Heart & Chest Hospital (LHCH).

Samples and data from these patients, together with controls from the population cohort, are primarily used in scientific studies to discover new genetic risk factors and potential biomarkers – i.e. substances that can be measured in samples of blood, sputum or tissue that can tell something useful about the disease.
This may be, for example, differences or changes in a patient’s DNA (mutation or methylation) when detected in biopsy samples or blood, which can potentially indicate the person’s risk of lung cancer, can confirm the presence of lung cancer or indicate what type of lung cancer a person has (which may lead to personalised treatment).

The aims of the LLP are to:

i. determine factors associated with the risk of lung cancer (to help identify ways to select people for screening);
ii. identify better ways to detect lung cancer earlier (to improve diagnosis, which will save lives);
iii. better understand the biology of lung cancer (leading to potential new therapies);
iv. identify ways of selecting patients for the best current and future treatment;
v. contribute to improving patients' outcomes and ultimately saving lives.

To this end, a documentation of previous history of diseases is essential for exploring the impact of comorbidity on lung cancer. A rich source of data for exploring the potential role of comorbidity in lung cancer pathogenesis is the Hospital Episode Statistics (HES).

The University of Liverpool requires NHS Digital data to inform the current status (alive/dead) and dates and causes of death; this data specifically informs the prime question of the study, helping to identify both lung cancer incidence (within causes of death) and the time from recruitment or treatment until death (all-cause mortality, an important measure of health status and outcome). This data needs to be provided for specific individuals (by linkage through NHS number with verification by month/year of birth) to allow appropriate data for personal level risk model development and evaluation. As a longitudinal study, it is important to follow as many subjects as possible until death, so the LLP plan to ask for data on all available years (including retention of data already provided and incremental updates). Although the study was based in Liverpool, recruited individuals at local hospitals included those who had travelled from out of area for specialist care and individuals may have moved subsequent to recruitment, therefore all UK data is requested. Alternative methods of assessing mortality status (e.g. re-contacting individuals) would be either more intrusive (likely to cause harm/distress), less informative or incomplete.
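The linkage step described above (matching on NHS number, then verifying by month and year of birth) can be sketched as follows. This is an illustrative sketch only: the record types, field names and synthetic values are assumptions made for the example and do not describe the actual LLP or NHS Digital systems.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CohortRecord:
    """A study participant as held by the cohort (hypothetical fields)."""
    member_number: str
    nhs_number: str
    birth_year: int
    birth_month: int


@dataclass(frozen=True)
class StatusRecord:
    """A returned status record (hypothetical fields)."""
    nhs_number: str
    birth_year: int
    birth_month: int
    vital_status: str


def link_with_verification(
    cohort: List[CohortRecord], returned: List[StatusRecord]
) -> Tuple[Dict[str, str], List[str]]:
    """Link on NHS number and accept a match only if birth month/year agree.

    Returns (verified links keyed by member number, member numbers whose
    NHS number matched but whose month/year of birth did not).
    """
    by_nhs_number = {record.nhs_number: record for record in returned}
    verified: Dict[str, str] = {}
    failed_verification: List[str] = []
    for person in cohort:
        candidate = by_nhs_number.get(person.nhs_number)
        if candidate is None:
            continue  # no record returned for this NHS number
        same_birth = (candidate.birth_year, candidate.birth_month) == (
            person.birth_year,
            person.birth_month,
        )
        if same_birth:
            verified[person.member_number] = candidate.vital_status
        else:
            failed_verification.append(person.member_number)
    return verified, failed_verification


# Synthetic example values; these are not real identifiers.
cohort = [
    CohortRecord("M001", "0000000001", 1950, 3),
    CohortRecord("M002", "0000000002", 1948, 7),
    CohortRecord("M003", "0000000003", 1955, 1),
]
returned = [
    StatusRecord("0000000001", 1950, 3, "alive"),
    StatusRecord("0000000002", 1948, 8, "dead"),  # birth month disagrees
]
links, mismatches = link_with_verification(cohort, returned)
print(links, mismatches)
```

Records that fail verification are kept in a separate list rather than silently dropped, so that they could be reviewed manually.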

NHS Digital will also provide cancer registry data, identifying all cancer diagnoses, with dates (where available). The justification for this is broadly the same as for cause of death, but this data provides additional information on timing and specific diagnosis of lung cancers (a primary outcome measure of the study) and other cancers (which may be significant confounding factors that must be taken into account during analysis). Again, this must be at the individual level, for all dates available and all geographical areas to provide the best dataset for personalised prognostic prediction. Alternative methods (e.g. case-note review) are likely more intrusive, less efficient and incomplete.

Hospital Episode Statistics (HES) data for inpatient and outpatient care is required to provide more detail of cancer diagnostic pathways and treatment, which have a bearing on outcome measures and inform research on potential changes to care pathways influenced by the risk models being researched. However, the main reason for accessing this data is that information is provided on co-morbidities (illness arising alongside lung cancer) and predisposing illness (e.g. respiratory diseases that have been shown to contribute to lung cancer risk); this data will both help inform the risk model and allow the LLP to correct outcome measures for confounding factors. The same considerations on individualised data, data timing, geographical spread and alternative methods apply.

Data requirements have been limited to those health-related events that directly impact the aims of the study and address the public interest justification. The LLP has collected patient identifying information used for linkage directly from subjects; for quality assurance purposes, they ask that NHS number, name, DoB, gender and postcode are returned by NHS Digital (to ensure unequivocal matching). Data returned is subsequently linked only via the Member Number (LLP Master Patient Index), as a pseudonymisation key.
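As a rough illustration of using the Member Number as a pseudonymisation key, the sketch below strips direct identifiers from a returned record after matching has been confirmed and keeps only the Member Number alongside the remaining fields. The field names, and the choice of which fields count as direct identifiers, are assumptions for the example rather than a description of the LLP's actual process.

```python
from typing import Any, Dict

# Assumed set of direct identifiers to remove once matching is confirmed.
DIRECT_IDENTIFIERS = frozenset({"nhs_number", "name", "date_of_birth", "postcode"})


def to_research_record(returned: Dict[str, Any], member_number: str) -> Dict[str, Any]:
    """Drop direct identifiers and key the record by Member Number only."""
    research = {
        field: value
        for field, value in returned.items()
        if field not in DIRECT_IDENTIFIERS
    }
    research["member_number"] = member_number
    return research


# Synthetic example record; none of these values are real.
returned_record = {
    "nhs_number": "0000000001",
    "name": "Example Person",
    "date_of_birth": "1950-03-01",
    "postcode": "ZZ99 9ZZ",
    "gender": "F",
    "vital_status": "alive",
}
research_record = to_research_record(returned_record, member_number="M001")
print(research_record)
```

The mapping between Member Number and the removed identifiers would need to be held separately under restricted access for the pseudonymisation to be meaningful.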

The LLP previously obtained the HES data (inpatient, outpatient and emergency care) of consented participants from NHS Digital and linked this to the epidemiology data already gathered from the participant through detailed questionnaires obtained upon recruitment into the study and from their medical notes.

The LLP are no longer requesting HES data relating to emergency care, as all subsequent diagnosis and treatment data will be within the inpatient and outpatient care.

In addition, all information gathered is linked to the mortality data obtained from NHS Digital to study the mortality patterns of participants in the LLP.

Data obtained from Cancer Registries via NHS Digital is also linked to such participants in order to record date of cancer incidence and type of cancer.

The study investigators are not aware of any moral or ethical issues raised by the proposed dissemination of NHS Digital data to the University of Liverpool, nor the dissemination of the aggregated study results to the wider public. The study investigators are unaware of any risk of potential harm to the public by the dissemination. The potential for harm has been assessed by University of Liverpool (Data Protection and Study Sponsor), by NRES ethical approval and by the Confidentiality Advisory Group (CAG).

Safeguards are in place that protect the interests of the data subject; these have been judged as proportionate to the substantial public interest, including approval by CAG under section 251 of the NHS Act 2006 (which together with informed consent also safeguards the common law duty of confidentiality). Safeguards include physical and operational barriers protecting privacy and allowing processing of personal identifying and sensitive data only by staff contractually obliged to fulfil the requirements set out by the University of Liverpool Data Protection Policy, compliant with all laws and approvals.

Processing activity and storage of personal and sensitive (special category) data is limited according to best practice (both in terms of scope and timing), but recognising that longitudinal studies require long-term data linkage and that research studies require anonymised and pseudonymised data to be available for a significant amount of time. These issues have been explained to participants during informed consent and are covered in a publicly accessible Privacy Notice.

The research project is part of a wider international effort to address the critical issue of earlier detection of lung cancer, required to address the single largest cause of cancer-related mortality. As such, the University of Liverpool share processed data with other research groups, but do so in a controlled way, bound by Data Sharing Agreements that confer the same level of adherence to protecting the rights of individuals involved in research and specifically covered by informed consent of subjects. All such data shared is either aggregate (as in publication), anonymous, or pseudonymised (such that additional data provided by the collaboration can be added to the sum of knowledge about subjects within the LLP, but not allowing others to link data and risk re-identification – this includes measures to suppress rare events).

The work undertaken by the LLP is funded by multiple organisations (research charities and government agencies, including international bodies). Their role is to provide funding for the research to the University of Liverpool (and collaborating research centres); they also support and encourage adherence to the highest standards of governance, public dissemination of results and scientific rigour.

Yielded Benefits:

The Liverpool Lung Project has utilised data from NHS Digital to develop and validate the LLP risk model. This has been used in the UKLS lung cancer screening trial to identify a cohort of high risk individuals; half of these underwent low dose CT screening, identifying a number of lung cancers (notably, the majority were early stage and underwent potentially curative surgery). Similarly, the risk model is being used in the Liverpool Healthy Lung Project to identify potential cancer patients in a community setting and has been adopted by similar trials and implementation projects. Other yielded benefits include a wide range of publications that have contributed to improved understanding of lung cancer and identified potential biomarkers. These outputs and benefits clearly address the primary aim and support the legitimate interest in utility of NHS Digital data.

Expected Benefits:

Lung cancer is the leading cause of cancer-related death in most developed countries, with mortality rates exceeding that of colon, breast and prostate cancer combined. The benefits of this research are linked clearly to its aims, namely to find ways to screen for and diagnose cancer earlier to improve the outcomes for patients.

Lung cancer is predominantly a disease of the elderly, with an average age at diagnosis of around 60-70 years, and often presents very late, at an advanced stage.

Given that more than 94% of the patients diagnosed with lung cancer in the UK die of the disease within five years, the primary objective is to detect lung cancer at an earlier, potentially more curable stage (5-year survival rate of stage IA tumour is ~70%).

The primary aim of this research is to improve the early detection of lung cancer. Changes likely to have happened as a result of the LLP outputs include: introduction of risk models to identify those most at risk of lung cancer for early detection initiatives (such as screening); identification of biomarkers that aid detection or clinical management of lung cancer; identification of novel targets for cancer treatment.

The magnitude of the impact for improved early detection is considered to be great, as lung cancer is the biggest cause of cancer mortality and this is largely because of detection at late stage when treatments are less effective.

Screening by low dose CT has already been proved to be of benefit and has been adopted in the USA; a UK randomised controlled trial utilised the LLP risk score, which has come directly from their use of NHS health data. One of the benefits of using a risk-based approach is in cost-saving by improved targeting of screening to those most likely to benefit. Further analysis of trials is ongoing to provide evidence for mortality improvement, but regional initiatives are already adopting the LLP risk score and delivering early detection opportunities to thousands of individuals.

Biomarkers that allow early detection (e.g. from blood samples) or improve screening results (by stratifying risk for those with small potentially pre-cancerous lesions) are also likely to provide a large benefit in terms of numbers of individuals benefiting and efficiency savings (e.g. reducing the need for follow-up scans). These benefits are likely to take longer to achieve as this work provides only discovery and preliminary validation; clinical trials will need to be performed before implementation.

Similarly insights into lung cancer biology (e.g. identifying molecular signatures of cancers that have poor outcomes) will have a long lead time to improved treatment. However the potential benefits are great, given the incidence of lung cancer and the relatively low effectiveness of current treatments.

The primary beneficiaries of the research will be the general population, through improved health and lower mortality. The NHS will benefit from lower costs or improved efficiency. Research funders are primarily non-profit, but may benefit from royalties paid if any intellectual property is developed and exploited. Those funders that are commercial enterprises aim to benefit by generation or exploitation of intellectual property (e.g. biomarkers and drugs); although it should be noted that the University of Liverpool and other funders have protected intellectual property in order to share these benefits. Given that the LLP plan to publish their research findings it is possible that third parties will benefit, although to do so they will have to extend the work.

This research will be of direct benefit for patients in developing those early detection technologies and techniques.

Although the pathogenesis of lung cancer is not yet fully understood, researchers have suggested the potential role of the occurrence of concomitant diseases (conditions that occur at the same time) in the aetiology (cause/s) of lung cancer.

Due to increasing longevity and rapidly ageing populations, the number of people with more than one comorbid condition is expected to increase sharply in the coming decades. This increase might lead to an increase in the incidence of lung cancer, and the comorbidity burden might lead to increased overall and/or lung cancer-specific mortality.

This research will allow for these co-morbidity links and contributory factors to be investigated, analysed and reported on and ultimately benefit the patient.

The establishment of the Liverpool Lung Project (LLP) cohort has provided an important resource that is internationally recognised and will continue to provide benefits in the future. The ongoing update of associated NHS Digital data will further enhance the utility of this research resource, for example:

1. The University of Liverpool Roy Castle Lung Cancer Research Programme aim to utilise bronchial washings and/or sputum and/or blood to develop molecular assays for early diagnosis of lung cancer. The integration of HES, Cancer Registry and mortality data with the molecular data will allow the researchers to improve the LLP Risk Model tool for application of personalised risk assessment alongside the development of future molecular assays (either targeting those at highest risk, or attenuating the results to account for known confounding factors). The earlier lung cancer can be diagnosed the better the outcome for the patient.

2. Characterisation of risk factors for lung cancer is of considerable health and economic importance, as they can be used to inform prevention, screening and treatment policy. The researchers will continue to develop the LLP risk model for lung cancer and are identifying epigenetic and genetic biomarkers for early detection and prognosis of lung cancer in order to improve outcomes for patients.

Outputs:

All outputs are aggregate with small numbers suppressed in line with the HES analysis guide.
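As an illustration of what small number suppression in an aggregate table can look like, the following is a minimal sketch. The threshold, the marker and the treatment of zero counts are assumptions made for this example only; the actual rules are those set out in the HES analysis guide, which this document does not reproduce.

```python
from typing import Dict, Union

# Assumption: illustrative threshold only. The real value and any rounding
# rules come from the HES analysis guide, not from this sketch.
SUPPRESSION_THRESHOLD = 5


def suppress_small_counts(
    counts: Dict[str, int],
    threshold: int = SUPPRESSION_THRESHOLD,
    marker: str = "*",
) -> Dict[str, Union[int, str]]:
    """Replace non-zero counts below `threshold` with a marker.

    Zero counts are left visible here; that choice is also an assumption.
    """
    return {
        group: (marker if 0 < n < threshold else n)
        for group, n in counts.items()
    }


# Hypothetical aggregate table: number of cases per group.
example_counts = {"group A": 42, "group B": 3, "group C": 0, "group D": 17}
suppressed = suppress_small_counts(example_counts)
```

Only the suppressed table, never the underlying record-level data, would appear in an output.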

Outputs will include:

1. Reports:

Reports for grant awarding bodies will be produced. This will include reports in support of additional funding applications for further analysis, ensuring maximum utility and benefit from the data provided.

Annual Report(s) for funding body, e.g. The Roy Castle Lung Cancer Foundation (RCLCF) to identify type of research undertaken, recruitment statistics and specific research developments within the funding period. This report is seen by the Roy Castle Lung Cancer Foundation Executive and Scientific Committee and its Trustees to inform policy and quantify the benefit of future funding of the research programme.

2. Publications:

It is anticipated that the analysis from this study will be included in internationally renowned oncology, epidemiology and public health journals. Publications will be prepared for 2017, 2018, 2019 and 2020. Journals for consideration will include:
Thorax
Journal of Thoracic Oncology
Lung Cancer
British Journal of Cancer
Cancer epidemiology, biomarkers & prevention
Scientific Reports
Oncology Letters
Nature Genetics
Nature Communications
Lancet
Lancet Oncology

More information on past publications can be found on the study website - http://www.liverpoollungproject.co.uk/publications

3. Presentations:

In accordance with previous years it is expected that presentations will be given at major cancer conferences. These presentations will provide dissemination of results from ongoing studies of LLP Risk Modelling, Methylation, MicroRNA, Sequencing, etc.

Conferences are expected to include:
World Conference on Lung Cancer
American Association for Cancer Research Annual Meeting
National Cancer Research Institute Annual Meeting

Processing:

Since 2007, the LLP has periodically provided identifying details of consented participants to NHS Digital or predecessor service providers (ONS, the NHS Information Centre and the Health & Social Care Information Centre). LLP has been provided with the linked data about participants’ deaths (date and cause), cancer registrations, exits from or re-entries to the NHS and, subsequently, Hospital Episode Statistics.

The flow of data into NHS Digital is limited to personal identifying information required for linkage (NHS number) and for verification of that linkage: unique cohort MPI No. (Master Patient Index), Name, Date Of Birth, Gender, NHS Number and Postcode.

The data subjects all gave informed consent to take part in the study, including for access to their health records. Where consent taken in the past was deemed not to meet current standards, the LLP have obtained support under section 251 of the NHS Act 2006 from CAG to permit the processing of confidential data without fully informed consent.

The flow of data out of NHS Digital consists of a download of data files containing the data initially supplied, plus matching fields from NHS Digital and the requested health care and outcome data, the majority of which can be considered as special category data: specifically race/ethnic origin and health data. As previously, identifying data other than the minimum required for basic data linkage, are provided to ensure quality assurance of matching (i.e. latest name/gender/DoB/postcode).

The data files are transferred to an encrypted drive on a dedicated University of Liverpool virtual server. This is only accessible to a limited number of qualified staff and is password protected within a managed environment.

Data supplied by NHS Digital is processed for inclusion into the LLP clinical database, to enable its use for the approved legitimate purpose. The data is linked at patient level with data in the LLP clinical database using the subject specific MPI number only. The data within this database are pseudonymised as much as practically possible but include health data and event dates. Event dates are required for calculation of time periods in relation to other events both within the data provided by NHS Digital and to events collected by other means, e.g. directly from the subject or by review of hospital records.

Data processing is performed within a University of Liverpool managed environment (password protected) and limited to specific folders accessible only to qualified staff associated with the LLP study. In addition the clinical database on which the data resides is separately managed with additional controls on staff access.

Data flow outside of the clinical database (but remaining within the managed network) may include both personal identifying information and special category data (ethnicity and health data). This will only happen as required for data cleaning or collation of data from other sources (e.g. confidential provision of patient lists for case-note review by study-associated clinical staff with the approval of local Caldicott Guardians – in this case NHS Digital data is used to identify subjects, but the data itself is not shared).

The data will only be accessed by researchers employed by the University of Liverpool and not shared with any third parties.

Subsets of data from the clinical database including information derived from the data under this Agreement may be extracted and shared with collaborating organisations. Any data that is shared with these collaborative organisations will be pseudonymised with a unique study ID that allows the LLP to link results of analysis back to individuals. Dates are removed (replaced by ages or time periods, e.g. time from diagnosis to death) or limited (e.g. year or month + year); health events are curated (e.g. classed as lifetime events rather than time dependent events) or recoded to remove granularity (e.g. grouping into less specific terms such as “lung disease”). The derived data will be shared in combination with data from other sources (e.g. pathology records, electronic patient records, questionnaires) but this data will not contain identifying information or any information which would result in any shared data being identifiable as originating or deriving from the data from NHS Digital or possible to reverse-engineer such that it can be so identified. The combined data will conform to the specification shared with NHS Digital (filename ‘UoL Lung projects Data sharing MD209180218’) which NHS Digital has approved.
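The transformations described above (study ID in place of identifiers, dates replaced by ages or time periods, dates limited to a year, and conditions recoded into broader groups) can be sketched roughly as follows. This is not the LLP's actual procedure: the record layout, field names and the grouping table are all hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class ClinicalRecord:
    """Hypothetical record held inside the secure clinical database."""
    study_id: str                # pseudonymous study ID (hypothetical format)
    date_of_birth: date
    diagnosis_date: date
    death_date: Optional[date]
    condition: str               # specific recorded term


# Hypothetical recoding table: specific terms mapped to a less granular group.
COARSE_GROUPS: Dict[str, str] = {
    "emphysema": "lung disease",
    "chronic bronchitis": "lung disease",
    "pneumonia": "lung disease",
}


def whole_years_between(start: date, end: date) -> int:
    """Completed years from `start` to `end`."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def to_shareable(rec: ClinicalRecord) -> dict:
    """Derive a shareable row: no full dates, no direct identifiers."""
    days_to_death = (
        (rec.death_date - rec.diagnosis_date).days if rec.death_date else None
    )
    return {
        "study_id": rec.study_id,
        "age_at_diagnosis": whole_years_between(rec.date_of_birth, rec.diagnosis_date),
        "diagnosis_year": rec.diagnosis_date.year,       # date limited to year
        "days_diagnosis_to_death": days_to_death,        # date replaced by period
        "condition_group": COARSE_GROUPS.get(rec.condition, "other"),
    }


record = ClinicalRecord(
    study_id="S0001",
    date_of_birth=date(1950, 6, 15),
    diagnosis_date=date(2015, 3, 1),
    death_date=date(2016, 3, 1),
    condition="emphysema",
)
shared = to_shareable(record)
```

The mapping from study ID back to the individual stays inside the clinical database, which is what lets analysis results be linked back without the recipient holding identifiers.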

Given the high incidence of lung cancer and associated co-morbidities, it is considered incredibly unlikely that any re-identification could occur, even with access to other data.

Where possible data is provided in aggregate form, although the nature of the research performed often requires individual level data (to link biomarkers or attributes to specific health outcomes). In most cases the data is combined with new data produced from biological or biochemical assays (in associated samples provided from the LLP biobank) or from algorithms based on risk data provided by subjects via questionnaire. Having combined the data an assessment is made as to whether the new data allows the LLP to predict specific health outcomes (e.g. diagnosis, specific disease sub-type related, disease severity or outcome) – providing a risk score or diagnostic algorithm that may help the LLP guide future treatment. Additionally correlations are made between data that help the LLP understand the biology of lung cancer, which provides new opportunities for alternative treatments.

Any data shared must be subject to the conditions that the collaborating organisation:
i. must not combine it with other datasets which could potentially increase the risk of reidentification for individuals in the dataset;
ii. must not attempt to re-identify individuals in the dataset;
iii. must not onwardly share the dataset;
iv. must use the dataset for a defined purpose in support of the LLP’s aims defined within this Agreement, and
v. must not publish the data.

Under the terms of this Agreement, the University of Liverpool is responsible for ensuring compliance with the above conditions and for confirming destruction of the data by any collaborating organisation once the data is no longer required for the purpose for which it was shared.

Organisations receiving data will all be involved in health research that furthers the aims for use of the data. This includes non-profit and educational establishments and commercial research organisations (e.g. biotech and pharmaceutical companies). These organisations may be in the UK or overseas. In the case of international sites and commercial organisations, data will only be shared for subjects who have explicitly consented to this as part of study recruitment informed consent procedures. Data Transfer Agreements (or Material Transfer Agreements if samples are also included) cover all such transfers of data and confer the same standards of care as specified for receipt of data from NHS Digital, meeting all obligations of the relevant data protection legislation.

The ultimate flow of data is into publications made publicly available for the benefit of the wider research community. Special care is taken to ensure confidentiality is maintained and re-identification is not possible; data is in aggregate form in publications.

The risk of reidentification via data linkage is relevant for any data subsequently shared but is mitigated in a number of ways. Aggregation is widely used for sharing results, in which case individual level data is not available for linkage. Where aggregation is not possible, studies are of a substantial size for a common disease, so individuals cannot easily be identified unambiguously even based on multiple parameters, e.g. disease status, gender and age. Dates are suppressed (either truncated or converted to time periods/age). Minimal relevant summary data, e.g. disease type, appropriate to the research question addressed is provided, rather than full case histories. Geographical data (e.g. postcode, collected from the subject, not provided by NHS Digital) is only used to contact individuals or to gather other data (e.g. deprivation index, radon exposure) – in which case data for postcode alone are shared with no other identifying information; the subsequent data is linked to individuals within the secure LLP clinical database and only results (not location) shared subsequently.

The data from NHS Digital is linked with other information collected from or about the participants including questionnaire responses and information derived from samples of blood, sputum or tissue. The data are then pseudonymised for analysis.

National patient opt-outs will be applied to all data released by NHS Digital under this Agreement. The opt-out policy does not require opt-outs to be applied for individuals who gave sufficiently informed consent for their data to be processed but the University of Liverpool has chosen for opt-outs to be applied for the whole cohort having given due consideration to the following factors:
i. The cohort comprises a mixture of participants recruited prior to 2003 whose consent in relation to the specific data processing described in this Agreement was not sufficiently informed and for whom support under the NHS Act 2006 section 251 allows the processing of their data without consent, and participants who gave sufficiently informed consent since 2003. The majority of participants were recruited prior to 2003.
ii. The participants are based in geographical areas with comparatively lower uptakes of patient opt-outs.
iii. The LLP actively seek to re-consent existing participants using the latest versions of consent materials whenever there is a suitable opportunity and this would create practical challenges in managing which participants are deemed to have given sufficiently informed consent (and therefore are exempt from opt outs for the purpose of the LLP) and which have not. The overhead of managing which participants belong to which group was considered to outweigh the risk of losing consented participants who have registered an opt-out.
iv. Consent is permissive but does not oblige the LLP to use participants' data in the programme.

The data is received and stored at the University of Liverpool. All data processing of the data supplied by NHS Digital takes place at The University of Liverpool and is carried out by the University of Liverpool Roy Castle Lung Cancer Research Programme (Liverpool Lung Project) staff, holding substantive contracts of employment at the University of Liverpool and having received appropriate Data Protection and Good Clinical Practice (GCP) training.

Staff are based within The William Henry Duncan Building with data held on secure servers within the University of Liverpool (in compliance with the IG toolkit).

NHS Digital data, prior to processing and transfer to the LLP clinical database, is only accessed on University of Liverpool premises across a secure data network (password protected) from a BitLocker-encrypted virtual server (by approved LLP study staff); the secure server is firewall protected and accessible only from 3 designated PCs (with unique IP addresses).

After processing, data is stored on a System Builder database located on a secure data network at the University of Liverpool; this network is only accessible to staff and is password protected. Data is located in folders that are limited to named staff. The System Builder database has additional access control including different usernames and passwords.
All hardware is in secure environments: servers within the Computer Services Department and PCs within research building with swipe-access control. Procedural control ensures that PCs are never left accessible. All data is stored on servers (rather than individual PCs) and is backed-up routinely with the same level of protection (i.e. the encrypted server has an encrypted back-up).
Only data that is anonymised or pseudonymised is shared by e-mail or by portable media, in which case all files are password protected and/or encrypted during transit.

The data are used for the overarching long-term objective of building a greater understanding of how to identify individuals at high risk of lung cancer. The data are added to the LLP’s risk model. The exact uses of the data evolve over time as science moves forward. For example, as new scientific techniques emerge, the data are used in different ways but always within the scope of the overarching objective. The data will only be processed for the purposes described in this document.

Data received from NHS Digital will not be shared with any third parties.

PREMO CNS; The PREsentation, Management and Outcomes of patients with CNS disease secondary to Breast Cancer in England (ODR2021_030) — DARS-NIC-687052-V2Q0M

Opt outs honoured: No (Excuses: Does not include the flow of confidential data)

Legal basis: Health and Social Care Act 2012 s261(2)(a)

Purposes: No (Academic)

Sensitive: Non-Sensitive

When:DSA runs 2024-06 – 2026-06 2025.04 — 2025.04.

Access method: One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

Datasets:

NDRS Cancer Registrations
NDRS Linked DIDs
NDRS Linked HES AE
NDRS Linked HES APC
NDRS Linked HES Outpatient
NDRS National Radiotherapy Dataset (RTDS)
NDRS Systemic Anti-Cancer Therapy Dataset (SACT)
NDRS Cancer Pathway

Type of data: Anonymised - ICO Code Compliant

Objectives:

University of Liverpool (UoL) requires access to NHS England data for the purpose of the following project:
PREMO CNS: PREsentation, Management, and Outcomes of patients with CNS (Central Nervous System) disease secondary to breast cancer in England.

The following is a summary of the aims of the audit provided by UoL:
Primary objective: To audit the overall survival from initial diagnosis of CNS involvement secondary to breast cancer in English centres.
Secondary objectives:
- To audit the number of cases of metastatic breast cancer (MBC) involving the CNS presenting in English centres per year.
- To audit the current practice in English centres regarding the diagnosis and management of CNS disease secondary to breast cancer in relation to national and international guidelines.
- To audit the outcomes of patients treated for CNS involvement secondary to breast cancer in English centres.

The following NHS England Data will be accessed:
National Disease Registration Service (NDRS) Cancer Registrations - necessary to outline relevant patient, cancer and geographical characteristics prior to in-depth statistical analysis, and to obtain a breakdown with respect to sex, ethnicity, age at diagnosis, stage of cancer at diagnosis, receptor status, comorbidity, and survival/outcome.
NDRS Systemic Anti-Cancer Therapy Dataset (SACT) - necessary to ascertain details on the SACT offered to patients both historically and in current clinical practice, to outline how practice has evolved over time and illustrate changes in patient outcomes.
NDRS National Radiotherapy Dataset (RTDS) - necessary to illustrate historical and current practice within England and analyse patient outcomes.
NDRS Linked Hospital Episode Statistics (HES) Accident & Emergency (A&E) - necessary as breast cancer patients with CNS disease commonly present to A&E. This could be a marker of CNS disease or a sign of clinical progression, and the number of A&E presentations could illustrate the impact on healthcare services.
NDRS Linked HES Admitted Patient Care (APC) - necessary to ascertain the number and duration of inpatient stays, as breast cancer patients with CNS disease are often admitted to an inpatient ward after presentation to A&E. This could be a marker of progression within the CNS, provides detail on key time points where patients have changes to their oncological treatment, and could illustrate the impact on healthcare services.
NDRS Linked HES Outpatient - necessary to assess when patients were reviewed by their oncology team, as practices may vary nationally.
NDRS Linked Diagnostic Imaging Data Set (DIDS) - necessary to ascertain differences in patient care across the country, as the use and requirement of diagnostic imaging is likely to vary across England. This could also indicate certain complications that a patient was experiencing.

The Level of the Data will be
Pseudonymised

The Data will be minimised as follows:
Limited to a study cohort identified by NHS England of patients aged 16 years old and over at the time of breast cancer diagnosis.
Limited to conditions relevant to the study identified by specific ICD or OPCS diagnosis codes.
Limited to the geographic area of England.
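The three minimisation criteria above amount to a filter over candidate records. The sketch below is purely illustrative: the cohort is identified by NHS England and this document does not describe how, so the record layout, the field names and the example ICD-10 prefix are all assumptions rather than the agreed code list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Registration:
    """Hypothetical pseudonymised registration row."""
    pseudo_id: str
    age_at_diagnosis: int
    icd10_code: str
    country: str


# Assumption: example prefix only. The actual ICD/OPCS codes used for the
# study are not listed in this document.
RELEVANT_ICD10_PREFIXES: Tuple[str, ...] = ("C50",)
MIN_AGE_AT_DIAGNOSIS = 16


def minimise(
    records: Iterable[Registration],
    prefixes: Tuple[str, ...] = RELEVANT_ICD10_PREFIXES,
    min_age: int = MIN_AGE_AT_DIAGNOSIS,
) -> List[Registration]:
    """Keep rows meeting the age, diagnosis-code and geography criteria."""
    return [
        r
        for r in records
        if r.age_at_diagnosis >= min_age
        and r.icd10_code.startswith(prefixes)
        and r.country == "England"
    ]


candidates = [
    Registration("a", 54, "C50.9", "England"),   # kept
    Registration("b", 15, "C50.1", "England"),   # under 16 at diagnosis
    Registration("c", 60, "C34.1", "England"),   # code not in the list
    Registration("d", 47, "C50.4", "Wales"),     # outside England
]
kept = minimise(candidates)
```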

The lawful basis for processing personal data under the UK GDPR is:

Article 6(1)(e) processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller;

The lawful basis for processing special category data under the UK GDPR is:
Article 9(2)(i) processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy.

This processing is in the public interest because it adheres to the UK Policy Framework for Health and Social Care Research, which protects and promotes the interests of patients, service users and the public, and aims to produce generalisable and publicly available information to inform future decisions over patients' treatments or care. There are benefits to the wider public/society as the audit aims to show the outcomes and care of patients with CNS disease in England, and how outcomes have changed over time. These are of direct importance to society given the data will either provide reassurance or raise issues that would need to be looked into with regard to the care of patients with CNS disease secondary to breast cancer.

The funding comes from multiple sources. Current funders include:
GILEAD Funding was in place until 31/03/2024
Daiichi-Sankyo Funding is in place until 28/02/2025
The funders will have no ability to suppress or otherwise limit the publication of findings.

Data will be accessed by:
Substantive employees and students affiliated with UoL. Any student working with the Data held under this Data Sharing Agreement (DSA) must have completed relevant data protection and confidentiality training and is subject to UoL's policies on data protection and confidentiality. Any students accessing the Data will do so under the supervision of a substantive employee of UoL. UoL would be responsible and liable for any work carried out by students. These students would only work on the Data for the purposes described in this DSA.

Yielded Benefits:

The data received has been analysed and the project team have created summary tables of the key clinico-pathological features of the patients such as age, sex, ethnicity, type of breast cancer, and site of the initial breast cancer based on ICD10 code. The data is demonstrating that, overall, women with triple negative breast cancer have worse outcomes, and patients with HER2 positive breast cancer have better outcomes with brain metastases. There is also an improvement in overall survival over time for all patients as well as based on the subtype of breast cancer, namely triple negative, HER2-positive, and ER-positive. Further analysis is currently being undertaken on the treatments the patients have received and the possible benefit of these, as well as how outcomes may vary by where the patient was treated.

Expected Benefits:

The findings of this project are expected to contribute to evidence-based decision-making for policy-makers, local decision-makers such as doctors, and patients to inform best practice to improve the care, treatment, and experience of health care users relevant to the subject matter of the study.

The use of the data could:
help the system to better understand the health and care needs of populations.
lead to the identification or improvement of treatments or interventions, or health and care system design to improve health and care outcomes or experience.
advance understanding of regional and national trends in health and social care needs.
inform planning health services and programmes, for example to improve equity of access, experience, and outcomes.
inform decisions on how to effectively allocate and evaluate funding according to health needs.
support knowledge creation or exploratory research (and the innovations and developments that might result from that exploratory work).

It is hoped that through publication of findings in appropriate media, the findings of this audit will add to the body of evidence that is considered by the bodies, organisations and individual care practitioners charged with making policy decisions for or within the NHS or treatment decisions in relation to specific patients.

Outputs:

The expected outputs of the processing will be:
Submissions to peer reviewed journals (such as Lancet Oncology)
Presentations at national and international conferences (such as United Kingdom Breast Cancer Group, American Society of Clinical Oncology, and European Society for Medical Oncology)

No outputs will contain NHS England Data; outputs will only contain aggregated information with small numbers suppressed as appropriate in line with the relevant disclosure rules for the dataset(s) from which the information was derived.

The outputs will be communicated to relevant recipients through the following dissemination channels:
Public reports
Press/media engagement
Presentations to patient groups (such as METUPUK and Breast Mates Cancer Support Group)
Direct engagement with the Breast Cancer Now charity

The production and dissemination of the outputs is ongoing. Initial presentations are expected to take place in 2024.

Processing:

No data will flow to NHS England for the purposes of this Data Sharing Agreement (DSA).

NHS England will provide the relevant records from the above NDRS datasets to UoL. The Data will contain no direct identifying data items. The Data will be pseudonymised and individuals cannot be reidentified through linkage with other data in the possession of the recipient.

The Data will not be transferred to any other location.

The Data will be stored on servers at UoL.

The Data will be accessed by authorised personnel via remote access.

The Controller(s) must confirm and provide evidence upon audit by NHS England that access via any remote device complies with the data security obligations within this DSA and the Data Sharing Framework Contract.

For remote access:
Remote access will only be from secure locations situated within the territory of use (as further restricted elsewhere within the DSA if so done) stated within this DSA;
Access controls granting users the minimum level of access required are in place;
Remote access is only via secure connections (e.g., VPNs or secure protocols) to protect data;
Multifactor authentication (MFA) is required for remote access;
Device security, including up-to-date software and operating systems, antivirus software, and enabled firewalls are utilised for the remote access;
All remote access is undertaken within the scope of the organisation's DSPT (or other security arrangements as per this DSA) and complies with the organisation's remote access policy.

The above applies in addition to any condition set out elsewhere within the DSA (e.g. who may carry out processing, and for what purpose).

Remote processing will be from secure locations within England/Wales. The Data will not leave England/Wales at any time.

Data will be accessed by students affiliated with UoL. Aside from these individuals, access is restricted to employees or agents of UoL who have authorisation from the Principal Investigator.

All personnel accessing the data have been appropriately trained in data protection and confidentiality.

The data will not be linked with any other data.

There will be no requirement and no attempt to re-identify individuals when using the data.

Analysts from UoL will process/analyse the Data for the purposes described above.

HES Extract - Integrated Longitudinal Research Resource - Developing neighbourhood resilience, reducing health inequalities — DARS-NIC-16656-D9B5T

Opt outs honoured: No - data flow is not identifiable, No (Excuses: Does not include the flow of confidential data)

Legal basis: Health and Social Care Act 2012, Health and Social Care Act 2012 – s261(1) and s261(2)(b)(ii), Health and Social Care Act 2012 s261(1) and s261(2)(b)(ii); Other-Health and Social Care Act 2012 - s.261(2)(b)(ii), Health and Social Care Act 2012 s261(1) and s261(2)(b)(ii), Health and Social Care Act 2012 - s261 - 'Other dissemination of information', Health and Social Care Act 2012 s261(2)(b)(ii); Other-Health and Social Care Act 2012 - s.261(2)(b)(ii), Health and Social Care Act 2012 s261(2)(b)(ii), Health and Social Care Act 2012 s261(2)(a)

Purposes: No (Academic)

Sensitive: Non Sensitive, and Non-Sensitive, and Sensitive

When:DSA runs 2017-10 – 2020-10 2017.12 — 2024.12.

Access method: Ongoing, One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

Datasets:

Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Outpatients
Emergency Care Data Set (ECDS)
HES-ID to MPS-ID HES Accident and Emergency
HES-ID to MPS-ID HES Admitted Patient Care
HES-ID to MPS-ID HES Outpatients
Hospital Episode Statistics Accident and Emergency (HES A and E)
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Outpatients (HES OP)

Type of data: Anonymised - ICO Code Compliant

Objectives:

HES data will be used to develop a longitudinal panel of neighbourhood (Lower Super Output Area - LSOA) indicators. These will be used to investigate the impact on health care utilisation of risk factors, policies and interventions.

Analysis of this longitudinal panel will:

1. Investigate the impact across England of socioeconomic changes, national health and welfare policy changes, environmental changes and infectious disease trends on healthcare utilisation and whether there are neighbourhood level characteristics that modify these effects. Analysis will investigate inequalities between neighbourhoods in the consequences of these adverse trends and events. Analyses for this Objective will indicate the contextual factors driving adverse health outcomes and health service utilisation at the neighbourhood level.

2. Evaluate the impact of area based local authority and NHS economic, environmental, social, governance and service redesign activities on health outcomes and demand for health and social care services.

3. To develop predictive models of the factors driving adverse health trends and increases in demand for health services at the neighbourhood level, that can then be used by local agencies to better target resources at the root causes of ill-health and health service demand and the neighbourhoods most affected.

4. To develop new approaches for monitoring progress on health inequalities at the neighbourhood level and involving the public in using data to influence local services and policies - supporting Open Data initiatives to promote transparency and accountability.

Yielded Benefits:

The data sharing agreement has led to numerous benefits and changes in policy. Whilst there have been some changes to the original 8 planned papers in terms of titles and some delays to publication, 2 additional studies have been undertaken. Delays to publication largely reflect the fact that the review process by academic journals is not within the University of Liverpool’s control and research papers are often reviewed by multiple journals before being accepted for publication. Delays have also occurred due to incorrect data being provided by NHS Digital. Benefits to date and planned work not completed are outlined below.

1. The impact of trends in gastrointestinal infections on health care utilisation.
Papers 1 and 2 have been prepared for submission and initial analysis was presented at the NIHR HPRU annual scientific meeting in 2018. Work is ongoing with local authorities in the North West to use the findings to change practice and develop approaches to better target the causes and consequences of gastrointestinal infections.

2. The environmental determinants of health care utilisation.
Analysis for paper 3 has been presented to Public Health England and to the Scrutiny Committee for Liverpool City Region on air pollution. This evidence was used to estimate the health care costs of air pollution and to inform the Mayor’s Air Pollution Strategy.

3. The effect of changes in social care funding and welfare reform on health care utilisation.
Analysis for this was initially produced; however, similar analysis had been carried out at the same time by the Institute for Fiscal Studies (IFS), so it was not possible to publish the University of Liverpool’s analysis. A new paper (paper 4) has therefore been developed focusing on social care and children’s outcomes (the IFS focused on adults). This analysis will be completed by September 2020.

4. The health inequalities impact of initiatives to promote neighbourhood resilience.
Initial work on this has been presented in briefing papers for local government. Final analysis for papers 8 and 9 has been conducted and is due for publication in September 2019. This evidence is being used to indicate what works and to provide evidence for local authorities across the country, helping them develop initiatives that promote resilience, improve health and reduce inequalities.

5. The impact on health care utilisation of new models of out of hospital treatment and care and community orientated primary care.
More extensive work than initially planned has been completed assessing the impact of new models of care. This has included evaluation of community based cardiovascular and respiratory services and the Liverpool General Practice Quality Improvement Scheme. These papers (11-13) are all either submitted or well developed. The results have been presented to the commissioners of these services and formed part of the decision to continue investing in these services. Policy briefings on these have been produced for clinicians, practitioners and commissioners. An assessment of the secondary care consequences of reduced investment in primary care has also been carried out (Paper 15), demonstrating that recent reductions in GP provision are leading to an increase in unplanned emergency admissions. This is being used to inform local primary care resource allocation strategy. The University of Liverpool’s assessment of the introduction of new anti-coagulants has raised concerns about adverse bleeding complications resulting from these new drugs (Paper 14).

6. Predicting adverse trends in neighbourhood health.
The predictive modelling planned has not progressed as hoped due to limitations in capacity and delays in receiving data items. Initial work had been conducted to develop an openly accessible multi-dimensional small area index of ‘Access to Healthy Assets and Hazards’, which was validated using NHS Digital Data. This work has been published in Health and Place. Further work in developing this modelling tool has formed the basis of a 5-year continuation of the main research grant funding this work – the NIHR NWC CLAHRC.

Other outputs

1. Construction of longitudinal panel dataset of neighbourhood indicators with linked socioeconomic data.
This longitudinal panel dataset has been produced and includes 75 indicators. Metadata and, where appropriate, Open Data from this indicator set will be published on the PLDR website, which will be launched in July 2019.

2. Predictive modelling tool freely available to local authority and NHS organisations.
The predictive modelling interface will enable local authorities and NHS organisations to better target resources and adapt services to local needs. This has not been completed, as outlined above.

3. Web based Neighbourhood Resilience Interface developed.
A web-based interface has been produced; however, in consultation with residents in the neighbourhoods the University of Liverpool are working with, it was decided that this would not be made publicly available. This was because of concerns about stigmatising these disadvantaged neighbourhoods. The Interface is available to a registered group of community researchers through a password protected portal. This has been used by these community groups to identify local needs, monitor progress and advocate for change.

Expected Benefits:

Benefits from reviewed journal papers and related analysis.

1. The impact of trends in gastrointestinal infections on health care utilisation.
Analysis indicating the impact of gastrointestinal (GI) infection trends on health care utilisation, and the extent to which this is mediated by socioeconomic and health service related factors, will indicate how targeted interventions that reduce GI infections and actions that influence the health seeking behaviour of people with GI infections could reduce healthcare usage. Alongside this analysis Liverpool are working with local public health and environmental health teams to develop targeted interventions to reduce inequalities in the causes and consequences of gastrointestinal infections. This analysis will inform the development of these interventions, leading to more effective approaches. For example, this could include actions to support parents caring for children with gastrointestinal infections and promoting alternatives to A&E by enhancing support through pharmacies and primary care.

2. The environmental determinants of health care utilisation.
This analysis will identify the extent to which environmental factors, such as air pollution, flood risk, housing quality and fuel poverty influence health care utilisation and inequalities in these effects by area deprivation. Previously strategies to manage demand for health care services have focused on service redesign rather than environmental determinants of health. This analysis will be used to develop strategies with local partners to reduce demand for health care by addressing important environmental determinants. The results will indicate the potential savings to the NHS from investment in initiatives to reduce fuel poverty or improve air quality, for example. This will then lead to benefits both through improving health and reducing preventable health care costs.

3. The effect of changes in social care funding and welfare reform on health care utilisation.
This analysis will indicate the effect of changes in social care funding and welfare reform on health care utilisation and the factors that might mitigate these effects. Funding for social care is currently being reduced relative to demand, and major welfare reforms are being introduced, however, it is not currently known what effect this is having on healthcare utilisation. The analysis will indicate the potential costs to the health service of these policies. It will inform national policy debates about the costs and benefits of different approaches to welfare reform and the allocation of resources for health and social care services. It will help identify the characteristics of local systems that are more resilient to these changes – enabling the development of local health, social care and welfare systems that can better improve health and reduce health inequalities.

4. The health inequalities impact of initiatives to promote neighbourhood resilience.
This analysis will indicate the health inequalities impact of a number of local initiatives that aim to promote economic, environmental and social resilience in disadvantaged neighbourhoods in the North West. These include initiatives to improve housing, increase financial security, reduce social isolation and improve public involvement and governance. This will indicate what works and provide evidence for local authorities across the country helping them develop initiatives that promote resilience, improve health and reduce inequalities.

5. What components of resilience have the greatest impact on health.
The University have developed a model of resilience with local authorities in the North West that focuses on economic, environmental, social and governance systems. However it is not yet known what the relative impact of these components is on health and health inequalities. Analyses will indicate the health gains that could be expected for investments in different components of this model and the interactions between them. This will enable the more efficient use of resources to develop more resilient systems that reduce health inequalities.

6. The impact on health care utilisation of new models of out of hospital treatment and care and community orientated primary care.
There are currently a large number of new models of out of hospital treatment and care, being developed across the country, particularly as part of the Vanguard programme. New initiatives are often overlaid on top of and interact with existing programmes and wider system changes. The NHS and local authority partners of the NWC CLAHRC have identified this as a priority for the research programme supported by the NWC CLAHRC over the next 3 years. Analyses will identify the components of new models of care along with wider system changes that appear to be effective, both within primary care and at the interface of primary, secondary and social care. Analysis will particularly focus on how these effects differ across socioeconomic groups and interact with the social and environmental determinants of health. This will support the development of out of hospital care that addresses inequalities and improves health whilst reducing healthcare utilisation. For example this could include new approaches for incorporating wider social support in general practice through the third sector or identifying the key components for the effective integration of health and social care teams.

7. Predicting adverse trends in neighbourhood health.
Increasingly health and social care systems are using risk prediction and stratification methods to target resources and interventions. These have tended to use individual risk factors and model risk at the individual level. This tends to neglect the impact of environmental and area-based determinants of health outcomes. This paper will outline the methods used to develop a risk prediction model that is based on neighbourhood level analysis, incorporating a broader set of individual and environmental determinants than models based solely on individual risk factors. Publishing the methods for producing the model will enable the robust development of a tool that local authorities and NHS organisations can use to target the right actions at the right risk conditions in the right neighbourhoods to most effectively improve health and reduce health service demand (see below).

Conferences/presentations.
1. NIHR HPRU annual conference – 2017
This presentation will be used to disseminate the early results from the analysis for Paper 1 to an audience of NHS, Health Protection and Environmental Health practitioners. This will enable them to develop more effective approaches that reduce the impact of gastrointestinal infections in disadvantaged neighbourhoods.

2. European Public Health Association Conference – 2017
This presentation will be used to disseminate and discuss the early results from the analysis for paper 2 to an international audience of public health practitioners, policy makers and academics. This will enable them to make the case for investment in and development of strategies to reduce demand for health care by addressing important environmental determinants of health. It will also stimulate cross-country learning about effective approaches to reduce environmental determinants of health, leading to improved public health policies.

3. Public Health England Annual Conference and Local Government Association Conferences – 2018
These conferences will be used to present early findings from the analysis for papers 4 and 5 to audiences of public health practitioners, other local authority professionals and local government policy makers. This will enable them to make evidence-based decisions about how scarce resources are invested locally in actions to improve the social determinants of health. For example, this could indicate whether investment in employment services is likely to be more or less effective than investment in services to reduce social isolation, and which components of these initiatives are likely to be important in increasing effectiveness.

4. Annual Primary Care Conference – 2018
This conference will be used to present findings from Papers 6 and 7 to an audience of GPs, Commissioners and other health care professionals, demonstrating the impact of new models of out of hospital care that have been developed in the North West. This will enable other regions to learn about what works for which patient groups, enabling the sharing of best practice and the improvement of health and social care services.

Policy and practice briefings

1. Developing resilient neighbourhoods.
This will synthesise the results from the analysis outlined for papers 4 and 5 above with other research being carried out through the NWC CLAHRC on neighbourhood resilience- including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods. It will provide practical advice for local government organisations indicating approaches that are likely to be effective at promoting resilience and addressing the social determinants of health. This will lead to more effective local government policies and activities that deliver greater health benefits than would otherwise be the case.

2. New models of out of hospital treatment and care, what works for whom?
This will synthesise the results from the analysis outlined for papers 6 and 7 above with other research being carried out through the NWC CLAHRC on out of hospital care including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods and GP practices. It will provide practical advice for NHS and local government organisations indicating approaches to out of hospital care that are likely to be effective at reducing health inequalities and reducing demand for health and social care services. Importantly it will identify which components are likely to be particularly effective in deprived neighbourhoods and which approaches risk widening health inequalities.

3. Using neighbourhood predictive modelling to plan and target prevention.
This will provide a practical guide for local government and NHS organisations to use the neighbourhood risk model developed through this project to better target resources and adapt services to local needs. This will lead to benefits through the development of more appropriate local services.

Other outputs
1. Construction of longitudinal panel dataset of neighbourhood indicators with linked socioeconomic data.

This dataset will be a resource that will be used by a number of research projects within the NWC CLAHRC for the purposes outlined in this application. Statistical code used to develop the indicators will be made available to other researchers and the longitudinal panel dataset could also be made available more broadly for research that benefits health and social care. As outlined above, where possible and following risk assessment and guidance from the HSCIC, these data will be made available as Open Data. The National Institute for Health Research and the Medical Research Council have recognised the need for more research that uses routine datasets such as this to evaluate the impact of public policies as “natural experiments”. This work will provide a major advance in these methods and the data resources to support them, leading to benefits to patients and the public through the rapid evaluation of public policies that have an impact on health.

2. Predictive modelling tool freely available to local authority and NHS organisations. The predictive modelling interface will enable local authorities and NHS organisations to better target resources and adapt services to local needs.
This will lead to the more efficient and effective use of resources leading to health benefits for patients and the public.

3. Web based Neighbourhood Resilience Interface developed.
The development of this freely available interface will support community groups and residents in disadvantaged neighbourhoods to identify local needs, monitor progress and advocate for change. This will lead to improved and more effective local services and will support local community groups in making the case for funding in disadvantaged areas, leading to increased investment.

Outputs:

Planned journal submissions for publications

At least 8 publications in high impact peer reviewed journals are expected from this work. These are outlined below.

Paper 1. The impact of gastrointestinal disease trends on health care utilisation and the extent to which these are mediated by socioeconomic and health service related factors. - Lancet Infectious diseases - January 2018

Paper 2. The environmental determinants of health care utilisation and inequalities in these effects by area deprivation - International Journal of Epidemiology - January 2018

Paper 3. The effect of changes in social care funding and welfare reform on health care utilisation 2010 and 2017; are some places more resilient than others?, British Medical Journal - January 2018

Paper 4. The health inequalities impact of initiatives to promote neighbourhood resilience. American Journal of Public Health - January 2019

Paper 5. What components of resilience have the greatest impact on health - the implications for inequalities. Journal of Epidemiology and Community Health - January 2018

Paper 6. The impact on health care utilisation of new models of out of hospital treatment and care, British Medical Journal - January 2018

Paper 7. The impact on health care utilisation of community orientated primary care, British Medical Journal - January 2018

Paper 8. Predicting adverse trends in neighbourhood health. American Journal of Public Health - April 2019

The findings from the research will be disseminated through the following conference presentations:

NIHR HPRU annual conference - 2017

European Public Health Association Conference - 2017

Public Health England Annual Conference - 2018

Local Government Association Conference - 2018

Annual Primary Care Conference - 2018.

Policy and Practice Briefing papers

The University of Liverpool will produce a series of freely available briefing papers directed at practitioners, commissioners and policy makers in local government and NHS organisations.

1. Developing resilient neighbourhoods.

2. New models of out of hospital treatment and care, what works for whom?

3. Using neighbourhood predictive modelling to plan and target prevention.

Other Outputs

Longitudinal panel dataset of neighbourhood indicators.
The initial product of this project will be a longitudinal panel dataset of neighbourhood indicators. This will initially be used by research groups within the NIHR CLAHRC NWC as outlined above. Where possible and following risk assessment and guidance from the HSCIC these data will be made available as Open Data. Where necessary this will involve removing sensitive indicators and aggregating indicators to higher geographies to ensure anonymity is maintained. Open Data available by September 2018.

Predictive modelling tool.
As outlined in the analysis section for Objective 3, a predictive model will be developed that can be used by local government and NHS organisations to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. An online interface will be developed that enables local authorities to use this model to visualise and identify high-risk neighbourhoods. This will be made freely available for use by local government and NHS organisations. January 2019

Web based Neighbourhood Resilience Interface.
As outlined above, web based presentations of the longitudinal panel dataset of neighbourhood indicators will be developed that enable local groups to interact with the data, including mapping data, comparing neighbourhoods and visualising trends over time. This will support community groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability. This will be freely and publicly available. Developed January 2019.

All outputs will be risk assessed for the potential of re-identification and will only include aggregate data with small numbers suppressed in line with HES analysis guidance.

Processing:

Step 1 - Indicator development.

In the first step of data processing indicators will be developed for each Lower Super Output Area (LSOA) in England from 2004-05 to 2017-18. The data request has been limited to these years as this is the minimum number of years that is sufficient to measure change over time within neighbourhoods. This process will involve a number of stages to develop robust indicators which are likely to be sensitive to socioeconomic and environmental change, national social and welfare policy changes and local health and social care redesign initiatives.

Initially the University of Liverpool are developing theoretical models for the exposures and interventions being investigated. These outline the likely mechanisms through which these factors are likely to have an impact on hospital activity. As well as developing theoretical models of the impact of national socioeconomic, environmental and policy changes, Liverpool are working with local stakeholders to identify, prioritise and develop models for local NHS and Council initiatives.

These will then be used to identify candidate indicators that are likely to be affected by these changes and initiatives. Indicator definitions will be developed and the data quality and precision tested. Categories will be refined and time periods pooled to provide sample sizes within each cell that give estimates that are sufficiently precise and comply with the HSCIC Small Numbers Policy / HES analysis guide. The reliability and validity of indicators will be investigated by testing the association between candidate indicators and other measures of similar constructs from different data sources. In particular, indicators will be compared to measures derived from a household health survey, which has been carried out across neighbourhoods in the North West. Indicators will then be refined in consultation with local NHS and Local Authority stakeholders.
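The pooling and small-number handling described above can be sketched as follows. This is a minimal illustration only, not the project's actual method: the function names, the example LSOA codes and counts, and in particular the threshold value are assumptions, and the real threshold and rules are those set by the applicable small numbers policy and HES analysis guidance.

```python
from collections import defaultdict

# Hypothetical threshold for illustration; the real value is defined by the
# applicable disclosure guidance, not by this sketch.
SUPPRESSION_THRESHOLD = 8


def pool_periods(counts, window):
    """Sum yearly counts per LSOA into consecutive periods of `window` years.

    counts: dict mapping (lsoa_code, year) -> count
    returns: dict mapping (lsoa_code, first_year_of_period) -> pooled count
    """
    if not counts:
        return {}
    base = min(year for _, year in counts)
    pooled = defaultdict(int)
    for (lsoa, year), n in counts.items():
        period_start = base + ((year - base) // window) * window
        pooled[(lsoa, period_start)] += n
    return dict(pooled)


def suppress_small(pooled, threshold=SUPPRESSION_THRESHOLD):
    """Replace non-zero counts below `threshold` with None (suppressed)."""
    return {key: (None if 0 < n < threshold else n) for key, n in pooled.items()}


# Illustrative, made-up counts for two areas over four years.
yearly = {
    ("E01000001", 2004): 3, ("E01000001", 2005): 2,
    ("E01000001", 2006): 6, ("E01000001", 2007): 5,
    ("E01000002", 2004): 1, ("E01000002", 2005): 0,
    ("E01000002", 2006): 0, ("E01000002", 2007): 0,
}
pooled = pool_periods(yearly, window=2)
released = suppress_small(pooled)
```

Pooling two years at a time lifts the second period for the first area above the illustrative threshold, while cells that remain small and non-zero are withheld rather than released.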

It is likely that the indicators will include measures of particular groups of morbidities (e.g. chronic conditions, mental health or alcohol related conditions, accidents). Some will be age specific (e.g. asthma admissions in children, accidents in children, falls amongst older people), some will be limited to a particular admission type (e.g. emergency admissions for particular chronic conditions) and some will be directly related to processes of care (e.g. delayed discharge, length of stay). Where relevant, indicators will be replicated at higher geographies and by GP practice.

Step 2 – Matching and linking LSOA level data.

In Step 2 data will be matched at the LSOA level to other national datasets indicating socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and uptake of local authority and NHS initiatives. These datasets only include pseudonymised data and do not include any personal data, and linkage will only occur at the area level, minimising the risks of re-identification due to data linkage.

National and local small area datasets that will be used alongside neighbourhood level indicators derived from HES data:

National Datasets.
• Modelled LSOA level prescribing data
• LSOA population estimates
• Housing overcrowding data (census)
• Modelled LSOA air quality indicators for 2001, 2005, 2008 and 2012
• Crime data by LSOA
• Economic activity
• Self-reported health (census)
• DWP statistics on the number of claimants of welfare benefits by LSOA
• The number of laboratory reports for gastrointestinal infections by LSOA
• Flood warning areas mapped to LSOA
• Density of fast food and alcohol outlets, access to green spaces
• Housing quality indicators
• Small area fuel poverty indicators.
Local datasets.
• Number of people receiving emergency food from food banks by LSOA (local authority)
• Number of people attending swimming / gym activities by LSOA (local authority)
• Number of people receiving social care services by LSOA (local authority)
• Number of people requesting debt/ financial/housing/welfare advice by LSOA (local authority)
• Numbers accessing credit unions (local authority)
• Local authority licensing data (local authority)

This will result in a longitudinal panel dataset of neighbourhood indicators of hospital activity and potential determinants of health and health care use.
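As a rough sketch of the area-level matching in Step 2, the example below joins neighbourhood indicators to other area-level data on LSOA code and year. The field names, codes and values are made up for illustration, and the function name is hypothetical; it only shows that linkage happens on area identifiers, never on person-level records.

```python
def link_by_area(indicator_rows, area_data):
    """Left-join area-level covariates onto indicator rows by (lsoa, year).

    indicator_rows: list of dicts, each with "lsoa" and "year" keys
    area_data: dict mapping (lsoa, year) -> dict of covariates
    Rows with no matching area data are kept unchanged.
    """
    linked = []
    for row in indicator_rows:
        covariates = area_data.get((row["lsoa"], row["year"]), {})
        linked.append({**row, **covariates})
    return linked


# Illustrative, made-up rows: one indicator per LSOA per year.
indicators = [
    {"lsoa": "E01000001", "year": 2015, "emergency_admissions": 42},
    {"lsoa": "E01000002", "year": 2015, "emergency_admissions": 17},
]
# Illustrative, made-up area-level covariates keyed by (lsoa, year).
area_covariates = {
    ("E01000001", 2015): {"population": 1500, "fuel_poverty_pct": 12.5},
}
panel = link_by_area(indicators, area_covariates)
```

Each row of `panel` is one neighbourhood-year, which is the shape a longitudinal panel analysis would typically expect.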

To achieve Objective 2, LSOAs within this dataset will then be mapped to areas involved in a number of area-based interventions in the North West of England. The Collaboration for Leadership in Applied Health Research North West Coast (CLAHRC NWC) is working with the NHS, Local Government organisations and residents to prioritise existing interventions, to develop and change those based on evidence, and to evaluate their impact on health and health inequalities. These include health and social care service redesign initiatives as well as initiatives that aim to promote the resilience of local economic, social, environmental and governance systems. GP practice codes will also be mapped to groups of GP practices involved in health and social care redesign initiatives that are targeting GP registered populations rather than particular neighbourhoods. These intervention areas will then be matched with both national and regional (NW) control areas with similar characteristics, in order to evaluate the impact of these interventions on health outcomes and health service use.
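The text does not say how control areas with similar characteristics are chosen. One common approach, shown here purely as an assumption, is nearest-neighbour matching on standardised area characteristics. The function names, area labels, features and values below are all hypothetical.

```python
import math


def standardise(areas, features):
    """Convert each feature to a z-score across all areas."""
    stats = {}
    for f in features:
        values = [a[f] for a in areas.values()]
        mean = sum(values) / len(values)
        # Fall back to 1.0 if a feature has no variation, to avoid dividing by zero.
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        stats[f] = (mean, sd)
    return {
        code: [(a[f] - stats[f][0]) / stats[f][1] for f in features]
        for code, a in areas.items()
    }


def match_controls(areas, intervention_codes, features, k=1):
    """For each intervention area, return the k most similar non-intervention areas."""
    z = standardise(areas, features)
    candidates = [c for c in areas if c not in intervention_codes]
    matches = {}
    for code in intervention_codes:
        ranked = sorted(candidates, key=lambda c: math.dist(z[code], z[c]))
        matches[code] = ranked[:k]
    return matches


# Illustrative, made-up area characteristics.
areas = {
    "A": {"deprivation": 40.0, "population": 1500},
    "B": {"deprivation": 38.0, "population": 1600},
    "C": {"deprivation": 10.0, "population": 1400},
    "D": {"deprivation": 12.0, "population": 3000},
}
controls = match_controls(areas, ["A"], ["deprivation", "population"])
```

Standardising first stops a feature measured on a large scale, such as population, from dominating the distance. A real evaluation would also need to decide whether controls can be reused across intervention areas and how to restrict candidates to national or regional pools.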

Step 3 - Analysis.
Objective 1 – Nationwide analysis.
Analysis for Objective 1 will use the longitudinal panel dataset for the whole country. Longitudinal analysis methods will be used to investigate the association between socioeconomic changes, welfare policy changes, environmental changes and infectious disease trends within neighbourhoods and changes in indicators of health service utilisation. Mediation and interaction analysis will then investigate whether these effects are modified by other neighbourhood characteristics, e.g. area deprivation, characteristics of the physical environment, health and social care services, and local governance arrangements.

Objective 2 – Evaluations of local initiatives
Analysis for Objective 2 will use the longitudinal panel dataset for local intervention areas alongside data from national and regional matched control areas to evaluate the impact of interventions whilst controlling for the contextual and national trends identified through the analysis for Objective 1.

Objective 3 – Predictive models.
This analysis will use the findings from Objectives 1 and 2 in multivariable analysis to develop predictive models of the modifiable factors driving adverse health trends and increases in demand for health services at the neighbourhood level. These will be developed to not only identify neighbourhoods at high risk, but also to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. Working with local government and NHS organisations the University of Liverpool will develop and evaluate approaches for the practical application of these predictive models to support the more effective use of local resources.

Objective 4 - Community led approaches for monitoring progress on health inequalities at the neighbourhood level.

A selection of the indicators from the aggregate longitudinal panel dataset will be developed so that they can be made publicly available as Open Data (see the controls in place to minimise risks below). Working with a network of community organisations who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), these indicators will be used to test out new community-led approaches for monitoring progress on health inequalities at the neighbourhood level. This will involve the development of web-based presentations of data that would enable local groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability.

Data governance, management and controls in place for data access and procedures to minimise the risk of re-identification;
The Integrated Longitudinal Research Resource
The usage of the HES data included in this request and the other small area datasets will be managed through the Integrated Longitudinal Research Resource (ILRR). The ILRR is a data management resource at the University of Liverpool established by the NIHR CLAHRC NWC in collaboration with the NIHR Gastrointestinal Health Protection Research Unit (GIHPRU) and the Consumer Data Research Centre (CDRC).

The ILRR includes a dedicated Data Scientist, secure servers and robust policies for data sharing and data usage. The ILRR is overseen by a governance board, which approves access to data for specific usages based on criteria specific to each dataset. The governance board includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU and CDRC, NHS and Local government partners, a public advisor and an NHS information governance expert.

Controls in place for managing access to the HES data in this request.
Only ILRR data scientists based at the University of Liverpool will have access to the record level HES data included in this request. No third party will have access to the record level data. The HES data included in this request and the panel of aggregate longitudinal neighbourhood indicators derived from that data will be consistently documented, catalogued and coded and stored in a secure SQL server database.

Only aggregate data with small numbers suppressed in line with HES analysis guide will be made available to other researchers. This aggregated small area data will still be treated as safeguarded data, with specific data items only being made available to researchers as needed for specific analysis plans, with data only released after any risks of re-identification have been assessed and mitigated by ILRR data scientists.
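The small number suppression described above can be illustrated with a minimal sketch. The threshold used here is an assumption for illustration only and is not taken from this application; the rule actually applied must come from the HES analysis guide in force at the time of release. The area codes and counts are hypothetical.

```python
from typing import Optional

# Assumed threshold for illustration only; the real value must be taken
# from the HES analysis guide, not from this sketch.
SUPPRESSION_THRESHOLD = 7


def suppress_small_counts(
    counts: dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> dict[str, Optional[int]]:
    """Return a copy of counts with values from 1 to threshold replaced by None.

    Zero counts are kept, since they do not describe any individual.
    """
    return {
        area: (None if 0 < n <= threshold else n)
        for area, n in counts.items()
    }


# Hypothetical neighbourhood-level admission counts.
admissions = {"AREA_A": 3, "AREA_B": 42, "AREA_C": 0}
released = suppress_small_counts(admissions)
```

Under these assumptions only AREA_A is masked; a real release would also need to guard against suppressed values being recovered from row or column totals.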

Access to the aggregated panel dataset of neighbourhood indicators will be limited to research groups that are part of the NIHR NWC CLAHRC (unless data is made available as Open Data – see below). These research groups include academic researchers from Liverpool, Lancaster and Central Lancashire Universities as well as analysts from NHS and Local Government organisations. Each group of researchers will outline a detailed analysis plan relating to each of the Objectives above, describing which aggregate indicators of hospital activity they require access to and which indicators related to socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and local area-based local authority and NHS interventions. Each of these detailed analysis plans will be reviewed by the Integrated Longitudinal Research Resource (ILRR) governance board. Data will be released only if it is to be used according to the purposes outlined in this application. Only aggregate data containing the variables required for the specific analysis of each group will be released. Each request will be assessed by an experienced Data Scientist to identify if there are any risks of data being re-identified as a result of the linkage with other data sources, and to mitigate these risks. This risk assessment will be based on the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification. None of the datasets that will be used to develop linked LSOA indicators include any personal data, therefore the risk of re-identification due to data linkage is low.

Open data.
As outlined under Objective 4 the aim is to develop a selection of the aggregate indicators derived from HES data so that they could be released as Open Data. The risk of re-identification for each of these indicators will be assessed using the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification and measures taken to ensure the risk of re-identification is low enough to allow public release. For example this could involve aggregating these indicators to the ward level (average population size 10,000), rather than the LSOA level, or pooling data over a number of years. The HSCIC will be consulted before any indicator is released under the Open Government Licence. These Open Data aggregate indicators will then be used in work with a network of community organisations and members of the public who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), to involve members of the public in identifying local needs, monitoring progress and advocating for change to improve services.
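The two coarsening options mentioned above (moving from LSOA to ward level, and pooling several years) can be sketched as follows. This is an illustration only: the area codes, the LSOA-to-ward lookup and the counts are hypothetical, and the application does not say which lookup or pooling period would be used.

```python
from collections import defaultdict


def aggregate_to_ward(
    lsoa_counts: dict[tuple[str, int], int], lsoa_to_ward: dict[str, str]
) -> dict[str, int]:
    """Sum (LSOA, year) counts into ward totals pooled over all years supplied."""
    ward_totals: dict[str, int] = defaultdict(int)
    for (lsoa, _year), n in lsoa_counts.items():
        ward_totals[lsoa_to_ward[lsoa]] += n
    return dict(ward_totals)


# Hypothetical lookup and counts keyed by (LSOA, year).
lsoa_to_ward = {"LSOA_A": "WARD_1", "LSOA_B": "WARD_1", "LSOA_C": "WARD_2"}
lsoa_counts = {
    ("LSOA_A", 2014): 4,
    ("LSOA_A", 2015): 6,
    ("LSOA_B", 2014): 3,
    ("LSOA_C", 2015): 9,
}
ward_counts = aggregate_to_ward(lsoa_counts, lsoa_to_ward)
```

Each individual LSOA-year count here is small, while the pooled ward totals are larger, which is the effect the paragraph above relies on to lower the risk of re-identification.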

The role of the institutions involved in these grants.

The University of Liverpool (UoL) will be the sole data controller and data processor for this application and all record level data will be processed at the UoL. Only data scientists based at the UoL and employed by the UoL will have access to the record level data.

The ILRR governance board, which includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU, CDRC and local NHS and LA organisations, will oversee procedures and processes for accessing the small area aggregate level data derived from the record level data, and assess and approve requests from research groups to use this data. These research groups will only have access to aggregate datasets that have been risk assessed by data scientists at UoL and comply with HES small number analysis guidance. These research groups will include partners who are members of the NIHR CLAHRC NWC collaboration, including researchers from Liverpool, Lancaster and Central Lancashire Universities, as well as analysts from local NHS and Local Government organisations.

As is required by the NIHR, the research from this project will be published in peer-reviewed journals that are compliant with the NIHR policy on Open Access.

MR1298: UK Lung Cancer Screening Trial Lung Cancer Registry and Mortality data for consented individuals — DARS-NIC-19237-R3T6S

Opt outs honoured: No - consent provided by participants of research study, No (Excuses: Reasonable Expectation, Consent (Reasonable Expectation))

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Health and Social Care Act 2012 – s261(7), Health and Social Care Act 2012 – s261(2)(c), Health and Social Care Act 2012 s261(2)(c)

Purposes: No (Academic)

Sensitive: Sensitive

When:DSA runs 2019-06 – 2022-05 2017.12 — 2024.10.

Access method: Ongoing, One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

Datasets:

MRIS - Members and Postings Report
MRIS - Cause of Death Report
MRIS - Cohort Event Notification Report
Civil Registration - Deaths
Demographics
Cancer Registration Data
MRIS - Flagging Current Status Report
Civil Registrations of Death

Type of data: Identifiable

Objectives:

The overall aim of the trial was to provide data required for an informed decision about the introduction of population screening for lung cancer. This involved establishing the impact of screening on lung cancer mortality, determining the best screening strategy and assessing the physical and psychological consequences and the health implications of screening. An additional objective was to create a resource for future improvements to screening strategies.

It was initially anticipated that the pilot study would be followed by a more in-depth extended trial with a larger cohort of people. However, this did not receive further funding, and data will be limited to the pilot study with a cohort of 4,061 participants, for which recruitment has now ended. Although further data is being requested under this agreement, it will be 'follow up' data for the original cohort only e.g. Cause of Death, Date of Death and Cancer Registry data.

The data will be recorded on the United Kingdom Lung Cancer Screening Trial (UKLS) database and pseudonymised data given to the named researchers in order to ascertain any mortality advantage to screening and inform the UK National Screening committee.

Any future sharing of record-level data would be subject to an amendment application requiring NHS Digital approval.

Yielded Benefits:

Some preliminary analysis has been undertaken utilising the NHS Digital data, which has been used to develop a lung cancer CT screen nodule risk model that is in its final stage of development. However, data from NHS Digital has not yet been published, released or reported upon, as the data has not matured sufficiently. Data received to date has been utilised to update the UKLS database to ensure University of Liverpool did not contact deceased individuals.

Expected Benefits:

ONS Mortality and Cancer Registry Data have been received on a quarterly basis since 2012. This was vital information during the conduct of the trial as any deceased participants, or those diagnosed with lung cancer, were annotated on the database and marked as "Off Study" so that the UKLS project team would not contact them again to arrange repeat scans or complete follow up questionnaires.

The trial has now finished but the follow up data on deaths and lung cancer diagnosis is still required. Once the outcome data is available, the success of the screening can be evaluated. This analysis will be provided to the UK National Screening Committee (UKNSC) to help inform decision-making as to whether a lung cancer screening programme should be implemented in the UK. The UKNSC will not be using only the UKLS analysis, but will also receive analysis from a larger lung cancer screening trial run in the Netherlands (NELSON - European Nederlands-Leuvens Screening Onderzoek). The NELSON Trial is due to report soon.

A future exercise to combine UKLS and NELSON data is anticipated, but further data sharing will not be undertaken prior to approval from NHS Digital by means of a separate application.

Outputs:

Data already received from NHS Digital has not had any analysis undertaken, nor any data published/released or reported upon. Data received to date has been utilised to update the UKLS database to ensure University of Liverpool did not contact deceased individuals.

The initial findings/conclusions of the trial data were published in the BMJ-Thorax Online First. This included methods, trial design, recruitment, randomisation, nodule management, number of cancers, treatment, cost effectiveness modelling. The full report of these aspects of the UKLS trial has also been published by the funder, National Institute for Health Research, Health Technology Assessment Programme (NIHR HTA). Both of these are available as open publications.

Future specific outputs anticipated June to December 2017 in the form of presentations/publications and peer review journals will be added to the UKLS website.

One specific output will be a report to the UK National Screening Committee on the cost effectiveness and mortality benefit of introducing a lung cancer screening programme into the UK. Prior to this report it will be necessary for the statistician to analyse the data on causes of death and lung cancer diagnoses.

The UKLS statistician is also designing a risk model to predict lung cancer utilising nodule data from the UKLS study which, if successful, will be submitted for publication. The most appropriate publication will be identified when the analysis is complete. This may include Epidemiological or Radiological publications, such as BMJ-Thorax. Submission to a publication does not guarantee acceptance so it may be submitted to more than one publication before being accepted.

Outputs will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide.

Processing:

An updated cohort Excel file (containing details of participants who have given informed consent) will be sent to NHS Digital by the UKLS Project Manager. The cohort file is updated by removing those participants who have died (as informed by previous data received from NHS Digital).

NHS Digital will upload the linked dataset file onto their secure portal and notify the UKLS Lung Cancer IT Technician. The file will be downloaded and saved as a password protected document into a folder.

The UKLS Project Manager will update the UKLS database with deaths and cancers notified to ensure no further contact with those individuals is attempted.

The Lung Cancer IT Technician will write/run queries to extract selected data from the UKLS ONS/Cancer Registry database. The output will include the pseudonymised unique patient identifier (MPI) in order that it can be linked to subject data held by UKLS. Subject data is data that has been provided by the participants as part of the trial. For those randomised to the CT screening arm, details of CT scan results are held and any treatment received as part of the trial. This data will not be linked with any other patient-level data.

Researchers have access to the pseudonymised data for analysis only, which is imported into statistical software, usually SAS, STATA, or Excel.

Only substantive employees of the University of Liverpool will process the data and only for the purpose as defined in this agreement.

The analysis (as anonymised, aggregate data) will be the subject of publication (see specific outputs), however record level data will be viewed by the named users in this agreement only.

The clinical database used within the UKLS has data for 4,061 subjects; all data is held securely (with additional password protection) and accessed only by the named users, in compliance with The University of Liverpool Data Policies. Although the database includes NHS numbers, only pseudonymised data will be made available to researchers, subjects will be identified using a pseudonymised unique identifier in any extracted data.

Any analysis will be viewed by the named users in this agreement only. Further data sharing beyond the named users may be required in the future; however, this will be requested by means of a further application to NHS Digital.

HElping Alleviate the Longer-term consequences of COVID-19 (HEAL-COVID): a national platform trial — DARS-NIC-433257-K6Q2Y

Opt outs honoured: No (Excuses: Consent (Reasonable Expectation))

Legal basis: Health and Social Care Act 2012 s261(2)(c), Consent (Reasonable Expectation); Health and Social Care Act 2012 s261(2)(c)

Purposes: No (Academic)

Sensitive: Sensitive, and Non-Sensitive

When:DSA runs 2021-06 – 2024-06 2021.12 — 2024.09.

Access method: Ongoing

Data-controller type: CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION TRUST, UNIVERSITY OF CAMBRIDGE, UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

AGD/predecessor discussions: IGARD Minutes - 17th June 2021 final.pdf, igardminutes-11thfebruary2021final.pdf, IGARDMinutes-11thMarch2021final.pdf

Datasets:

Civil Registration - Deaths
Emergency Care Data Set (ECDS)
Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Critical Care
Hospital Episode Statistics Outpatients
Medicines dispensed in Primary Care (NHSBSA data)
Secondary Uses Service Payment By Results Episodes
Civil Registrations of Death
Hospital Episode Statistics Admitted Patient Care (HES APC)
Hospital Episode Statistics Critical Care (HES Critical Care)
Hospital Episode Statistics Outpatients (HES OP)

Type of data: Anonymised - ICO Code Compliant, Identifiable

Objectives:

The World Health Organisation declared the COVID-19 outbreak a public health emergency of international concern on 30th January 2020. The acute effects of COVID-19 are now well described, however COVID-19 is a new disease, the natural history of which remains uncertain. Long-term outcomes for COVID-19 are currently unclear, but early data suggests a significant burden of mortality and morbidity. In this situation, even treatments with only a moderate impact on survival or on hospital resource are worthwhile. Therefore, the focus of HEAL-COVID is the impact of candidate treatments on mortality and the need for re-hospitalisation following discharge from hospital.

The legal basis for the processing and storage of personal data is that it is a task in the public interest Article 6 (1) (e). It is in the public interest to conduct this clinical trial funded by the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) programme which aims to find treatment to reduce mortality and morbidity following acute COVID-19. This routine data is necessary to identify morbidity and mortality during the follow up period. The data will be analysed to produce statistical summaries (Article 9(2)(j)).

The research has received appropriate Health Research Authority approval throughout the UK.

This is a clinical trial platform which will have at most one control and three active arms at any one time. Treatment arms may be removed or added during the course of the trial with the COVID-19 Therapeutic Advisory Panel (CTAP) proposing the treatment arms. It is expected that 877 participants will be randomised to each treatment arm. Therefore the estimate for total cohort size, once recruited is around 3,500 at most.
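The cohort estimate above appears to follow from the per-arm figure and the maximum number of concurrent arms. The short check below shows that arithmetic; treating "one control and three active arms" as four arms of 877 participants each is a reading of the text, not a calculation stated in the application.

```python
PARTICIPANTS_PER_ARM = 877   # figure given in the text
MAX_CONCURRENT_ARMS = 1 + 3  # one control arm plus at most three active arms

# Assumed reading: every concurrent arm recruits the full 877 participants.
max_cohort = PARTICIPANTS_PER_ARM * MAX_CONCURRENT_ARMS
print(max_cohort)  # 3508, consistent with "around 3,500 at most"
```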

The aims of the study are to determine the safety and effectiveness of treatments in the overall treatment of Long COVID. The datasets requested are required to determine mortality and morbidity causes and resource-use, including prescription dispensing as a measure of compliance. This is a clinical trial that uses the data minimisation principle as required by GDPR. All the data requested maps to an outcome specified with the clinical trial protocol, and the list of outcomes is as follows:

- The primary outcome of this study is hospital-free survival (data requested related to mortality and hospital readmission).
- Economic outcomes: Incremental cost-effectiveness, from the perspective of NHS resource use and based on quality-adjusted life years estimated from responses to the Euroqol EQ-5D-5L questionnaire.

Secondary outcomes include
(i) All-cause mortality (data requested relates to cause of death).
(ii) Hospital readmission after discharge from index hospital admission (data requested to identify those patients).
(iii) Patient-reported outcomes assessed using symptom specific measures.
(iv) Suspected Serious Adverse Reactions (data requested relates to mortality. Additionally, suspected serious adverse reactions reported via CRFs).

Note that all members of the team running the trial are based at Liverpool Clinical Trials Centre (LCTC) at the University of Liverpool. For the sake of consistency, and as University of Liverpool is listed as a data controller, when referring to University of Liverpool (UoL) throughout the application, this encompasses the team at LCTC.

University of Liverpool (UoL) will provide the HEAL-COVID participants' NHS number, date of birth and corresponding randomisation number (Study ID) to NHS Digital for linking purposes. Randomisation date and End of study participation date will also be provided for each participant, as the protocol requires 12 months follow-up for each participant from randomisation. End of study participation date will either be the date when the 12 month follow up point was reached or the date the participant withdrew from the study. All hospitals across the UK will be eligible to recruit participants.

Patients will be recruited in the hospital setting and then discharged into the community setting where the majority of data collection takes place. Unless the routine data requested is provided then the burden of mortality and morbidity within this population will be underestimated. There is no other alternative that does not place a significant burden on sites that can be supported during this pandemic setting.

University of Cambridge and Cambridge University Hospitals NHS Foundation Trust are joint sponsors and are legally responsible for the trial. Along with University of Liverpool (UoL) they are joint data controllers. UoL will have access to patient NHS numbers, DOB and randomisation number. Bangor University are processors of the health economics data. The data sent to Bangor University will be pseudonymised. Aparito are a data processor of the patient completed questionnaires only; they will not process the data requested from NHS Digital.

Co-applicants of the grant application to the NIHR HTA are involved as part of the wider study. However, they are not data processors or data controllers. Their role is to provide advice on the trial design and conduct of the trial and they have no role in determining the means by which the personal data are being processed under this agreement. They will not view the disaggregated data. The funder will have a role in monitoring study progress but will not be a data processor or data controller, nor will they have any influence on the outcomes nor suppress any of the findings of the research. None of the additional funders listed in the protocol have any commercial interests in relation to the treatments being proposed for this trial.

This clinical trial is a stand-alone trial and should there be any plans for future linkage of this data to other datasets then this application will come back via NHS Digital's DARS / DigiTrials process for amendment.

Expected Benefits:

COVID-19 is a new disease for which the natural history remains uncertain. However, recent data highlight an ongoing convalescent mortality rate as high as 10% and that ~20% of patients develop new or worsened cardiopulmonary symptoms within 60-days after hospital discharge, suggesting that pathogenic processes persist. A unique feature of COVID-19 is the high incidence of cardiovascular and pulmonary complications including venous thromboembolism, persistent lung inflammation, and pulmonary fibrosis. These may not be confined to the acute phase of the illness, but rather may also occur during the convalescent phase of the illness, thus providing a major contribution to the ill-defined syndrome long COVID. There are no known treatments for this condition/phase. This trial aims to provide knowledge within the project timescales, on how to treat patients with COVID-19 in the post-acute phase. This trial is considering licensed treatments and the beneficiaries will be the NHS and patients.

The dissemination of the research is in the public interest as the trial will provide information that will potentially help reduce mortality and morbidity in a worldwide pandemic.

Expected benefits within the project timescales (March 2021 to February 2024) include:
Knowledge of how to treat patients with COVID-19 in the post-acute phase
Reducing burden on the NHS during a pandemic by identifying treatment to improve survival and reduce readmissions
Reduction in deaths associated with COVID-19 in the post-acute phase
Reduction in re-hospitalisations associated with COVID-19 in the post-acute phase
Improvements to society and economic recovery by identifying treatments to promote a faster return to full health and usual activities
Information to assist commissioners in making decisions to better support patients
Evidence-based information to enable Department of Health and NHS England to provide guidance and develop policies to treat patients in the post-acute phase of COVID-19
Data on cost effectiveness of treatment, which may lead to potential cost savings for the NHS

Outputs:

The aim of the trial is to change the standard of care for patients with COVID-19 by determining which interventions do and do not alter longer-term clinical outcomes. The outputs of the study will be disseminated to both patient and public audiences, as well as healthcare audiences via open access peer reviewed publications, presentations to conferences as well as patient/participant groups, press releases (where appropriate), social media, newsletters and clinical management guidelines. In addition, the aim is to establish an online digital patient community to ensure the findings of our work are effectively shared with people who experience Long COVID.

Only aggregated summaries of data with small numbers suppressed in line with the HES analysis guide will be contained in the outputs. No patient level information will be provided.

HEAL-COVID is a CTIMP and as such there is a requirement to report within 12 months of the end of the trial (February 2024). Outputs will be published at the end of the trial and throughout as each treatment arm completes. Exact dates for publication cannot be provided as it is not known at this stage when complete datasets will be available for each treatment arm, however this is expected to be between January 2022 and February 2024. Outputs are approved following NIHR HTA review.

Processing:

All organisations party to this agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by "Personnel" (as defined within the Data Sharing Framework Contract i.e. employees, agents and contractors of the Data Recipient who may have access to that data).

UoL will securely provide NHS Digital with a file containing a list of NHS numbers, dates of birth and randomisation numbers (Study IDs) of patients recruited into HEAL-COVID. Date of randomisation for each participant will also be provided to indicate the start date from which data will be collected for a participant. End date will also be provided so that data is not requested after the 12 months of follow-up. Additionally if a participant withdraws from routine data collection, data will only be collected up to the date of withdrawal.
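The follow-up window described above can be sketched as a simple date check. This is an illustration under stated assumptions: the function names are hypothetical, "12 months" is taken to mean the same calendar date one year later, and both ends of the window are treated as inclusive, none of which is specified in the application.

```python
from datetime import date
from typing import Optional


def add_one_year(d: date) -> date:
    """Return the same calendar date one year later (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return d.replace(year=d.year + 1, day=28)


def end_of_participation(randomised: date, withdrawn: Optional[date] = None) -> date:
    """End date is the 12 month point or the withdrawal date, whichever comes first."""
    twelve_months = add_one_year(randomised)
    return twelve_months if withdrawn is None else min(twelve_months, withdrawn)


def in_follow_up(event: date, randomised: date, withdrawn: Optional[date] = None) -> bool:
    """True if an event falls between randomisation and end of participation (inclusive, assumed)."""
    return randomised <= event <= end_of_participation(randomised, withdrawn)
```

For example, with randomisation on 1 June 2021 and no withdrawal, an admission on 1 September 2021 would be kept and one on 1 July 2022 would be excluded.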

NHS Digital will then create an extract of the requested data, removing any identifiers from the datasets except for the randomisation number. This will then be securely transmitted to UoL. Thus, there is a flow of identifiable data into NHS Digital but the flow out of NHS Digital will be pseudonymised, with the identifying data removed and replaced with the corresponding randomisation number.
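The identifiable-in, pseudonymised-out flow described above can be sketched as follows. The field names (`nhs_number`, `date_of_birth`, `study_id`) and the sample records are hypothetical and are not taken from any NHS Digital specification; the sketch only shows the general idea of dropping identifiers and keeping the randomisation number as the sole key.

```python
# Hypothetical names for the identifying fields to be removed from the extract.
IDENTIFIER_FIELDS = {"nhs_number", "date_of_birth"}


def pseudonymise(linked_records: list[dict], study_ids: dict[str, str]) -> list[dict]:
    """Replace identifiers in each linked record with the participant's study ID."""
    extract = []
    for record in linked_records:
        study_id = study_ids[record["nhs_number"]]
        # Copy every non-identifying field, then attach the randomisation number.
        clean = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        clean["study_id"] = study_id
        extract.append(clean)
    return extract


# Made-up cohort lookup (NHS number -> randomisation number) and linked record.
study_ids = {"0000000001": "HC-0001"}
linked = [
    {"nhs_number": "0000000001", "date_of_birth": "1950-01-01", "admission_date": "2021-09-01"}
]
extract = pseudonymise(linked, study_ids)
```

The output remains pseudonymised rather than anonymous, because whoever holds the cohort lookup can still map a study ID back to a participant.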

This process will be done on a monthly basis for linkage to Civil Registrations of Deaths, Medicines Dispensed in Primary Care and SUS data. On an annual basis the cohort will be linked to the Hospital Episode Statistics datasets.

Data from NHS Digital will be stored on UoL's on-premises secure managed network drive and accessible only to those individuals permitted to access the data. Multiple storage locations are used as the servers are mirrored in case there is an issue in one of the buildings. Only approved members of UoL will have access to the NHS identifiers; Bangor University, as a data processor, will not have access to these identifiers.

UoL will provide the relevant data securely to Bangor University to enable them to conduct Health Economics analysis. The data will not contain NHS number and only the randomisation number will be used. Bangor University will not be able to re-identify any participants. Files holding pseudonymised data will be held securely by Bangor University in folders with access restricted to approved Bangor University analysts.

ISARIC4C Coronavirus Clinical Information Network (COCIN) GPES record linkage — DARS-NIC-402963-P0Y5D

Opt outs honoured: No - Statutory exemption to flow confidential data without consent, No - data flow is not identifiable, No (Excuses: Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 - s261 - 'Other dissemination of information', Health and Social Care Act 2012 - s261 - 'Other dissemination of information'; Other-(CV19: Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002), Health and Social Care Act 2012 - s261 - 'Other dissemination of information'; Other-Other(CV19: Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002), Other-(CV19: Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002), Other-Other(CV19: Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002), CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, Health and Social Care Act 2012 s261(2)(a); Other-Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002, Health and Social Care Act 2012 s261(2)(a); Other-Para 8.3.2 of the Covid-19 Directions; Other-Regulation 3 (1) of the Health Service (Control of Patient Information) Regulations 2002

Purposes: No (Academic)

Sensitive: Non Sensitive, and Sensitive, and Non-Sensitive

When:DSA runs 2020-09 – 2023-09 2020.11 — 2023.02.

Access method: Ongoing, One-Off

Data-controller type: UNIVERSITY OF OXFORD

Sublicensing allowed: No

AGD/predecessor discussions: AGD minutes - 6th March 2025 final.pdf, IGARD Minutes - 23 September 2021 final.pdf, igardminutes24thseptember2020final.pdf, IGARD Minutes - 13 January 2022 final.pdf, igardminutes-22ndoctober2020final.pdf

Datasets:

Mental Health Services Data Set
COVID-19 Second Generation Surveillance System
Covid-19 UK Non-hospital Antigen Testing Results (pillar 2)
Secondary Uses Service Payment By Results Accident & Emergency
COVID-19 Hospitalization in England Surveillance System
Secondary Uses Service Payment By Results Outpatients
Secondary Uses Service Payment By Results Spells
Secondary Uses Service Payment By Results Episodes
NHS 111 Online Dataset
Emergency Care Data Set (ECDS)
Civil Registration - Deaths
GPES Data for Pandemic Planning and Research (COVID-19)
Hospital Episode Statistics Accident and Emergency
Improving Access to Psychological Therapies Data Set
National Diabetes Audit
Secondary Uses Service Payment By Results Accident & Emergency
Improving Access to Psychological Therapies Data Set_v1.5
COVID-19 Vaccination Adverse Reactions
COVID-19 Vaccination Status
HES-ID to MPS-ID HES Accident and Emergency
Hospital Episode Statistics Admitted Patient Care
Civil Registrations of Death
COVID-19 General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)
COVID-19 Second Generation Surveillance System (SGSS)
COVID-19 UK Non-hospital Antigen Testing Results (Pillar 2)
Hospital Episode Statistics Accident and Emergency (HES A and E)
Improving Access to Psychological Therapies (IAPT) v1.5
Mental Health Services Data Set (MHSDS)
Hospital Episode Statistics Admitted Patient Care (HES APC)
COVID-19 SGSS First Positives (Second Generation Surveillance System)

Type of data: Identifiable, Anonymised - ICO Code Compliant

Objectives:

The Coronavirus Clinical Information Network (CO-CIN) has collected data for the International Severe Acute Respiratory Infection Consortium (ISARIC) Coronavirus Clinical Characterisation Consortium through a commission from the Chief Medical Officer to conduct Urgent Public Health Research to provide evidence that informs public health policy in response to the COVID-19 emergency.

ISARIC’s purpose is to prevent illness and deaths from infectious disease outbreaks. ISARIC is a global federation of clinical research networks, providing a proficient, coordinated and agile research response to outbreak-prone infectious diseases.

The ISARIC Coronavirus Clinical Characterisation Consortium is a UK-wide consortium of leading experts in outbreak medicine with a proficient, coordinated, and agile research response to COVID-19.

In 2019 a new virus, SARS coronavirus-2 (SARS-CoV-2), emerged. It seems highly likely that SARS-CoV-2 and its associated disease COVID-19 will cause mortality unprecedented in modern times. This is a new disease. There is a high chance that clinical trials will fail to detect therapeutic effects, by enrolling at the wrong time, or missing key subgroups or endpoints. Concurrent biological phenotyping can mitigate these risks, providing rapid, efficient clinical evidence.

CO-CIN response has been planned and tested over the past 8 years within the International Severe Acute Respiratory Infection Consortium (ISARIC).

CO-CIN informs the Department of Health and Social Care (DHSC) on a weekly basis about the clinical evolution of disease in the United Kingdom. To achieve this, clinical research nurses and administrators gather anonymised data from clinical notes and enter it into a simple online database. This allows the characterisation of the patients’ clinical features as well as risk factors associated with severity, risk of hospitalisation and death. The information gathered is essential to help health service planning and provision, and to rapidly evaluate the impact of interventions such as new therapeutics or vaccines.

The legal basis for the processing and storage of personal data for COCIN is that it is a task in the public interest (Article 6(1)(e)): it is in the public interest to conduct public health research to provide evidence to inform public health policy in response to the COVID-19 emergency and to understand and report on the risk factors associated with COVID-19. The processing of sensitive personal data is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes (Article 9(2)(j)).

The research is conducted with relevant Health Research Authority ethical approvals throughout the UK. Since early February, CO-CIN has collected data on over 79,000 patients of all ages requiring admission to hospital with covid-19, and patients in hospital subsequently diagnosed with covid-19 in England, Scotland and Wales, accounting for approximately 60% of all patients admitted to hospital with covid-19 in the UK. Only data from England will be sent to NHS Digital for linkage.

Patients are recruited into one of three Tiers. Tier 0 sites are recruited for data collection only without consent, while Tiers 1 and 2 provide consent for sample collection in addition to data collection. The distinction of the study into three Tiers was made to allow for a resource appropriate implementation of the protocol, as it was understood that data and/or sample collection may be limited in some settings.

For Tier 0 patients clinical data is collected but no additional biological samples are obtained for research purposes. The minimum clinical data set summarises the illness episode and outcome, with the option to collect additional detailed clinical data at frequent intervals, according to local resources/needs.

Given the scale of the current COVID-19 pandemic, and because initially data collection for Tier 0 participants was clinical data only from which the participant could not be identified, consent was not sought. The data is collected by a health care professional who has access to the patient's information by virtue of their clinical role. The addition of collection of NHS number, Date of Birth (DOB) and postcode for Tier 0 participants means they are now able to be identified from the dataset in order to support linkage to other NHS data sources and is currently being done under Control of Patient Information Regulations (COPI). The identifiable data is not made available to researchers. Tier 0 is being retrospectively and prospectively completed with identifying data relying on COPI.

The datasets are required primarily to enable CO-CIN to report early and accurate findings to the Scientific Advisory Group for Emergencies (SAGE). Since the early growth phase of COVID-19 in the UK, CO-CIN has presented near real-time epidemiological descriptions and analyses of hospitalised patients with COVID-19. CO-CIN have presented analyses of patient factors including ethnicity, age, comorbidity, and their association with in-hospital mortality, enabling SAGE to make decisions based on near real-time evidence. SAGE will have no access to the NHS Digital data shared under this Agreement.

Each of the datasets, including the GDPPR data (which will provide data on shielded patients), is essential to support the analysis for the use cases as these questions cannot be answered using just the available CO-CIN data. Specific questions about shielding, pre-existing patient co-morbidities, and the outcomes for patients are important to be able to understand the full impact of the disease and interventions. CO-CIN was set up as a pandemic Case Report Form (CRF), and comorbidity categories are broad. There is a need to understand the duration and severity of comorbidities in more granularity. CO-CIN need to be able to report and respond to the impact that multi-morbidity and frailty have on COVID. CO-CIN have detailed data regarding in-hospital sequelae of COVID-19, but require post-discharge follow-up data. Follow up of contacts with primary and secondary care, and in particular cardio-respiratory and psychiatric sequelae, will be imperative to understanding the long-term impact of severe COVID-19.

CO-CIN need to be able to report and respond to the longer-term impact COVID-19 is having on hospitalised survivors. CO-CIN have outcomes for patients at hospital discharge (alive, dead, palliative discharge, ongoing rehab). For the majority of patients this is <28 days from hospital admission. CO-CIN need to be able to report the longer-term all-cause and excess mortality for patients hospitalised with covid-19. The data will help to understand the longer-term mortality for patients admitted to hospital with COVID-19 (long-COVID).

CO-CIN have shown that diabetes is independently associated with in-hospital mortality for patients with COVID-19, and this partially mediates the relationship between ethnicity and mortality. The data requested is essential to understanding the relationship between diabetes, COVID-19 and mortality, by increased granularity of diabetes comorbidity including duration since diagnosis, complications, comorbidity, ethnicity and longer-term mortality.

CO-CIN have presented data on hospital acquired infection, level of treatment (oxygen, critical care, invasive ventilation), and specific treatments (including dexamethasone, remdesivir, convalescent plasma). CO-CIN have developed a secure, password protected dashboard where SAGE members are able to access aggregated data with small numbers suppressed. This data is accurate to the same day.

The data has been used for modelling by Scientific Pandemic Influenza Group on Modelling (SPI-M), and is the compulsory national registry for patients who receive remdesivir.
CO-CIN have produced academic papers published in high impact journals (early general description, paediatrics, risk prediction model, ethnicity) which have supported evidence based practice in UK hospitals.

CO-CIN have supported external collaborations with specialty academic groups to explore their patient groups, such as those with interstitial lung disease and HIV.
In a subgroup of 2,500 patients, CO-CIN have linked with detailed biological and follow-up data.

Use cases for supporting SAGE reporting include the following research questions:
- What are the outcomes for patients on the shielding list?
- How do patient comorbidities/multi-morbidity contribute to in-hospital mortality in patients admitted to hospital with covid-19?
- What is the impact of ethnicity and socio-economic deprivation on outcomes in patients admitted to hospital with covid-19?
- Does access to hospital affect outcomes?
- What are the longer term sequelae for hospitalised survivors of covid-19?
- What is the longer term mortality in patients admitted to hospital with covid-19?
- What is the association between diabetes and in-hospital mortality?

The following organisations are involved in the study:
University of Oxford are the lead organisation for the study and host the data collection, they are the data controller.
University of Edinburgh are hosting the research databases and have the data science team that will be analysing the linkage, they are a data processor.
University of Liverpool support the hospital recruitment sites.
Imperial College London are solely analysing the collected sample data and no NHS Digital data.
Only the University of Edinburgh will store or have access to the NHS data from England, within its data safe haven.

Expected Benefits:

Expected benefits within the project timescales to April 2021 will include:
Reduce deaths associated with COVID-19
Assist commissioners in making decisions to better support patients
Identifying COVID-19 trends and risks to public health
Enables Department of Health and NHS England to provide guidance and develop policies to respond to the outbreak
Controlling and helping to prevent the spread of the virus
The Department of Health and NHS England can share a common understanding of activity levels across the system in regard to COVID-19. Better activity data will also enable a more robust national planning process and improve the allocation of resources across the system.

This will support the response to the pandemic but also the recovery of services.

Outputs:

The primary outputs will include regular and ad-hoc reports to SAGE and the UK Government with only aggregated data and small number suppression.
These will be done on a regular basis for the duration of the project (October 2020 to April 2021).

The project will also produce submissions to peer reviewed journals, again with only aggregated data and small number suppression. It is expected that at least one submission will be made prior to the end of the project.

Any outputs to 3rd parties not included as a Data Controller/Processor in this agreement will be aggregated (with small number suppression applied in line with NHS Digital requirements).

Within 1 week of CO-CIN receiving the data from NHS Digital, it will be able to use the dataset to provide analysis for the described use cases that start to respond to the following:

- Support the NHS and government response to COVID-19
- Analyse the spread of patients hospitalised with covid-19 geographically and demographically, to identify any trends. Appointment activity will also be analysed to better understand use of non-face to face consultation trends and potential differences across geographical areas.
- Analyse potential hospital-acquired covid-19 geographically and demographically
- Diagnosing and monitoring the effects of COVID-19 at a national level.
- Ensuring the Department of Health and Social Care and NHS England have adequate data to show whether interventions and measures put in place to reduce the transmission of COVID-19 are effective and impactful.
- Analyse factors that result in increased service utilisation for COVID-19 patients.
- Start building modelling and forecasting tools for COVID-19 from a linked Primary to Secondary pathway perspective to understand trajectories of care. Learning from and predicting likely patient pathways in order to influence early interventions and other alternatives for patients and develop new predictive modelling and tools for use by care professionals and commissioners.

The ISARIC website has recently been updated, and is under continuous review with regards to ensuring that it is up to date and relevant to inform the public of the programme. The ISARIC programme is also working with HDRUK to ensure that the use of data is made available to the general public as part of the overall national strategy and work with them to ensure that feedback from the public is heard. Senior researchers have also been active in promoting the study on social media and mainstream news outlets.

Processing:

Data will only be used for the purposes within this Data Sharing Agreement. Any additional disclosure / publication will require further approval from NHS Digital.

NHS Digital will be provided with a mapping file containing a list of NHS numbers of patients in England within the CO-CIN cohort, as well as the patient date of birth, postcode and the matched CO-CIN subject ID.

NHS Digital will then create an extract, removing any identifiers from the datasets except for the subject ID. The linked pseudonymised data will then be sent securely to University of Edinburgh where it will be stored in the National Data Safe Haven (DSH) managed by the Edinburgh Parallel Computing Centre (EPCC) within an ISO27001 accredited data centre and processes. The EPCC manages its own data centre and services for the DSH.

There is a flow of identifiable data into NHS Digital and the flow out of NHS Digital will be pseudonymised, with the identifying data removed and replaced with a subject study id.
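The flow described above can be illustrated with a minimal Python sketch. It is not NHS Digital's actual linkage process: the field names (nhs_number, dob, postcode, subject_id, admission_year), the records, and the choice to match on NHS number alone are all hypothetical assumptions made for illustration.

```python
# Illustrative sketch only: field names, records and match key are hypothetical.
DIRECT_IDENTIFIERS = {"nhs_number", "dob", "postcode"}


def pseudonymise(mapping_rows, dataset_rows):
    """Link dataset rows to the cohort and replace identifiers with the subject ID."""
    # Look up the study subject ID by NHS number (assumed match key).
    subject_by_nhs = {row["nhs_number"]: row["subject_id"] for row in mapping_rows}
    extract = []
    for row in dataset_rows:
        subject_id = subject_by_nhs.get(row["nhs_number"])
        if subject_id is None:
            continue  # person is not in the cohort, so the row is left out
        # Drop the direct identifiers, then attach the subject ID.
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        clean["subject_id"] = subject_id
        extract.append(clean)
    return extract


# Hypothetical mapping file and dataset rows.
mapping = [
    {"nhs_number": "0000000001", "dob": "1950-01-01",
     "postcode": "ZZ1 1ZZ", "subject_id": "S001"},
]
records = [
    {"nhs_number": "0000000001", "dob": "1950-01-01",
     "postcode": "ZZ1 1ZZ", "admission_year": 2020},
    {"nhs_number": "0000000002", "dob": "1960-01-01",
     "postcode": "ZZ2 2ZZ", "admission_year": 2020},
]
extract = pseudonymise(mapping, records)
```

In this sketch the only key left in the outgoing rows is the subject ID, which mirrors the statement that the flow out of NHS Digital carries the study ID in place of the identifying data.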

University of Oxford hold the CO-CIN database (the clinical data that is collected from the hospitals). Oxford will flow the England data to NHS Digital, and NHS Digital will then send the pseudonymised extract to University of Edinburgh. University of Edinburgh hold all the pseudonymised CO-CIN data and will also hold the linked NHS Digital and CO-CIN data (England only) separately.

Identifiable data from CO-CIN and NHS Digital will never be stored within the same location and the linkage will be managed solely by NHS Digital.

PHS are named as a data processor due to their role supporting CO-CIN data flows and as the managers of the Serv-U gateway. They provide services to extract CO-CIN data but do not have any access to the DSH or any ability to access NHS Digital data.

Access to the data within the DSH is strictly controlled, and the linked, de-identified data will only be available to named data scientists within the University of Edinburgh and, where necessary, system administrators, all of whom are substantive employees of University of Edinburgh. All access will be logged. Access to the de-identified dataset will be controlled by the CO-CIN data manager and, where data is made available for research, principles of anonymisation and minimisation will be applied in line with the HES analysis guide and ICO best practice.

Data cannot be extracted from the DSH without going through a managed process, including statistical disclosure control checks, to ensure that only anonymised data will be extracted for the purposes of reporting to SAGE and the Department of Health and Social Care (DHSC), or covid-19 research. No direct identifiers will be used in these analyses.

The data will be stored in a pseudonymised form. Only anonymised data with small numbers suppressed will be used for reporting. The data will not be onwardly shared and accessed by other external partners without approval from NHS Digital and appropriate data sharing agreements being established.

AUDIT:
All processing and use of data provided is auditable by NHS Digital in accordance with the Data Sharing Framework Contract and NHS Digital terms.
Under the Local Audit and Accountability Act 2014, section 35, the Secretary of State has power to audit all data that has flowed, including under COPI.
The DSH and EPCC processes ensure that all data flows and activity will be recorded and auditable in line with ISO27001.

Data Minimisation:

CO-CIN are conscious that, in accordance with GDPR, only data required to answer the research questions should be requested, and as such, CO-CIN have only selected the variables that are relevant. CO-CIN have not requested identifiable data including patient address, forename and surname. CO-CIN have not requested GPES data regarding declines, contraindications and other exceptions; or review and monitoring codes. The main priority of the work is to answer as yet unasked questions directed by SAGE, and as such it is difficult to be absolutely sure which variables may or may not be required to answer these questions.

A study looking at Emergency Department attendances at NHS hospitals by people with epilepsy in the SAFE Trial — DARS-NIC-150521-F2Q1V

Opt outs honoured: No - data flow is not identifiable, No (Excuses: Consent (Reasonable Expectation))

Legal basis: Health and Social Care Act 2012 – s261(2)(c), Health and Social Care Act 2012 s261(2)(c)

Purposes: No (Academic)

Sensitive: Non Sensitive, and Non-Sensitive

When:DSA runs 2018-08 – 2021-08 2018.10 — 2018.12.

Access method: One-Off

Data-controller type: UNIVERSITY OF LIVERPOOL

Sublicensing allowed: No

AGD/predecessor discussions: igard-minutes-2nd-august-2018-final.pdf, igard-minutes-30th-august-2018-final.pdf

Datasets:

Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Accident and Emergency (HES A and E)

Type of data: Anonymised - ICO Code Compliant

Objectives:

Epilepsy is the recurring tendency to have unprovoked seizures. With a prevalence of ~1%, epilepsy is the second most common serious neurological disorder in the UK. As well as having potentially important life implications for patients and families, epilepsy also has important societal impacts. One is the cost of providing emergency care. In the UK, 20% of people with epilepsy (PWE) visit hospital Accident and Emergency Departments (A&E’s) each year for seizures. In England alone, there are around 100,000 visits to A&Es each year. The cost of this in 2015/16 was ~£70 million.

One reason costs are so high is because half of the PWE visiting A&Es are admitted to hospital; indeed, 85% of admissions for epilepsy occur on such an unplanned basis. Readmissions further drive costs up; ≥60% of PWE re-attend A&E within 12 months. This rate of return is higher than seen for other long-term conditions with episodic relapse, like asthma, and diabetes.

Seeking emergency care for epilepsy can be appropriate, important, and even life-saving. Evidence from projects such as the research team’s recent UK-wide National Audits of Seizure Management in Hospitals now shows, though, that most persons attending A&E do not attend for such reasons. Instead, most have known, rather than new, epilepsy and present with non-emergency states which do not require the full facilities of an A&E. One of the reasons driving this use is that patients and their family members frequently lack the confidence and knowledge to manage seizures by themselves.

The research team, based at University of Liverpool, has developed seizure first aid training for this part of the epilepsy population and has recently completed a pilot randomised trial of it, called the Seizure First Aid Training for Epilepsy (SAFE) trial. The SAFE trial focused on the 60% of this group (and their informal carers) who make multiple attendances in a year and together account for ~90% of all A&E visits made for epilepsy. The trial compared receipt of the intervention to usual care alone. The trial was completed with NHS ethical approval and HRA approval; was sponsored by the University of Liverpool and publicly registered (ISRCTN13871327); and was funded by the National Institute for Health Research (Health Services and Delivery Research programme, Project Reference No: 14/19/09).

A pilot randomized trial is not designed with the aim to prove the superiority of one treatment over another, but rather to try out aspects of the larger trial and address design uncertainties that exist (Whitehead et al. Clin Trials 2014; 38: 130–133). The size of a pilot trial is rarely adequate to conduct statistical hypothesis tests as would be the case for the main definitive trial. Despite this, pilot trials have an important role in health care and benefit the health and social care system since they help us to understand how best to complete a full trial so that it can be well positioned to generate the scientifically rigorous evidence required to inform care and maximize patient outcomes. Moreover, pilot trials can help society avoid wasting finite resources on trials that are unfeasible or poorly designed. To this end, major funding bodies such as the UK’s National Institute for Health Research (NIHR) and the Medical Research Council expect pilot evidence on the feasibility of a trial before large amounts of money are released for a large trial to be completed.

A pilot trial was necessary as a range of uncertainties existed as to how to conduct a definitive trial. Uncertainties pertinent to this request for data from the HES A&E system were:

• The absence of an initial estimate for the sort of effect the seizure first aid intervention had on the proposed primary outcome for a definitive trial – namely participants’ subsequent use of A&E – and lack of an estimate of the annual rate of A&E use in the control arm and its dispersion. Without this information it is difficult to know what sort of sample size would be required for a definitive trial to ensure it was adequately powered to detect an effect if one existed. Going ahead to a main trial without this information would have risked recruiting too few or too many participants. If too few were recruited, the probability of finding a clinically relevant difference would have been low and therefore the chance of providing an inconclusive result high. Conversely, if too many participants were recruited then resources would be wasted, more patients than necessary could be given a treatment which will later be proven to be inferior, or an effective treatment may be delayed from being identified.

• The second uncertainty concerned how best to measure/ capture information on a person’s A&E use in a trial. For example, one could ask participants to self-report on their use, but this presumes all participants are well enough to answer the question and can provide an accurate answer. Memory impairment and mood disturbance are common in epilepsy and may impair recall. Given these uncertainties, and since the HES system provides the only comprehensive record of a person’s use of all NHS A&Es across the country, HES A&E data is required for the participants in the trial relating to the 12 months before and after they entered the trial. All participants in the trial provided explicit consent for their HES A&E data to be obtained. Having access to their HES A&E data would allow the above noted uncertainties to be addressed in the following ways:

• By enabling a description of the use of A&E by participants in the two treatment groups before and after entering the trial in order to generate an initial estimate of any change that occurred in A&E use in the two treatment groups and determine the annual rate of ED visits in the control group and its dispersion parameter. All this information could then be factored into a sample size calculation for a future definitive RCT.

• Participants in the trial were asked to self-report on their use of A&E during the 12 months before and after entering the trial. Having HES A&E data for these participants for the same periods of reference would enable a comparison of patients’ self-reported use of A&E against objective data on their A&E use. This comparison would allow measurement of the extent of agreement between the two measurement approaches and inform discussions about how best to measure A&E use within a future definitive trial, and indeed any other similar trials.
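As an illustration of the first of the two points above (estimating the annual rate of A&E visits in the control group and its dispersion), the following Python sketch uses a method-of-moments estimate. It assumes the "dispersion parameter" refers to a negative binomial model in which variance = mean + k × mean², which the source does not state; the visit counts are invented and the function name is hypothetical.

```python
from statistics import mean, variance


def rate_and_dispersion(counts):
    """Return (mean annual visits, moment estimate of the dispersion k).

    Assumes a negative binomial model, variance = m + k * m**2,
    and a mean greater than zero.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Solve variance = m + k * m**2 for k; clamp at 0 if no overdispersion.
    k = max((v - m) / m**2, 0.0)
    return m, k


# Hypothetical annual A&E visit counts for five control-arm participants.
control_visits = [0, 1, 2, 3, 9]
rate, dispersion = rate_and_dispersion(control_visits)
```

Estimates of this kind are the inputs a sample size calculation for count outcomes would typically need; a fitted regression model would normally be preferred over the moment estimate for a real analysis.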

A multi-centre, external, pilot randomised controlled trial (RCT) was conducted with PWE aged ≥16 years who visited the A&E of one of three NHS hospital trusts in the NW of England (namely; Aintree University Hospital, Royal Liverpool University Hospital, Wirral University Teaching Hospital), in the prior 12 months for epilepsy on ≥2 occasions who could independently complete questionnaires in English, along with one of their family members or friends who have an informal caring role.

Ostensibly eligible patients were identified and invited to participate in the trial by their NHS A&E consultant who sent them an invitation letter in the post, along with a Participant Information Sheet. Persons who were interested in taking part were in turn contacted by a GCP-qualified, postdoctoral study researcher who confirmed patient eligibility, provided information and answered any questions the patient had. The researcher also provided the patient with a further copy of the Participant Information Sheet. Participants had a minimum of 24 hours to decide whether they wanted to take part or not.

58 participants were recruited between May and December 2016. As part of the consent process, participants provided informed written consent to participate and for the research team to access identifiable data from the HES A&E system on the number of times they had visited an NHS A&E in the 12 months prior to entry into the trial and then during the time period they were enrolled in the trial. Data on participants’ use of A&E before coming into the trial is required to permit adjustment for potential differences in baseline use of A&E between the two trial arms (i.e. those who took the training and those who did not).

Participants taking part were put into one of two groups at random by a computer. The first group is called Group A and the second Group B. People who are put in Group A get the Seizure First Aid Training course (treatment) straightaway and people in Group B continue to receive their normal medical care (treatment as usual, TAU). The health of the people in the two groups will be compared to see if the Seizure First Aid Training was helpful or not. After the two groups’ health has been compared, people in Group B then get to go on a Seizure First Aid Training course if they want it.

Over the course of the trial, patient participants were followed-up and required to each complete three sets of questionnaires, either in a face-to-face interview with a research worker (at baseline and at 12-month follow-up) or through the post (at 6-month follow-up). At these assessment points, participants were asked to self-report on the number of times they had visited any NHS A&E. At baseline they reported upon A&E use in the previous 12 months. At follow-up they reported on A&E use since their prior assessment.

Prior to each assessment point, participants were contacted and asked whether they wanted to continue to participate in the trial or whether they wanted to withdraw their consent. During the course of the trial 5 patient participants formally withdrew and so HES A&E data will not be requested for them.

The project’s sample size does not impact on the ability of the project to achieve its aims. Sample sizes between 24 and 50 have been recommended as ‘adequate’ for pilot trials (e.g., Sim & Lewis, J Clin Epidemiol 2012 65: 301-8; Julious, Pharm Stat 2005 4: 287-91).

Expected Benefits:

Whilst the trial will not be statistically powered to detect a clinically meaningful difference in outcome between treatment groups, summary statistics will be conducted to measure the effect of the intervention on the proposed primary and secondary outcome measures (outlined previously) and the precision of such estimates at the post-treatment time points. As such, it will directly inform the methodology employed in the data collection and analysis of a possible future definitive trial. This benefits health and social care by informing the methodology to be employed in data collection and analysis of a definitive trial, therefore strengthening the evidence base which underpins the data and the results. This output will be measurable based on the methods subsequently employed in the definitive trial.

Indirect benefits to health and social care will be achieved through output via presentation and publication to the research community involved in clinical trials and trial methodology. The output of this trial aims to inform the implementation of data from electronic HES records in a prospective definitive trial. As a result, RCTs will use electronic medical records where a benefit is offered over self-report methods of data collection. This will result in improved efficiency of RCTs, which are frequently funded through public sources, and improved participant experience.

Benefits include significant implications for the lives of patients. Reducing unnecessary emergency admissions is also a key factor in helping to relieve financial pressure on healthcare services.

Another major social issue is the indirect cost of epilepsy due to lost employment. The health and social costs could be reduced, and quality of life improved via better outpatient management. However, around 40% of those diagnosed have poorly-controlled epilepsy and continue to have two or more seizures per year, despite antiepileptic drug treatment. These findings highlight missed opportunities for epilepsy self-management. Guidelines are clear that, with the correct training, such seizures can be safely managed by patients and their families within the community. Evidence indicates that people with epilepsy that frequently visit the A&E might benefit from a self-management intervention that improves their own and their informal carers’ confidence and ability in managing seizures and empowers them to be able to tell others from their wider support network about first aid.

It follows therefore, that indirectly the assessment of data from the electronic HES records in a definitive randomised controlled trial to evaluate the effectiveness of seizure first aid training intervention for people with epilepsy will provide the best possible information in relation to A&E attendance by people following a seizure. This data will subsequently be used to inform service commissioning decisions. Commissioners planning health and social care services need good information about the experience of people with epilepsy and the intervention(s) they receive, as well as the result of that intervention as a means to ensuring that people are getting the services that are right for them. Reducing unnecessary emergency visits to hospital by people with epilepsy is identified as one way that resource limited health services can generate savings. In addition, reducing emergency visits is also important for service users; not least because emergency department visits can be inconvenient, distressing and do not typically lead to extra support.

Outputs:

The data provided by NHS Digital will directly inform the outputs of the NIHR (HS&DR) funded project and allow the research team to achieve several of the key objectives.

First and foremost, the data will allow an estimate of the effect of the training intervention. This will help understanding around whether the seizure first aid intervention developed is likely beneficial in reducing ED use.

Secondly, the data provided by NHS Digital will enable understanding of how data on A&E use captured by the HES system compares to patient self-report. The results from these analyses will provide knowledge on how to conduct the definitive trial if it is deemed appropriate.

Findings will be published in the form of a publicly available report to the NIHR and within a peer-reviewed publication. In no publication will the identity of participants be disclosed.

Final results from this trial and the associated outputs are expected by December 2018 in line with the project’s completion date. All presented and published findings will be anonymised and compliant with NHS Digital's operating procedures in relation to presentation and publication. Non-identifiable aggregate data will be used in presentations and publications. Outputs will consist of descriptive statistics and statistical measures of agreement between data retrieved from the HES A&E system and participants’ self-report.

The HES Analysis Guide rules will be complied with. Only the mean/ median number of A&E visits by participants in the two groups and the dispersion parameters shall be described. The change in A&E use between the two groups will be reported and compared. In no instance shall data where cell counts are less than 5 as specified in section 5.1 of the HES Analysis Guide be published. Moreover, the maximum number of visits by an individual shall not be reported since this is information that can relate to an individual.
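The reporting rules above can be illustrated with a minimal Python sketch. It assumes a threshold of 5 as stated in the text; the placeholder string, the function names and the example figures are hypothetical and are not taken from the HES Analysis Guide or from trial data.

```python
from statistics import mean, median

SUPPRESSION_THRESHOLD = 5  # cell counts below this are not published


def suppress_small_counts(table):
    """Replace any count below the threshold with a placeholder string."""
    return {
        cell: (count if count >= SUPPRESSION_THRESHOLD else "<5")
        for cell, count in table.items()
    }


def summarise_visits(visits):
    """Return mean and median only; the maximum is deliberately left out
    because it could relate to a single individual."""
    return {"mean": mean(visits), "median": median(visits)}


# Hypothetical illustrative figures, not trial results.
counts = {"0 visits": 12, "1-2 visits": 7, "3+ visits": 3}
print(suppress_small_counts(counts))
print(summarise_visits([0, 1, 1, 2, 6]))
```

Dispersion parameters would be added to the summary in the same way, again without reporting the maximum.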

The primary focus here is to ascertain the agreement and additional benefits of data from the HES A&E data set and not to report on specific clinical criteria. It is not the intention to present record level data in any report. In all outputs data will be de-identified and all measures taken to ensure that individuals cannot be identified. For example, information regarding geographic location, timing and gender will be omitted as well as explicit clinical / personal details. Furthermore, both epilepsy and the clinical event of A&E attendance(s) under assessment are very common.

This project will have both direct and indirect outputs.

The pilot trial will include the analysis of individual participants’ use of A&E at baseline and over the 12 months of follow-up. This will assist in determining whether incorporating A&E attendance from electronic medical records in place of patient self-report records provides a more rigorous data set. This information will be included in the analysis on completion of the pilot trial. This output will take the form of reports and presentations.

This project’s findings will also inform the wider epilepsy research community contributing to the development and improvement of efficient future trial design. In particular, information from HES data will provide evidence on how accurate people with epilepsy are at self-reporting on previous emergency department use.

Using HES data from the electronic system to provide the primary outcome data would extend the timescale of a future trial and increase costs. At present, no evidence exists on how best to measure A&E use in a future trial. Therefore, the coverage and accuracy of patient self-report will be compared to data from the HES system. This will help determine whether the expense associated with use of the HES system as the primary means of measuring A&E use is warranted.

These outputs will be disseminated to clinicians and academics involved in the conduct of clinical trials and research concerning clinical trials methodology within the epilepsy population. Findings will be presented at academic conferences (potentially the 13th European Congress on Epileptology, http://epilepsyvienna2018.org/scientific-programme/) and in peer-reviewed journals (potentially Journal of Neurology, BMJ Open, Epilepsy and Behavior) and will take the form of a narrative assessment of:

- the methods and feasibility of access to electronic medical records

- the agreement and reliability of data from routine sources

- the benefit / limitations of data from electronic medical records

Processing:

Data management and analysis is to be conducted within the Clinical Trials Research Centre (CTRC) at the University of Liverpool. The specific methodological activities involved in the processing of data are as follows:

The NHS Digital HES A&E data will be requested for all trial participants who provided consent and who have not subsequently formally withdrawn from the trial. A request for data will be made for all applicable participants on one occasion only.

The time period for getting data on participants’ use of A&E relates to the 12 months prior to them entering the trial and the 12 months following their enrolment. The research team will send NHS number, the unique (pseudonymised) study ID, and the date of recruitment to the study to NHS Digital.

Within the data file sent back to the University of Liverpool from NHS Digital, the University of Liverpool needs the study ID associated with each A&E visit made, within the time period of interest, by an individual on the University of Liverpool’s list of patients. This is necessary so data from the HES A&E system can be allocated to the correct participants in the trial and to enable a comparison of the use of A&E of participants in the two trial arms. The date of each A&E visit is also required from NHS Digital so that HES records can be matched accurately to the detail provided by participants.

The resulting file generated by NHS Digital would include any A&E visits by patients who participated in the trial captured by the HES system that occurred within the relevant time. The data provided would include the date of the visit. The data file would be securely transferred back to the CTRC at the University of Liverpool, again using NHS Digital’s Secure Electronic File Transfer (SEFT) system.

Analyses will not be completed using the identifiable data set. To ensure that as few people as possible can access this file, a structured process, used previously by the research group when using HES data, will be followed. Specifically, having received the data file from NHS Digital, the trial’s postdoctoral research fellow will, within the confines of CTRC, work with the identifiable dataset to create a pseudonymised version. To do this, the research fellow will attribute all A&E visits captured by the HES system to the appropriate participants in the trial. The resulting file will only contain participants’ Unique Study Numbers and the number of A&E visits that they made during the relevant time periods. Patients’ NHS numbers will not be included in this file. The data will then be linked, using the patients’ Unique Study Numbers, with the main trial database and the self-report data provided by participants in the trial. Following this process, the data set will then be accessible to the study team members involved in the analysis, including the Clinical Investigator and the trial statistician.
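As a rough illustration of the attribution step, the following Python sketch counts visits per study ID in the 12 months before and after recruitment. It is a sketch under stated assumptions, not the research team's procedure: "12 months" is taken as 365 days, a visit on the recruitment date is counted as "after", and the function name and data layout are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

WINDOW = timedelta(days=365)  # assumption: "12 months" taken as 365 days


def count_visits(recruitment, visits):
    """Count A&E visits per study ID before and after recruitment.

    recruitment: dict mapping study ID -> recruitment date
    visits: iterable of (study ID, visit date) pairs
    The result is keyed by study ID only; no NHS number is carried over.
    """
    before, after = Counter(), Counter()
    for study_id, visit_date in visits:
        start = recruitment.get(study_id)
        if start is None:
            continue  # visit does not belong to a listed participant
        if start - WINDOW <= visit_date < start:
            before[study_id] += 1
        elif start <= visit_date < start + WINDOW:
            after[study_id] += 1
    return {
        sid: {"before": before[sid], "after": after[sid]}
        for sid in recruitment
    }


# Hypothetical example records.
recruitment = {"S001": date(2016, 6, 1), "S002": date(2016, 7, 1)}
visits = [("S001", date(2016, 1, 10)), ("S001", date(2016, 9, 3))]
print(count_visits(recruitment, visits))
```

Participants with no recorded visits still appear in the result with zero counts, so they are not lost when the file is linked to the main trial database.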

The primary outcome by which treatment effect will be estimated is defined as the number of epilepsy-related A&E visits made by patient participants over the 12 months following randomisation measured by HES data. The number of A&E visits at the end of the 12-month follow-up period will be presented, in addition to the change in the number of A&E visits at the end of the 12 months compared to the number of A&E visits in the 12 months prior to baseline. Results will be presented as mean and standard deviations if data are normally distributed and median, IQR and range if data are skewed. Results will be presented overall and by treatment group.

The difference between the treatment group and the TAU control group will be expressed as a mean difference and 95% confidence interval and statistically tested at a 5% level of significance by an independent t-test if the data are normally distributed. In addition, to aid interpretation, 90% and 80% confidence intervals of the mean difference will be reported. The difference between the groups will be tested with a Mann-Whitney U test if the data are skewed.
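The quantities named above can be sketched with the Python standard library. This is an approximation for illustration only: the confidence intervals below use a normal approximation rather than the t distribution that an independent t-test implies, the Mann-Whitney function returns only the U statistic (no p-value), and all names and figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(treatment, control, level=0.95):
    """Mean difference (treatment - control) with a normal-approximation CI.

    Assumption: a real analysis would use t-distribution quantiles.
    """
    diff = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)


def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y scores 1, each tie 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


# Hypothetical visit counts, not trial data.
treatment = [0, 1, 1, 2, 3]
control = [1, 2, 2, 4, 5]
for level in (0.95, 0.90, 0.80):
    print(level, mean_difference_ci(treatment, control, level))
print(mann_whitney_u(treatment, control))
```

Reporting the 90% and 80% intervals alongside the 95% interval only changes the quantile used; the point estimate is the same in each case.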

To maintain the original blinding of the trial statistician during the SAFE trial, anonymised HES data without any details of intervention will first be analysed and overall results presented. Subsequently, intervention allocations of individuals within the HES data will be made available to the trial statistician and results will be presented by treatment group and statistical testing will be performed.

For the secondary objective, self-reported numerical results will be compared to the numbers of epilepsy-related A&E visits calculated from HES data. If possible, Bland-Altman agreement statistics will also be calculated to determine the agreement of the two methods of recording A&E visits. The term “if possible” is used not because the sample size might preclude completion of these analyses, but because the researchers have not yet seen the HES A&E data and there is a possibility, albeit unlikely, that these statistics cannot be calculated because the HES data received is in a different format to the self-reported data and so the two cannot be directly compared.
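For readers unfamiliar with Bland-Altman agreement statistics, a minimal Python sketch follows. It computes the mean difference (bias) between paired measurements and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The function name and the example counts are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(self_report, hes):
    """Return the bias and 95% limits of agreement for paired counts."""
    if len(self_report) != len(hes):
        raise ValueError("measurements must be paired")
    diffs = [s - h for s, h in zip(self_report, hes)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, (bias - spread, bias + spread)


# Hypothetical paired visit counts per participant, not trial data.
self_report = [2, 0, 3, 1]
hes = [1, 0, 4, 1]
print(bland_altman(self_report, hes))
```

The calculation requires both sources to give a visit count per participant over the same period, which is why a difference in data format would prevent it.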

All data received will be stored using the University of Liverpool Research Data Management Datastore (https://www.liverpool.ac.uk/csd/records-management/storage-and-disposal/). Data will be stored electronically on University of Liverpool central servers, located in an access-controlled server room and connected to the main University network, located behind a firewall. Physical access is limited to Computer Services Department staff. Data will be encrypted using industry standard techniques meeting the Information Governance Toolkit standard (8HN20). The data will not be transferred to an additional location. The SAFE trial CI will act as data custodian (https://www.liverpool.ac.uk/library/research-data-management/storing-your-research-data/). The University of Liverpool ‘Information Security Policy’ and ‘Research Data Management Policy’ provide further information.

The pseudonymised dataset will be accessed by specific members of the SAFE trial research team based in the University of Liverpool. All outputs will contain only aggregate data, with small numbers suppressed in line with the HES Analysis Guide.

All data will be stored and accessed at the University of Liverpool at all times. All personal data in this trial is kept strictly confidential and is being handled, stored and destroyed in accordance with GDPR.

Only individuals who are substantively employed by the University of Liverpool and are part of this study will have access to the data. No other collaborator of this study will have access to the data received from NHS Digital.

All organisations party to this Agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract - i.e. employees, agents and contractors of the Data Recipient who may have access to that data).

Project 10 — DARS-NIC-14337-J4N1T

Opt outs honoured: N

Legal basis: Health and Social Care Act 2012

Purposes: ()

Sensitive: Non Sensitive

When:2017.06 — 2017.05.

Access method: Ongoing

Data-controller type:

Sublicensing allowed:

Datasets:

Hospital Episode Statistics Admitted Patient Care

Type of data:

Objectives:

The British Orthopaedic Surgery Surveillance (BOSS) study is a mechanism for researching the treatment of rare orthopaedic diseases within the UK. The methodology detailed in the study protocol is in keeping with similar successful studies of rare diseases, namely the UK Obstetric Surveillance System (UKOSS) in Obstetrics and Gynecology, BAPS-CASS (British Association of Paediatric Surgeons – Congenital Anomaly Surveillance Study) and BPSU (British Paediatric Surveillance Unit). In these studies, routine data (i.e. disease and anomaly registers) is often used to verify the completeness of case ascertainment – though this is the first time that HES has been used to attempt to augment case identification.

The diseases of interest within the BOSS Study are Slipped Capital Femoral Epiphysis (SCFE), and Perthes’ disease. Both are rare hip diseases of adolescence. SCFE is always admitted to hospital for surgery, therefore the data captured within HES is likely to be good. Perthes’ disease is only occasionally admitted; therefore HES data is unlikely to be useful to identify cases of new disease. The data processing and analysis hereafter relates solely to SCFE.

All orthopaedic units treating children within the UK are being asked to submit data to the service evaluation/ audit (supported by Orthopaedic Specialist Societies, NICE and the National Clinical Director for Children). As of February 2016, over 150 UK hospitals have agreed to supply data to the study. Details of any new case of SCFE will be recorded by clinicians prospectively using the secure online REDCap clinical trials platform. All English hospitals have a nominated representative and University of Liverpool have made separate applications for the Scottish and Welsh data to the respective organisations.

This data will be managed by the Liverpool clinical trials unit, who are overseeing the delivery of the BOSS Study. HRA have given nationwide permission for sites to collect this data, without any additional local approvals and without patient consent (as data is anonymised and data forms part of routine care).

The care offered to children affected by the diseases of interest varies considerably around the UK, and beyond. These variations exist owing to surgeons’ beliefs about which treatment is best and the experience and skill that the local surgical team can provide. Only by adequately documenting variations in disease and surgical practice, and recording the outcomes of current care, can one begin to influence change to improve care across the UK.

HES data will be used to ensure maximum case ascertainment is achieved within the study, to ensure the generalisability of the results. When a case of disease is identified within HES (by ICD code) during the study period, the BOSS team in Liverpool will be notified. HSCIC will share with the BOSS team the age and gender of the patient, date of admission, date of surgery and hospital. No unique identifiers will be captured; therefore the patient will not be identifiable except to the treating clinicians. The data supplied by HES will then be used to check completeness of the REDCap database (i.e. that supplied prospectively by clinicians) against the HES record. In the event that a case is identified within HES, but has not been reported through REDCap, the nominated surgeon-lead at the relevant hospital will be contacted to ask them to determine the validity of the diagnosis, and if appropriate, formally report the details of the case through REDCap.

If the BOSS surveillance mechanism is successful, it will be expanded to other diseases. The intention is that the data generated may serve as stand-alone service improvement, and may generate feasibility research for future clinical trials of treatment interventions.

Expected Benefits:

The information that will be collected from this, the British Orthopaedic Surgery Surveillance (BOSS) study, has been planned in conjunction with patients, their parents and treating clinicians. The formation of a prospective SCFE database was a recommendation by NICE in their recent review of SCFE – the BOSS study is therefore meeting this demand.

The information gained from the BOSS study will be the largest study undertaken into the rare disease of SCFE, and will yield information concerning the effects of different interventions on patient outcomes. This will have direct implications for the way that surgical care is delivered, with surgeons being able to benchmark their practice against others. The findings of the BOSS study will inform the feasibility of a clinical trial into interventions for SCFE (i.e. are there enough cases, enough surgeon engagement and enough variation/uncertainty in practice to warrant a trial?). A clinical trial would be the gold-standard means to ensure evidence based care is being delivered to patients. However, if a clinical trial is not feasible, the BOSS study will significantly enrich the current evidence to enhance patient care.

The BOSS study group has biannual presentations; at the British Society of Children’s Orthopaedic Surgery (BSCOS) and at a national BOSS Collaborator meeting. BSCOS and the British Orthopaedic Association (BOA) have advocated that their members engage with the BOSS study. The results should therefore have direct, relevant and measurable impact on the clinicians involved. The BOSS study group have worked with members of the SCFE NICE review group to ensure that the findings of the study will address questions raised within the recent NICE review. The BOSS study will therefore have a direct effect on surgical practice and a positive impact on the care of patients.

The BOSS study will begin recruitment during 2016 (with HSCIC to aid case identification), will collect outcomes at 2 years (from 2018) and will therefore report in early 2020.

Outputs:

(1) To determine the case mix of SCFE across the UK, the variation in surgical practice and clinical and radiographic outcomes up to 2 years. There will be a published report which will be sent to Trusts and Clinicians on an annual basis during the study, and the final report will be published no later than one year after completion of the study.
(2) The results will inform the feasibility of a clinical trial into the surgical treatments for SCFE, and will inform NICE in relation to guidance on surgery for SCFE. This final report will be published in peer-reviewed journals (e.g. the British Medical Journal) in line with NIHR expectations.
(3) Publication of the protocol will be undertaken during 2016, and publication of the case mix and variations in surgical practice will follow in late 2017. Publication of 2-year follow-up will not be possible until 2019.

NIHR funding for this study is for 5 years, so the project will be completed by 2020. If a clinical trial were to ensue, this would require sufficient case numbers, surgeon engagement, patient engagement and a well-balanced trial question. The BOSS Study is a good cost-effective mechanism by which to ‘test’ surgeon engagement, case numbers and begin to understand the variation in practice. This project will therefore inform the feasibility of all aspects of a trial.

Processing:

Data will be used as a reference from which to determine the completeness of cases reported by clinicians and prompt additional reporting.

Data will be processed by a team within the clinical trials unit at the University of Liverpool.

The BOSS study team will cross-check the HES identified case against the list of cases already reported to them by clinicians through the REDCap clinical trials platform. Cases will be identified only by age/sex of patient, date of admission, date of surgery and hospital. The team will make the assumption that no hospital will have more than one admission per day sharing the same details (this is a rare disease and even in larger children’s hospitals more than one case per day is unusual). If a case is not reported to them, the team will contact the clinician participating in the British Orthopaedic Surgery Surveillance (BOSS) study (all hospitals have identified such an individual with the support of the British Orthopaedic Association and the British Children’s Orthopaedic Association). The team will then ask the clinician to verify/refute the diagnosis. If the diagnosis is verified they will ask the clinician to submit anonymous details of the case through the REDCap clinical trials platform (this is in keeping with the successful UKOSS/ British Association of Pediatric Surgeons-Congenital Anomaly Surveillance System (BAPS-CASS) reporting systems). The BOSS Study has been granted nationwide NHS research approval (HRA-cohort 3), and national ethics approval.
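The cross-check described above can be sketched in Python as a comparison on the non-unique fields listed in the text. The field names and example records are hypothetical, and the sketch relies on the stated assumption that no hospital has more than one admission per day sharing the same details.

```python
def match_key(case):
    """Build the matching key from the fields named in the text."""
    return (
        case["hospital"],
        case["admission_date"],
        case["surgery_date"],
        case["age"],
        case["sex"],
    )


def unreported_cases(hes_cases, redcap_cases):
    """Return HES cases with no REDCap record sharing the same key."""
    reported = {match_key(c) for c in redcap_cases}
    return [c for c in hes_cases if match_key(c) not in reported]


# Hypothetical example records.
hes_cases = [
    {"hospital": "A", "admission_date": "2016-05-01",
     "surgery_date": "2016-05-02", "age": 12, "sex": "M"},
    {"hospital": "B", "admission_date": "2016-05-03",
     "surgery_date": "2016-05-03", "age": 13, "sex": "F"},
]
redcap_cases = [hes_cases[0]]
print(unreported_cases(hes_cases, redcap_cases))
```

Each case returned would prompt contact with the nominated clinician at that hospital to verify or refute the diagnosis.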

Project 11 — DARS-NIC-311179-R5V5Y

Opt outs honoured: N

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Purposes: ()

Sensitive: Non Sensitive

When:2016.04 — 2016.08.

Access method: One-Off

Data-controller type:

Sublicensing allowed:

Datasets:

Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Outpatients

Type of data:

Objectives:

Over the years, the Roy Castle Lung Cancer Research Programme (RCLCRP) has been at the forefront of ground-breaking research in early detection of lung cancer. Lung cancer is the leading cause of cancer-related death in most developed countries, with mortality rates exceeding those of colon, breast and prostate cancer combined (Jemal et al, 2010; Siegel et al, 2011). Given that more than 94% of the patients diagnosed with lung cancer in the UK die of the disease within five years, the primary objective is to detect lung cancer at an earlier, potentially more curable stage (5-year survival rate of stage IA tumour is ~70%). Lung cancer is predominantly a disease of the elderly, with an average age at diagnosis of around 60-70 years, and often presents very late, at an advanced stage (Alberg et al, 2007; Dela Cruz et al, 2011).

Although the pathogenesis of lung cancer is not yet fully understood, researchers have suggested a potential role for concomitant diseases in the aetiology of lung cancer. Due to increasing longevity and rapidly ageing populations, the number of people with more than one comorbid condition is expected to increase sharply in the coming decades (van den Akker et al, 1998; Yancik et al, 2001). This increase might lead to an increase in the incidence of lung cancer, and the comorbidity burden might lead to increased overall and/or lung cancer-specific mortality. To this end, documentation of previous history of diseases is essential for exploring the impact of comorbidity on lung cancer. A rich source of data for exploring the potential role of comorbidity in lung cancer pathogenesis is the Hospital Episode Statistics (HES).

The Liverpool Lung Project (LLP) intends to link details of all admissions, outpatient appointments and accident and emergency (A&E) attendances of all participants in the LLP at NHS hospitals in England to the epidemiology data gathered through a detailed questionnaire for all LLP patients. In addition, all information gathered will be linked to the ONS data to study the mortality patterns of all participants in the LLP. The in-house database system will be used to collate all data and the output of the analysis will be documented in scientific literature.

Expected Benefits:

Past Benefits
1. Report for RCLCF
This report is a requirement of the funding body to ascertain that adequate outputs are produced from their financial contribution and be accountable to the general public and its trustees as to exactly how and for what purpose voluntary public funding is being utilised in the name of Lung Cancer Research. It is imperative that the RCLCF is satisfied that the RCLCRP is constantly finding and utilising all information available to it to further develop and prove the accuracy of the LLP Risk Model when used as an early detection tool for Lung Cancer. Risk Prediction models incorporating multiple risk factors have been recognised as a method of identifying individuals at high risk of developing lung cancer. Thus accurate selection of high-risk individuals for lung cancer screening requires robust methods for prediction. The LLP has produced a risk model that has been utilised for identifying high risk individuals for screening in the first UK lung screening programme. As early diagnosis can save lives, the LLP have developed a new generation of risk model, the LLPi that may assist in identifying individuals at high risk of developing lung cancer using hospital episodes as surrogates of disease history. Unlike most risk models that are based on (biased) questionnaire data, the LLPi took advantage of the available hospital episode statistics data to corroborate questionnaire data for disease status. This resulted in the LLPi with a good calibration and c-statistic of 0.85 – one of the highest in lung cancer risk modelling.

The Cancer Registry, HES and ONS data being made available to the RCLCRP is a fundamental aspect in aiding the development and testing of the LLP and LLPi Risk Model and developing new biomarkers for disease detection and management. Such data also allows further co-morbidity links and contributory factors to be investigated, analysed and reported on and thus enables patient recruitment strategy development for future research and further funding to be sought.
The findings of the RCLCRP enable its funder, the RCLCF, to develop informed policy, target fundraising and influence UK Public Health by generating information and publicity campaigns to raise awareness of lung cancer. Individuals and public health professionals can make use of this information to take decisions and advise on lifestyle choices relating to an individual’s risk of developing lung cancer, particularly if they have known pre-existing conditions or persist with their current lifestyle.

2. Publications
Publication is fundamental to the provision of evidence-based medicine and the delivery of an effective healthcare system. For example, these publications benefit the wider community of clinicians both when investigating the possible presence of Lung Cancer within patients and potential risk of Lung Cancer developing where certain health risk pre-cursors are present in their history, whether these be actual co-morbidity diseases already present or socio-demographic health patterns. They also demonstrate how the Risk Model could be used as a tool within public health organisations for preventative/deterrent purposes when patients are advised by their clinicians to make improvements to their health or change their habits to potentially improve life expectancy.

The publications can also contribute to scientific knowledge on development and validation of biomarkers used to detect or differentiate lung cancer, for example: methods for detecting lung cancer; characterisation of molecular changes in cancer cells; the nature of DNA mutation and methylation as a hallmark of different lung cancer sub-types. These results are put into clinically relevant context by the data we obtain relating to other diseases (HES), the incidence of cancer amongst previously healthy recruits (cancer registry) and outcome (ONS).
Our publications also highlight the need to develop drugs to improve life expectancy timelines when Lung Cancer is detected. Delineation of different molecular classes of lung cancer is contributing significantly to changes in medical practice, leading to new targeted therapies.

3. Report for EU FP7 Funded Projects (LCAOS & CURELUNG)
The detailed reports of the findings and impact of research to which the RCLCRP contributed are seen by EU Commissioners and Scientific Committees to inform on the effectiveness and impact of the work, appropriate utilisation of funding and further progress required to develop for implementation of the findings. These help to guide future policy decisions on research goals and investment (including the structure of the current funding scheme, Horizon 2020).

Future Benefits
The establishment of the LLP case/control cohort has provided an important resource that is internationally recognised and will continue to provide benefits in the future. The ongoing update of associated medical data will further enhance the utility of this research resource, for example:
1. The molecular biomarker group within the RCLCRP aim to utilise bronchial washings and/or sputum and/or blood to develop molecular assays for early diagnosis of lung cancer. The integration of HES, Cancer Registry and ONS data with the molecular data will allow us to improve the LLP Risk Model tool for application of personalised risk assessment alongside the development of future molecular assays (either targeting those at highest risk, or attenuating the results to account for known confounding factors).

2. Characterisation of risk factors for lung cancer is of considerable health and economic importance, as they can be used to inform prevention, screening and treatment policy. The group will continue to develop the Liverpool Lung Project (LLP) risk model for lung cancer and is identifying epigenetic and genetic biomarkers for early detection and prognosis of lung cancer.

3. The United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model will be used to help refine and improve risk assessment tools, providing more efficient targeting of screening populations and other interventions. The RCLCRP and associated HSCIC data provides an important corollary to the CT screening setting and an opportunity to publish comparative studies to inform the direction of lung cancer early detection.

4. Application of HSCIC data to clinical problems provides an opportunity for training and education of the next generation of scientists and medics. Anonymised HES data will be utilised for the academic training of future PhD students and other affiliated research scientists. This will promote innovation, exploit new technologies and produce world-class scientists that will contribute to the continued development of life science research, which provides an important economic driver and improves healthcare.

5. Specific benefits to the Health and Social Care system include: the use of molecular-epidemiological risk assessments prior to clinical diagnosis and markers of pre-clinical carcinogenesis in patients with a high risk of developing lung cancer will reduce the incidence of clinically detectable lung cancer, given the appropriate intervention strategies. Early detection research provides the most cost-effective strategy for improved mortality, as treatment at an earlier stage not only provides better patient outcome, but is cheaper in the long term.

Outputs:

Past Outputs

1. Annual Report(s) for funding body, e.g. The Roy Castle Lung Cancer Foundation (RCLCF) to identify type of research undertaken, recruitment statistics and specific research developments within the funding period. This report is seen by the RCLCF Executive and Scientific Committee and its Trustees to inform policy and quantify the benefit of future funding of the research programme.

2. The RCLCRP and its collaborators have produced many peer-reviewed publications in a selection of high-ranking journals. As an example, publications during 2013 & 2014 that made use of ONS, MCCR or HES data included:
• Contribution to a study that examined over 500 lung tumours for DNA methylation and demonstrated a prognostic DNA methylation signature for stage I Non-Small Cell Lung Cancer (NSCLC) (Sandoval et al., Journal of Clinical Oncology 2013 [47])
• Discovery and validation of a microRNA expression signature that identifies NSCLC (Bediaga et al., British Journal of Cancer 2013 [48]).
• Examination of the molecular genetic profile of carcinoid cancers, implicating chromatin-remodelling genes (Fernandez-Cuesta et al., Nature Communications 2013 [49]).
• Aiding the definition of a genomics-based classification of human lung tumours (Clinical Lung Cancer Genome Project & Network Genomic Medicine, Science Translational Medicine, 2013 [50]).
• LLP Biobank samples helped identify a new tumour suppressor gene for lung cancer (Gkirtzimanaki et al., Proceedings of the National Academy of Sciences USA 2013 [51]).
• The importance of risk prediction models to lung cancer screening has been highlighted (Field et al., 2013 in Lancet, Lancet Oncology and Journal of Surgical Oncology [48, 52, 53]).
• We have investigated factors associated with dropout in a 5-year follow-up of individuals at high risk of lung cancer in the LLP follow-up cohort (Marcus et al., International Journal of Oncology, 2013 [54]) and looked at the impact of co-morbidity on lung cancer mortality (Marcus et al., Oncology Letters, 2013 [55]).
• Genome Wide Association Studies (GWAS) and epidemiology continue to provide a useful insight into lung cancer susceptibility (TRICL, ILCCO & SYNERGY publications):
- Lung cancer risk among different professions (Behrens, Occupational and Environmental Medicine, 2013 [56]; Consonni et al., International Journal of Cancer 2014 [57]).
- New methods for smoking assessment in lung cancer risk (Vlaanderen et al., American Journal of Epidemiology 2014 [58]).
- A pooled analysis of case-control studies conducted between 1985 and 2010 (Olsson et al., Am J Epidemiol 2013 [59]).
- SYNERGY – Welding and Lung Cancer in a Pooled Analysis of Case-Control Studies (Kendzia et al., Am J Epidemiol 2013 [60])
- Analysis of the relationship between second hand tobacco smoke and lung cancer histology (Kim et al., International Journal of Cancer 2014 [61]).
- Associations of risk variants for other cancers with lung cancer risk (Park et al., Journal of the National Cancer Institute 2014 [62]).
• Two publications have utilised anonymous HES data:
- Marcus MW, Chen Y, Duffy SW, Field JK. Impact of comorbidity on lung cancer mortality - a report from the Liverpool Lung Project. Oncol Lett. 2015 Apr;9(4):1902-1906.
- Marcus MW, Chen Y, Raji OY, Duffy SW, Field JK. LLPi: Liverpool Lung Project Risk Prediction Model for Lung Cancer Incidence. Cancer Prev Res (Phila) 2015 Jun;8:570-5.

N.B. References of Publications listed above can be found in SD11 – Publication References.

Much of this work has also been presented at major cancer conferences (e.g. NCRI Annual UK Meeting, American Association of Cancer Research Annual Meeting and World Lung Cancer Conference).

3. Report on the LCAOS & CURELUNG Projects (EU FP7 Collaborations): LCAOS - development of a Breath Test for the early detection of Lung Cancer; CURELUNG – the (epi)genetics of lung cancer. The selection process used to identify the cohorts for these studies included knowledge of their health status. Access to HSCIC data for these individuals provided important information on their cancer and respiratory disease history, which was utilised at the sample analysis stage. For example, in LCAOS, HES information may show that a patient who had low lung capacity and breathing difficulties when the sample was taken had a hospital episode some 12 months later that diagnosed a lung disease which may already have been present at the time of sampling. For CURELUNG, respiratory disease status informed risk-stratification analysis; outcome data was used to investigate the possibility of treatment stratification based on DNA methylation.

4. The RCLCRP has established, through the Liverpool Lung Project, one of the largest prospective lung cancer case-control and cohort populations in Europe (>11,500 participants) with epidemiological, clinical & outcome data and specimens incorporated into the LLP Biobank. This is a resource that has been and will continue to be utilised for a wide variety of research projects, generating additional investment and providing opportunities for exploitation of results in the form of risk prediction models, biomarkers for cancer detection, characterisation of lung disease and identification of targets for treatment.

5. The RCLCRP was instrumental in initiation of the United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model. Professor Field is the clinical investigator of the UKLS and the trial was run from the University of Liverpool Cancer Trial Unit.

Future Outputs
1. Reports: Further reports for grant awarding bodies will be produced. This will include reports in support of additional funding applications for further analysis, ensuring maximum utility and benefit from the data provided.
2. Publications: It is anticipated that the analysis from this study will be included in internationally renowned oncology, epidemiology and public health journals (in keeping with our proven publication record, above). Publications will be prepared for 2015, 2016, and 2017.
3. Presentations: In accordance with previous years it is expected that presentations will be given at major cancer conferences. These presentations will provide dissemination of results from ongoing studies of LLP Risk Modelling, Methylation, MicroRNA, Sequencing, etc.

Nature of Outputs
The LLP project provides detailed clinical outcomes together with the patient’s epidemiological questionnaires, complemented by the excellent HES data; these support in-depth molecular-epidemiological LLP investigations into molecular biomarker groups and DNA sequencing projects.
The majority of outputs will contain aggregate data only; very occasionally individual level data will be presented (e.g. patient characteristics for tumour samples analysed), but these will be coded and completely anonymised to prevent identification. No HSCIC linked record level data will be shared directly with commercial companies or third party organisations or included directly in any outputs. In some instances the data will consist of anonymised, characteristic data linked to a sample shared for research purposes; e.g. it may state that “the sample was from a patient of 60 years old with a diagnosis of COPD present for 10 years who was diagnosed with lung cancer at 65 years and died of heart failure aged 70 and the patient had been hospitalised for COPD on 6 occasions”.
All outputs are research outputs, not commercial, although some research is undertaken within a commercial environment (e.g. pharmaceutical or life-science/biomarker companies).

Processing:

All data processing of the original HSCIC dataset will take place at the University of Liverpool (UoL) and be carried out by the RCLCRP IT staff at the APEX Building (3rd Floor).

SQL queries will be written to extract selected data from the HES database. IT staff will link the extracted data to subject data held by the Roy Castle Research Programme; any patient identifiable data fields supplied by HSCIC will not be made available to researchers.

SQL is used to anonymise the data by linking them to a unique patient identifier (MPI). Anonymised data are then imported into statistical software.

The clinical database used within the RCLCRP has data for 14,000+ subjects; all data is held securely (with additional password protection) and accessed only by trained personnel, in compliance with the University of Liverpool Data Policies.

These records have NHS number and a unique identifier. These identifiers will be used to identify subjects in the HES dataset, but only the local code will be used to identify subjects in any extracted data.
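As an illustration only, the replacement of NHS number by the local code in extracted data could be sketched as below. The table names, column names, identifiers and diagnosis codes are all hypothetical and are not taken from the RCLCRP database; the sketch uses an in-memory SQLite database driven from Python, which may differ from the programme's actual SQL environment.

```python
import sqlite3


def extract_with_local_code(subjects, hes_rows):
    """Join HES rows to the subject index and return rows keyed only by the local code.

    subjects: (nhs_number, local_code) pairs; hes_rows: (nhs_number, diagnosis) pairs.
    All names here are hypothetical placeholders.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subject_index (nhs_number TEXT PRIMARY KEY, mpi TEXT)")
    con.execute("CREATE TABLE hes (nhs_number TEXT, diagnosis TEXT)")
    con.executemany("INSERT INTO subject_index VALUES (?, ?)", subjects)
    con.executemany("INSERT INTO hes VALUES (?, ?)", hes_rows)
    # NHS number is used only inside the join; the SELECT list carries the local code.
    rows = con.execute(
        "SELECT s.mpi, h.diagnosis FROM hes AS h "
        "JOIN subject_index AS s ON s.nhs_number = h.nhs_number "
        "ORDER BY s.mpi, h.diagnosis"
    ).fetchall()
    con.close()
    return rows


# Made-up example data: the third HES row has no matching subject and is dropped.
subjects = [("9990000001", "MPI-0001"), ("9990000002", "MPI-0002")]
hes_rows = [("9990000001", "J44"), ("9990000002", "C34"), ("9990000003", "I21")]
extract = extract_with_local_code(subjects, hes_rows)
```

In this sketch the extract contains the local code and the diagnosis only, so a researcher receiving it would not see the NHS number.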

Additionally subsets of the data will be exported, anonymously, and used with statistical software at the University of Liverpool. Data used in the subsets relates to the health status (comorbidities), previous disease history or outcome (death, subsequent disease) of subjects who have provided informed consent and donated samples and/or lifestyle/clinical history to the LLP (RCLCRP). Data on patient identifiers or dates relating to any episode/event are not shared.

The most frequent user of the data is the statistician employed on the LLP (RCLCRP) studies at the University of Liverpool, although other university researchers also have access to the anonymous data associated with participants in their studies. However, these researchers only have access to anonymous data extracted previously by the LLP (RCLCRP) personnel as part of approved research studies associated with the LLP (RCLCRP). The purpose of all uses of the data is the same (the study of lung disease) as set out in the ethically approved study documentation.

The HES and ONS datasets will not be shared with a 3rd Party; extracted anonymous data will only be released to research collaborators following informed consent and ethics approval; release will be covered by a Material Transfer Agreement (MTA), in accordance with local and national guidelines.
The data is not accessed directly by the external researchers. Providing that subjects have consented to use by external collaborators, specific anonymous data (extracted by the LLP (RCLCRP) IT and statistical staff) may be released to external researchers (typically as part of a larger dataset) following approval of a Material Transfer Agreement by the study Sponsor (The University of Liverpool) and approval of the specific collaborative study by the local NRES ethics committee.

All researchers using anonymous data belong to recognised research institutions or registered commercial companies covered by a Material Transfer Agreement. The recognised research institutions and registered commercial companies (strictly those for which the University of Liverpool RCLCRP has MTAs in place) are listed within SD10 – LLP Collaborators.

The purpose for which data will be shared within the MTA agreements is individual to each MTA/organisation with which the MTA agreement is in place and is always for research purposes.

The individual level data which may occasionally be presented to one of these organisations may be, for example, a sample of blood or tissue accompanied by anonymised data stating that the sample was from a patient of a particular age, who had perhaps had a number of episodes of hospitalisation for e.g. COPD or another condition. The data may divulge the age in years, the number of hospitalisations for investigations for e.g. lung disease, or perhaps that the sample subject has a diagnosis of lung or another cancer and the number of years the cancer has been present. Death-related data would be limited to age at death or survival period from a specific treatment or diagnosis. In short, an anonymised timeline of medical history may be the kind of data shared in association with the human material, but this would be devoid of dates or potential patient identifiers. The high incidence and mortality of lung cancer help ensure that it is very unlikely that anyone would be able to identify an individual from the nature of the data presented, but care is always taken to ensure that this is the case, especially in publications (where data aggregation is the norm). Geographical data (e.g. postcode) are always aggregated and provider data is not a focus of the research.

Data released might include disease or comorbidity status derived from HES or outcome/death status derived from ONS along with data about the subject or samples collected by other legal means (with the consent of the subject) such as case note review. However, this is never provided with any personal identifiers or dates attached, so no link to the initial HES/ONS data or to any individual can be made by the researcher using the data supplied. Data format consists of an encrypted, password protected data file in a recognised database or statistical software file format.

Data provided to external collaborators is totally anonymous and Confidentiality is governed by a number of clauses in the MTA. Under no circumstance would any third party organisation or employee (within a UoL MTA agreement) be able to link any identifiable patient data to material or data shared by UoL RCLCRP.

Data is not always aggregated, but is sufficiently coded to prevent identification of individuals (data stripped of personal identifiers before use & in any representation).

This de-identification meets the requirements outlined within the HES Analysis Guide March 2015. Data is often, but not always, aggregated; however, even on the occasions when data is not aggregated it is still compliant with the March 2015 HES Analysis Guide, in particular Sections 4, 5 and 6.

In a similar way to the establishment of a PSEUDO_HESID (as stated within the HES Analysis Guide), the UoL RCLCRP MPI No. is used within the RCLCRP study when the HSCIC data is received by the HSCIC authorised IT employee and utilised by the statistician. Similarly, when samples or data are shared with any other organisation, this UoL RCLCRP MPI No. provides a link that can only be used by RCLCRP staff to integrate data. Therefore, no patient can be linked to any of the data received other than within the UoL RCLCRP by approved staff operating within the UoL data governance framework.

Only those UoL employees listed to HSCIC are able to access the data.

At no point is any of the HES, Cancer Registry or ONS data used by UoL RCLCRP employees to demonstrate linked patterns of Hospital Admissions to Cancer rates or death statistics.

Project 12 — DARS-NIC-19805-M6T5R

Opt outs honoured: N

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Purposes: ()

Sensitive: Sensitive, and Non Sensitive

When:2016.04 — 2016.08.

Access method: One-Off

Data-controller type:

Sublicensing allowed:

Datasets:

Hospital Episode Statistics Accident and Emergency
Hospital Episode Statistics Admitted Patient Care
Hospital Episode Statistics Critical Care
Hospital Episode Statistics Outpatients

Type of data:

Objectives:

This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research.

Data regarding patients’ primary and secondary care is routinely recorded in electronic medical records by a number of organisations including the HSCIC. Such data retrieved from electronic medical records has demonstrated utility in clinical research. Electronic medical records have an established role in providing the dataset for retrospective, observational clinical and record linkage studies. In addition, in prospective studies, electronic medical records can provide useful, additional data that can inform analyses such as the long term assessment of mortality.

Although there is a precedent for the use of data retrieved from electronic medical records in retrospective clinical studies and to a lesser extent in prospective studies, there is limited evidence of the attributes of such data when accessed to measure prospective outcomes as part of a pragmatic Randomised Controlled Trial (RCT). An assessment of data retrieved from electronic medical records in the context of prospective clinical research becomes particularly relevant where such data are now being used to conduct all stages of a RCT, including recruitment, intervention and follow up assessments, despite the feasibility, agreement, additional benefit and efficiency being unclear.

This study will assess the feasibility, agreement and additional benefits of data retrieved from electronic medical records in measuring the objectives of a RCT. Subsequently, the efficiency and relative value of accessing data from electronic medical records compared to collecting data using standard RCT methodology will be explored. The electronic medical records will be requested from ‘routine data sources’, primarily the HSCIC but also The Secure Anonymised Information Linkage Databank for participants resident in Wales and the General Practitioner for participants resident in the North West of England, accessed through NorthWest eHealth.

The study will directly inform the methodology of the NIHR Health Technology Assessment Programme funded RCT Standard and New Antiepileptic Drugs II (SANAD II) (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119). For example, accessing electronic medical records for participants of SANAD II may positively inform the health economic analyses and methods to address missing data. This will subsequently inform the methods to be performed in the final trial analyses on completion of SANAD II in 2018, including the access and implementation of data from electronic medical records. Improving the completeness of SANAD II data and precision of the analyses will positively influence health and social care by maximising the value of data collected and outcomes in this publicly funded RCT. Furthermore, the outcomes of this study will indirectly inform the methodology of similar pragmatic RCTs in the future.

The specific objectives in this study where access to electronic medical records held by the HSCIC will be requested are as follows:

1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II:
a. Assessment of the feasibility of accessing data from routine sources
b. Assessment of the agreement of data from routine sources

2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of a RCT, SANAD II:
a. Assessment of clinical efficacy
b. Assessment of adverse events
c. Assessment of health economic outcomes
d. Assessment of the methods of addressing missing RCT data

3. Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II:
a. Assessment of the efficiency of procedures to access / obtain data
b. Assessment of the efficiency of procedures to format data
c. Explore the relative value of accessing data from routine data sources

Expected Benefits:

There are both direct and indirect benefits to healthcare.

This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research.

The Standard and New Antiepileptic Drugs (SANAD) RCT is a multicentre, pragmatic RCT of worldwide significance, informing the first line use of antiepileptic drugs in clinical practice and prompting a review of national treatment guidelines. The subsequent study SANAD II (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119) is on-going; it opened recruitment in 2013, will recruit 1510 participants over a duration of 5.5 years and is expected to exert a significant influence on the evidence for the treatment of epilepsy, the most common neurological disease. The study to which this application refers will directly inform the methodology employed in the data collection and analyses of the SANAD II study. The assessment of the additional benefits of data from electronic medical records, particularly with regards to the analysis of health economic outcomes and methods to address missing data, will inform the subsequent methods employed in the analyses of SANAD II. For example, if implementing data from electronic medical records provides greater benefit when addressing missing data, this data will subsequently be requested for all participants of SANAD II. This directly benefits health and social care by informing the methodology to be employed in the data collection and analyses of a NIHR HTA funded RCT, therefore maximising the power of the data and results and the subsequent impact on patient care. This output will be measurable based on the methods subsequently employed in SANAD II.

Through output via presentation and publication to the research community involved in clinical trials and clinical trials methodology, there will be indirect benefits to health and social care. The output of this study aims to inform the implementation of data from electronic medical records in prospective clinical research including RCTs. Improved knowledge of the attributes, additional benefits and efficiency of data accessed from electronic medical records will inform the design of future RCTs. As a result, RCTs will use electronic medical records for the objectives where they offer a benefit over standard methods of data collection. This will result in improved efficiency (and therefore reduced costs) of RCTs, which are frequently funded through public sources, and in improved participant experience. For example, the number of clinical trial follow up appointments may be reduced if data can be adequately collected using electronic medical records. Finally, indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy, although this hypothesis is not being explicitly assessed.

Outputs:

Final results from this study and the associated outputs are expected by study completion (12/2017). All presented or published results will be on a strictly anonymous basis. Non-identifiable aggregate data will be used in presentations and publications with the suppression of small numbers in line with the HES analysis guide when the output involves specific clinical details. The output will consist of descriptive statistics and statistical measures of agreement between data retrieved from electronic medical records, including HES data, and data collected through standard methods during SANAD II.

The nature of the sample (60 included participants) results in a possibility that small numbers may be identified for data variables of interest. For example, if 5 participants experience hospital admissions or admission to critical care and data from electronic medical records provides significant benefits over standard RCT methods, this would be important to include in any output. As we are primarily concerned with the agreement and additional benefits of data from electronic medical records rather than specific clinical details, there will be no requirement to include explicit clinical details in any output. In order to present the differences between data from electronic medical records and data recorded through standard methods in SANAD II, there will be a need to highlight the availability of specific data variables. For example, outputs may present that ‘details of MRI scans were available in X number of patients’ rather than ‘X number of patients had an MRI scan demonstrating temporal sclerosis’. The exclusion of specific clinical details at record level, in addition to demographic variables and geographical location for the individuals involved, will ensure participant anonymity is maintained; it is the available data variables and agreement between datasets that will inform the outputs of this study.

In all output, aggregate data will be de-identified and all measures will be taken to ensure that individuals cannot be identified. For example, during the analysis the additional benefits of assessing IMD by LSOA to inform the health economic analysis will be examined. However, there will be no need to present the LSOA of individual participants in any presentation; only the aggregate results will be presented. Rare events are not expected, but if these occur and there remains any risk of identification, details will be omitted from all presentations and publications and small numbers suppressed in line with the HES analysis guide.
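A minimal sketch of small-number suppression on an aggregate table is given below for illustration. The threshold of 5, the "*" marker, the function name and the example counts are assumptions made for this sketch; the rule actually applied should be taken from the HES analysis guide itself.

```python
def suppress_small_counts(counts, threshold=5, marker="*"):
    """Replace counts from 1 to `threshold` (inclusive) with `marker`.

    Zero counts and counts above the threshold pass through unchanged.
    The default threshold and marker are assumptions for this sketch.
    """
    return {
        label: (marker if 0 < n <= threshold else n)
        for label, n in counts.items()
    }


# Made-up aggregate counts of data availability (not study results).
availability = {
    "MRI details available": 42,
    "Critical care admission recorded": 3,
    "No record": 0,
}
table = suppress_small_counts(availability)
```

Only the aggregate table would then be carried forward into a presentation or publication; record-level values never appear in it.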

This study will have both direct and indirect outputs. In the first instance, the study will inform the analyses to be undertaken in the SANAD II RCT. Specific components will include the analysis of health economic outcomes and optimal methods to address missing RCT data. For example, if incorporating data from electronic medical records in place of traditional methods such as multiple imputation provides a more rigorous dataset, electronic medical records data will subsequently be sought for all participants of SANAD II and included in the analyses on completion of the trial. This output will take the form of a study report and local presentation to the SANAD II study team. This will occur on completion of the study by December 2017. Notably, all members of the team for this study are also involved in SANAD II.

This study will also inform the clinical trials community, contributing to the development and improvement of efficient RCT design with the incorporation of data from electronic medical records. The output will be disseminated to clinicians and academics involved in the conduct of clinical trials and research concerning clinical trials methodology. Members of the public and non-academics may have access to the output through presentations and publications but there is no planned specific dissemination to these groups, with the exception of the participant study report that will be provided on completion of the study. This is justified as the output will be primarily informing the methodological aspects of clinical trials. Indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy. Although not directly assessing routine clinical practice or the patients’ perspective in this study, a parallel theme funded by the MRC HTMR involves assessing patients’ perspectives with regards to clinical trials methodology, including the development of ‘core outcome sets’ for clinical trials.

There are multiple objectives to this project and the dissemination of findings aims to take the following forms:

- Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II:

o A narrative assessment of the methods and feasibility of access will be presented at academic conferences including the International Clinical Trials Methodology Conference 2017 and Association of British Neurologists Annual Meeting 2017.
o An assessment of the feasibility, agreement and reliability of data from routine sources will be presented at academic conferences and published in a peer-reviewed clinical journal. The manuscript will initially be submitted to the British Medical Journal during 2016.

- Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, applied to the aims of a RCT, SANAD II:

o The additional benefit of data from routine sources applied to the assessment of clinical efficacy, adverse events, health economic outcomes and addressing missing RCT data will be presented at academic conferences as above and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals.

- Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II:

o A narrative assessment of the efficiency of accessing and formatting data from routine sources and a discussion of the relative value of accessing data from routine sources will be presented at academic conferences and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals.

Finally with respect to all study objectives, formal study reports will be submitted to the MRC Hubs for Trials Methodology, the funder of this study, to inform the wider programme of research aiming to develop the use of health informatics, including electronic medical records in prospective medical research.

Processing:

The legal gateway for the flow of data into the HSCIC is informed patient consent.

This study is sponsored by the University of Liverpool and has been approved by the North of Scotland Research Ethics Service and Health Research Authority.

The specific methodological activities involved in the processing of data are as follows:

The SANAD II Data Manager will identify eligible participants by review of data recorded for participants enrolled in SANAD II. Eligible participants will be those aged 16 years and over, with capacity to consent and having completed a minimum of 12 months follow up in SANAD II. Participants’ date of birth, date of enrolment and consent details (to identify those with capacity to consent) will be screened. The names and addresses of eligible individuals will subsequently be retrieved. An invitation pack will be sent via the postal services containing a participant information leaflet, consent form and pre-paid addressed envelope. Informed written consent will be requested for access to identifiable data from electronic medical records for the equivalent time period in SANAD II. Organisations including the HSCIC are specifically named in the consent form. Full, explicit details of the data flows and processing activities are detailed in the consent materials and form. HSCIC feedback has been sought in an earlier application. There are approximately 70 consented participants in this study.

Data from consenting participants will be requested from electronic medical records held by specific ‘routine data sources’. The HSCIC HES data will be requested for participants resident in England. Data will also be requested from The Secure Anonymised Information Linkage Databank (SAIL) for participants resident in Wales and from the General Practitioners for participants resident in the North West of England. The General Practitioners will be approached by the study team and, if permitted, primary care data will be transcribed in the practice by the Principal Investigator.

Data from consenting participants’ electronic medical records will be requested from HSCIC on an identifiable, record level basis, with individuals identified by NHS Number. The rationale for this is to allow linking of data regarding an individual from electronic medical records from all routine data sources to the data collected using standard methodology as part of SANAD II, in order to compare the datasets and permit the analyses. Data will be collected for the equivalent time period the individual has been enrolled in SANAD II and will be requested on one occasion only. Data will include medical, demographic and socio-economic variables.

The NHS Number (and name and date of birth if required and indicated by HSCIC) to identify the consenting participant will be securely transferred from the Clinical Trials Research Centre, University of Liverpool to the HSCIC. Subsequently, data from participants’ electronic medical records provided by the HSCIC will be securely transferred to the University of Liverpool. In both cases, data will be transferred using the HSCIC Secure File Transfer (SFT) System. The consent materials and form explicitly permit these data flows in this study.

Participants’ data from electronic medical records (accessed through HSCIC, SAIL and participants’ GPs) will be securely transferred to the University of Liverpool Clinical Trials Research Centre and linked to the data collected as part of SANAD II in order to permit the intended analyses. The SANAD II Data Manager will perform this linking and will therefore receive and access the data from electronic medical records in the first instance. Following linking, the SANAD II Data Manager will pseudonymise the complete dataset with participants identified only by their Unique Study Number. At this stage the dataset will then be accessible to the study team members involved in the analysis. Therefore, all data from electronic medical records and SANAD II data collected using standard methods will be pseudonymised to all members of the team for this study. The SANAD II Data Manager, who must perform the linkage, will have access to the demographic variables and medical data of consenting participants but will not access, and will have no requirement to access, participants’ medical data for the purpose of linking. Data regarding individuals received from all sources will be linked. Therefore, secondary care data received from the HSCIC HES datasets will be linked to data collected using standard methods in SANAD II. In addition, for a small subset of participants resident in the North West of England, data retrieved from General Practitioners will be linked to both HES data and data collected during SANAD II. This process is necessary to perform a full assessment of the agreement and additional benefits of routinely recorded data (from all data sources) compared to data collected using standard methods in SANAD II.
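The link-then-pseudonymise step described above could be sketched as follows. This is an illustrative outline under stated assumptions, not the Data Manager's actual procedure: the field names (`nhs_number`, `study_number`, `admissions`), the function name and the example records are all hypothetical.

```python
def link_and_pseudonymise(trial_records, routine_records):
    """Link routine records to trial records on NHS number, then drop the NHS number.

    Each linked row keeps only the study number as its identifier. Field names
    are hypothetical; routine records with no matching participant are skipped.
    """
    by_nhs = {rec["nhs_number"]: rec for rec in trial_records}
    linked = []
    for routine in routine_records:
        trial = by_nhs.get(routine["nhs_number"])
        if trial is None:
            continue  # no consenting trial participant with this NHS number
        row = {"study_number": trial["study_number"]}
        # Prefix each variable with its source so the two datasets can be compared.
        for key, value in trial.items():
            if key not in ("nhs_number", "study_number"):
                row[f"trial_{key}"] = value
        for key, value in routine.items():
            if key != "nhs_number":
                row[f"routine_{key}"] = value
        linked.append(row)
    return linked


# Made-up records for illustration only.
trial = [{"nhs_number": "9990000001", "study_number": "S001", "admissions": 2}]
routine = [
    {"nhs_number": "9990000001", "admissions": 3},
    {"nhs_number": "9990000009", "admissions": 1},
]
linked = link_and_pseudonymise(trial, routine)
```

Keeping the trial and routine values side by side under one study number is what allows agreement to be assessed at the individual level without exposing the NHS number to the analysis team.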

All pseudonymised study data from electronic medical records and SANAD II will be stored using the University of Liverpool Research Data Management Service’s DataStore (http://www.liv.ac.uk/csd/research-data-management/storage) at all times. Data are stored electronically on University of Liverpool central servers, located in an access controlled server room and connected to the main University network, located behind a firewall. Physical access is limited to Computer Services Department staff. Data will be encrypted using industry standard techniques meeting the Information Governance Toolkit standard (8HN20). Data will not be transferred to an additional location. The PI for this study will act as data custodian. The University of Liverpool Information Security Policy and Research Data Management Policy provide further information regarding data security.

The pseudonymised dataset will be accessed by specific members of the study team based in and employed by the University of Liverpool. Data will then be analysed to assess the following objectives:

1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II:

A narrative assessment of feasibility will be followed with a quantitative assessment of agreement between data from electronic medical records and data collected using standard methods in SANAD II. Agreement will be compared at the individual level. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data. Subsequently, relevant outcomes of the RCT will be examined.

2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of an RCT, SANAD II:

An exploratory analysis will assess the additional benefits of accessing data from electronic medical records. The assessment of clinical efficacy, adverse events, health economic outcomes and methods to address missing RCT data will be examined. Where linked data are available, agreement will be compared at the individual level in the first instance. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data.

3. Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of an RCT, SANAD II:

The relative value of accessing data from different sources will be discussed in the context of the prior analyses, including knowledge of the relationship between datasets and the assessment of methods of addressing missing data. The optimal ‘mix’ of data from routine sources and standard methods will be discussed. The potential impact on the data collection processes if SANAD II were to be repeated will be considered and the quantitative methods that could be used in future research proposed.
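For readers unfamiliar with the agreement methods named under objectives 1 and 2, the following is a minimal sketch of how they could be computed for paired, individual-level data. It assumes the conventional Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations) and takes "kappa statistics" to mean unweighted Cohen's kappa; the source does not specify either choice, and the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def bland_altman(emr_values, trial_values):
    """Return (bias, lower limit, upper limit) for paired continuous data."""
    diffs = [e - t for e, t in zip(emr_values, trial_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(emr_labels, trial_labels):
    """Unweighted Cohen's kappa for paired categorical data."""
    n = len(emr_labels)
    observed = sum(e == t for e, t in zip(emr_labels, trial_labels)) / n
    emr_counts = Counter(emr_labels)
    trial_counts = Counter(trial_labels)
    # Chance agreement from the two sources' marginal category frequencies.
    expected = sum(emr_counts[c] * trial_counts[c] for c in emr_counts) / n**2
    # Undefined (division by zero) if both sources use one identical category.
    return (observed - expected) / (1 - expected)


# Hypothetical paired values, not study data.
bias, lower, upper = bland_altman([1, 2, 3], [1, 1, 1])
kappa = cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])
```

The cross-tabulation mentioned in the text corresponds to the counts of each (EMR category, trial category) pair; kappa summarises that table as agreement beyond what the marginal frequencies alone would produce.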

All personal data in this study will be kept strictly confidential and will be handled, stored and destroyed in accordance with the Data Protection Act 1998.