NHS Digital Data Release Register - reformatted

University of Liverpool

Project 1 — DARS-NIC-14337-J4N1T

Opt outs honoured: N

Sensitive: Non Sensitive

When: 2016/04 (or before) — 2018/02.

Repeats: Ongoing

Legal basis: Health and Social Care Act 2012

Categories: Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Admitted Patient Care

Objectives:

The British Orthopaedic Surgery Surveillance (BOSS) study is a mechanism for researching the treatment of rare orthopaedic diseases within the UK. The methodology detailed in the study protocol is in keeping with similar successful studies of rare diseases: UKOSS (UK Obstetric Surveillance System) in obstetrics and gynaecology, BAPS-CASS (British Association of Paediatric Surgeons – Congenital Anomaly Surveillance Study) and BPSU (British Paediatric Surveillance Unit). In these studies, routine data (i.e. disease and anomaly registers) is often used to verify the completeness of case ascertainment – though this is the first time that HES has been used to attempt to augment case identification. The diseases of interest within the BOSS Study are Slipped Capital Femoral Epiphysis (SCFE), and Perthes’ disease. Both are rare hip diseases of adolescence. SCFE is always admitted to hospital for surgery, therefore the data captured within HES is likely to be good. Perthes’ disease is only occasionally admitted; therefore HES data is unlikely to be useful to identify cases of new disease. The data processing and analysis hereafter relates solely to SCFE. All orthopaedic units treating children within the UK are being asked to submit data to the service evaluation/audit (supported by Orthopaedic Specialist Societies, NICE and the National Clinical Director for Children). As of February 2016, over 150 UK hospitals have agreed to supply data to the study. Details of any new case of SCFE will be recorded by clinicians prospectively using the secure online REDCap clinical trials platform. All English hospitals have a nominated representative and the University of Liverpool has made separate applications for the Scottish and Welsh data to the respective organisations. This data will be managed by the Liverpool clinical trials unit, who are overseeing the delivery of the BOSS Study.
HRA have given nationwide permission for sites to collect this data, without any additional local approvals and without patient consent (as data is anonymised and data forms part of routine care). The care offered to children affected by the diseases of interest varies considerably around the UK, and beyond. These variations exist owing to surgeons’ beliefs as to which treatment is best and the experience and skill that the local surgical team can provide. Only by adequately documenting variations in disease and surgical practice and recording the outcomes in current care can one begin to influence change to improve care across the UK. HES data will be used to ensure maximum case ascertainment is achieved within the study, to ensure the generalisability of the results. When a case of disease is identified within HES (by ICD code) during the study period, the BOSS team in Liverpool will be notified. HSCIC will share with the BOSS team the age and gender of patient, date of admission, date of surgery and hospital. No unique identifiers will be captured; therefore the patient will not be identifiable except to the treating clinicians. The data supplied by HES will then be used to check completeness of the REDCap database (i.e. that supplied prospectively by clinicians) against the HES record. In the event that a case is identified within HES, but has not been reported through REDCap, the nominated surgeon-lead at the relevant hospital will be contacted to ask them to determine the validity of the diagnosis, and if appropriate, formally report the details of the case through REDCap. If the BOSS surveillance mechanism is successful, it will be expanded to other diseases. The intention is that the data generated may serve as stand-alone service improvement, and may generate feasibility research for future clinical trials of treatment interventions.

Expected Benefits:

The information that will be collected from this, the British Orthopaedic Surgery Surveillance (BOSS) study, has been planned in conjunction with patients, their parents and treating clinicians. The formation of a prospective SCFE database was a recommendation by NICE in their recent review of SCFE – the BOSS study is therefore meeting this demand. The information gained from the BOSS study will be the largest study undertaken into the rare disease of SCFE, and will yield information concerning the effects of different interventions on patient outcomes. This will have direct implications for the way that surgical care is delivered, with surgeons being able to benchmark their practice against others. The findings of the BOSS study will inform the feasibility of a clinical trial into interventions for SCFE (i.e. are there enough cases, enough surgeon engagement and enough variation/uncertainty in practice to warrant a trial?). A clinical trial would be the gold-standard means to ensure evidence based care is being delivered to patients. However, if a clinical trial is not feasible, the BOSS study will significantly enrich the current evidence to enhance patient care. The BOSS study group has biannual presentations; at the British Society of Children’s Orthopaedic Surgery (BSCOS) and at a national BOSS Collaborator meeting. BSCOS and the British Orthopaedic Association (BOA) have advocated that their members engage with the BOSS study. The results should therefore have direct, relevant and measurable impact on the clinicians involved. The BOSS study group have worked with members of the SCFE NICE review group to ensure that the findings of the study will address questions raised within the recent NICE review. The BOSS study will therefore have a direct effect on surgical practice and a positive impact on the care of patients. 
The BOSS study will begin recruitment during 2016 (with HSCIC to aid case identification), will collect outcomes at 2 years (from 2018) and will therefore report in early 2020.

Outputs:

(1) To determine the case mix of SCFE across the UK, the variation in surgical practice and clinical and radiographic outcomes up to 2 years. There will be a published report which will be sent to Trusts and Clinicians on an annual basis during the study and the final report will be published no later than one year after completion of the study. (2) The results will inform the feasibility of a clinical trial into the surgical treatments for SCFE, and will inform NICE related to the guideline surgery for SCFE. This final report will be published in peer-reviewed journals (e.g. the British Medical Journal) in line with NIHR expectations. (3) Publication of the protocol will be undertaken during 2016, and publication of the case mix and variations in surgical practice will be published in late 2017. Publication of 2-year follow-up will not be possible until 2019. NIHR funding for this study is for 5 years, so the project will be completed by 2020. If a clinical trial were to ensue, this would require sufficient case numbers, surgeon engagement, patient engagement and a well-balanced trial question. The BOSS Study is a good cost-effective mechanism by which to ‘test’ surgeon engagement, case numbers and begin to understand the variation in practice. This project will therefore inform the feasibility of all aspects of a trial.

Processing:

Data will be used as a reference from which to determine the completeness of cases reported by clinicians and prompt additional reporting. Data will be processed by a team within the clinical trials unit at the University of Liverpool. The BOSS study team will cross-check the HES identified case against the list of cases already reported to them by clinicians through the REDCap clinical trials platform. Cases will be identified only by age/sex of patient, date of admission, date of surgery and hospital. The team will make the assumption that no hospital will have more than one admission per day sharing the same details (this is a rare disease and even in larger children’s hospitals more than one case per day is unusual). If a case is not reported to them, the team will contact the clinician participating in the British Orthopaedic Surgery Surveillance (BOSS) study (all hospitals have identified such an individual with the support of the British Orthopaedic Association and the British Children’s Orthopaedic Association). The team will then ask the clinician to verify/refute the diagnosis. If the diagnosis is verified they will ask the clinician to submit anonymous details of the case through the REDCap clinical trials platform (this is in keeping with the successful UKOSS/ British Association of Pediatric Surgeons-Congenital Anomaly Surveillance System (BAPS-CASS) reporting systems). The BOSS Study has been granted nationwide NHS research approval (HRA-cohort 3), and national ethics approval.
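
The cross-check described above can be illustrated with a minimal Python sketch. It is not the BOSS team's actual implementation: the `CaseKey` type, the `unreported_cases` function and the field names are hypothetical, and the sketch relies on the stated assumption that no hospital has more than one admission per day sharing the same details.

```python
from datetime import date
from typing import NamedTuple


class CaseKey(NamedTuple):
    """Non-identifying fields used to match a case (hypothetical names)."""
    age: int
    sex: str
    admission_date: date
    surgery_date: date
    hospital: str


def unreported_cases(hes_cases, redcap_cases):
    """Return HES cases with no matching REDCap report, in HES order.

    Matching is exact on all five fields, which assumes at most one
    such admission per hospital per day.
    """
    reported = set(redcap_cases)
    return [case for case in hes_cases if case not in reported]
```

Each case returned by `unreported_cases` would correspond to a query to the nominated clinician at that hospital, who verifies or refutes the diagnosis before any REDCap entry is made.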


Project 2 — DARS-NIC-147982-J7KGV

Opt outs honoured: N, Yes - patient objections upheld (Mixture of confidential data flow(s) with consent and flow(s) with support under section 251 NHS Act 2006)

Sensitive: Non Sensitive, and Sensitive

When: 2016/04 (or before) — 2019/10.

Repeats: Ongoing, One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Other - For one subset of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(7) and National Health Service Act 2006 - s251 - 'Control of patient information'. For the remainder of the cohort the data is disseminated under Health and Social Care Act 2012 - s261(2)(c).

Categories: Identifiable

Datasets:

  • MRIS - Flagging Current Status Report
  • MRIS - Cause of Death Report
  • MRIS - Cohort Event Notification Report
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Outpatients

Yielded Benefits:

The Liverpool Lung Project has utilised data from NHS Digital to develop and validate the LLP risk model. This has been used in the UKLS lung cancer screening trial to identify a cohort of high risk individuals, half of whom underwent low dose CT screening, identifying a number of lung cancers (notably, the majority of these were early stage and underwent potentially curative surgery). Similarly the risk model is being used in the Liverpool Healthy Lung Project to identify potential cancer patients in a community setting and has been adopted by similar trials and implementation projects. Other yielded benefits include a wide range of publications that have contributed to improved understanding of lung cancer and identified potential biomarkers. These outputs and benefits clearly address the primary aim and support the legitimate interest in utility of NHS Digital data.

Objectives:

The data supplied by the NHSIC to the Cancer Research Centre will be used only for the approved Medical Research Project MR1025.

Expected Benefits:

Lung cancer is the leading cause of cancer-related death in most developed countries, with mortality rates exceeding those of colon, breast and prostate cancer combined. The benefits of this research are linked clearly to its aims, namely to find ways to screen for and diagnose cancer earlier to improve the outcomes for patients. Lung cancer is predominantly a disease of the elderly, with an average age at diagnosis of around 60-70 years, and often presents very late, at an advanced stage. Given that more than 94% of the patients diagnosed with lung cancer in the UK die of the disease within five years, the primary objective is to detect lung cancer at an earlier, potentially more curable stage (5-year survival rate of stage IA tumour is ~70%). The primary aim of this research is to improve the early detection of lung cancer. Changes likely to have happened as a result of the LLP outputs include: introduction of risk models to identify those most at risk of lung cancer for early detection initiatives (such as screening); identification of biomarkers that aid detection or clinical management of lung cancer; identification of novel targets for cancer treatment. The magnitude of the impact for improved early detection is considered to be great, as lung cancer is the biggest cause of cancer mortality and this is largely because of detection at late stage when treatments are less effective. Screening by low dose CT has already been proved to be of benefit and has been adopted in the USA; a UK randomised controlled trial has utilised the LLP risk score, which has come directly from their use of NHS health data. One of the benefits of using a risk-based approach is in cost-saving by improved targeting of screening to those most likely to benefit. Further analysis of trials is ongoing to provide evidence for mortality improvement, but regional initiatives are already adopting the LLP risk score and delivering early detection opportunities to thousands of individuals.
Biomarkers that allow early detection (e.g. from blood samples) or improve screening results (by stratifying risk for those with small potentially pre-cancerous lesions) are also likely to provide a large benefit in terms of numbers of individuals benefiting and efficiency savings (e.g. reducing the need for follow-up scans). These benefits are likely to take longer to achieve as this work provides only discovery and preliminary validation, clinical trials will need to be performed before implementation. Similarly insights into lung cancer biology (e.g. identifying molecular signatures of cancers that have poor outcomes) will have a long lead time to improved treatment. However the potential benefits are great, given the incidence of lung cancer and the relatively low effectiveness of current treatments. The primary beneficiaries of the research will be the general population, through improved health and lower mortality. The NHS will benefit from lower costs or improved efficiency. Research funders are primarily non-profit, but may benefit from royalties paid if any intellectual property is developed and exploited. Those funders that are commercial enterprises aim to benefit by generation or exploitation of intellectual property (e.g. biomarkers and drugs); although it should be noted that the University of Liverpool and other funders have protected intellectual property in order to share these benefits. Given that the LLP plan to publish their research findings it is possible that third parties will benefit, although to do so they will have to extend the work. This research will be of direct benefit for patients in developing those early detection technologies and techniques. Although the pathogenesis of lung cancer is not yet fully understood, researchers have suggested the potential role of the occurrence of concomitant diseases (conditions that occur at the same time) in the aetiology (cause/s) of lung cancer. 
Due to increasing longevity and rapidly ageing populations, the number of people with more than one comorbid condition is expected to increase sharply in the coming decades. This increase might lead to an increase in the incidence of lung cancer and the comorbidity burden might lead to increased overall and/or lung cancer-specific mortality. This research will allow for these co-morbidity links and contributory factors to be investigated, analysed and reported on and ultimately benefit the patient. The establishment of the Liverpool Lung Project (LLP) cohort has provided an important resource that is internationally recognised and will continue to provide benefits in the future. The ongoing update of associated NHS Digital data will further enhance the utility of this research resource, for example: 1. The University of Liverpool Roy Castle Lung Cancer Research Programme aims to utilise bronchial washings and/or sputum and/or blood to develop molecular assays for early diagnosis of lung cancer. The integration of HES, Cancer Registry and mortality data with the molecular data will allow the researchers to improve the LLP Risk Model tool for application of personalised risk assessment alongside the development of future molecular assays (either targeting those at highest risk, or attenuating the results to account for known confounding factors). The earlier lung cancer can be diagnosed the better the outcome for the patient. 2. Characterisation of risk factors for lung cancer is of considerable health and economic importance, as they can be used to inform prevention, screening and treatment policy. The researchers will continue to develop the LLP risk model for lung cancer and are identifying epigenetic and genetic biomarkers for early detection and prognosis of lung cancer in order to improve outcomes for patients.

Outputs:

All outputs are aggregate with small numbers suppressed in line with the HES analysis guide. Outputs will include: 1. Reports: Reports for grant awarding bodies will be produced. This will include reports in support of additional funding applications for further analysis, ensuring maximum utility and benefit from the data provided. Annual Report(s) for funding body, e.g. The Roy Castle Lung Cancer Foundation (RCLCF) to identify type of research undertaken, recruitment statistics and specific research developments within the funding period. This report is seen by the Roy Castle Lung Cancer Foundation Executive and Scientific Committee and its Trustees to inform policy and quantify the benefit of future funding of the research programme. 2. Publications: It is anticipated that the analysis from this study will be included in internationally renowned oncology, epidemiology and public health journals. Publications will be prepared for 2017, 2018, 2019 and 2020. Journals for consideration will include: Thorax; Journal of Thoracic Oncology; Lung Cancer; British Journal of Cancer; Cancer Epidemiology, Biomarkers & Prevention; Scientific Reports; Oncology Letters; Nature Genetics; Nature Communications; Lancet; Lancet Oncology. More information on past publications can be found on the study website - http://www.liverpoollungproject.co.uk/publications 3. Presentations: In accordance with previous years it is expected that presentations will be given at major cancer conferences. These presentations will provide dissemination of results from ongoing studies of LLP Risk Modelling, Methylation, MicroRNA, Sequencing, etc. Conferences are expected to include: World Conference on Lung Cancer; American Association for Cancer Research Annual Meeting; National Cancer Research Institute Annual Meeting.
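
Small-number suppression of aggregate outputs, as referred to at the start of this section, might look like the following minimal Python sketch. The threshold of 5, the `*` marker and the function name are assumptions made for illustration; the actual rules are those of the HES analysis guide, which this register entry does not reproduce.

```python
def suppress_small_numbers(counts, threshold=5, marker="*"):
    """Replace non-zero counts at or below `threshold` with `marker`.

    `threshold` and `marker` are illustrative assumptions, not values
    taken from the HES analysis guide.
    """
    return {
        group: (marker if 0 < n <= threshold else n)
        for group, n in counts.items()
    }
```

A table of counts by disease stage, for example, would have any cell between 1 and the threshold masked before publication, while zeros and larger counts pass through unchanged.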

Processing:

Since 2007, the LLP has periodically provided identifying details of consented participants to NHS Digital or predecessor service providers (ONS, the NHS Information Centre and the Health & Social Care Information Centre). LLP has been provided with the linked data about participants’ deaths (date and cause), cancer registrations, exits from or re-entries to the NHS and, subsequently, Hospital Episode Statistics. The flow of data into NHS Digital is limited to personal identifying information required for linkage (NHS number) and for verification of that linkage: unique cohort MPI No. (Master Patient Index), Name, Date Of Birth, Gender, NHS Number and Postcode. These data subjects all gave informed consent to take part in the study, including for access to their health records. Where consent taken in the past was deemed not to meet current standards, the LLP have obtained support under section 251 of the NHS Act 2006 from CAG to permit the processing of confidential data without fully informed consent. The flow of data out of NHS Digital consists of a download of data files containing the data initially supplied, plus matching fields from NHS Digital and the requested health care and outcome data, the majority of which can be considered as special category data: specifically race/ethnic origin and health data. As previously, identifying data other than the minimum required for basic data linkage are provided to ensure quality assurance of matching (i.e. latest name/gender/DoB/postcode). The data files are transferred to an encrypted drive on a dedicated University of Liverpool virtual server. This is only accessible to a limited number of qualified staff and is password protected within a managed environment. Data supplied by NHS Digital is processed for inclusion into the LLP clinical database, to enable its use for the approved legitimate purpose.
The data is linked at patient level with data in the LLP clinical database using the subject specific MPI number only. The data within this database are pseudonymised as much as practically possible but include health data and event dates. Event dates are required for calculation of time periods in relation to other events both within the data provided by NHS Digital and to events collected by other means, e.g. directly from the subject or by review of hospital records. Data processing is performed within a University of Liverpool managed environment (password protected) and limited to specific folders accessible only to qualified staff associated with the LLP study. In addition the clinical database on which the data resides is separately managed with additional controls on staff access. Data flow outside of the clinical database (but remaining within the managed network) may include both personal identifying information and special category data (ethnicity and health data). This will only happen as required for data cleaning or collation of data from other sources (e.g. confidential provision of patient lists for case-note review by study-associated clinical staff with the approval of local Caldicott Guardians – in this case NHS Digital data is used to identify subjects, but the data itself is not shared). The data will only be accessed by researchers employed by the University of Liverpool and not shared with any third parties. Subsets of data from the clinical database including information derived from the data under this Agreement may be extracted and shared with collaborating organisations. Any data that is shared with these collaborative organisations will be pseudonymised with a unique study ID that allows the LLP to link results of analysis back to individuals. Dates are removed (replaced by ages or time periods, e.g. time from diagnosis to death) or limited (e.g. year or month + year); health events are curated (e.g.
classed as lifetime events rather than time dependent events) or recoded to remove granularity (e.g. grouping into less specific terms such as “lung disease”). The derived data will be shared in combination with data from other sources (e.g. pathology records, electronic patient records, questionnaires) but this data will not contain identifying information or any information which would result in any shared data being identifiable as originating or deriving from the data from NHS Digital or possible to reverse-engineer such that it can be so identified. The combined data will conform to the specification shared with NHS Digital (filename ‘UoL Lung projects Data sharing MD209180218’) which NHS Digital has approved. Given the high incidence of lung cancer and associated co-morbidities, it is considered incredibly unlikely that any re-identification could occur, even with access to other data. Where possible data is provided in aggregate form, although the nature of the research performed often requires individual level data (to link biomarkers or attributes to specific health outcomes). In most cases the data is combined with new data produced from biological or biochemical assays (in associated samples provided from the LLP biobank) or from algorithms based on risk data provided by subjects via questionnaire. Having combined the data an assessment is made as to whether the new data allows the LLP to predict specific health outcomes (e.g. diagnosis, specific disease sub-type related, disease severity or outcome) – providing a risk score or diagnostic algorithm that may help the LLP guide future treatment. Additionally correlations are made between data that help the LLP understand the biology of lung cancer, which provides new opportunities for alternative treatments. Any data shared must be subject to the conditions that the collaborating organisation: i. 
must not combine it with other datasets which could potentially increase the risk of reidentification for individuals in the dataset; ii. must not attempt to re-identify individuals in the dataset; iii. must not onwardly share the dataset; iv. must use the dataset for a defined purpose in support of the LLP’s aims defined within this Agreement, and v. must not publish the data. Under the terms of this Agreement, the University of Liverpool is responsible for ensuring compliance with the above conditions and for confirming destruction of the data by any collaborating organisation once the data is no longer required for the purpose for which it was shared. Organisations receiving data will all be involved in health research that furthers the aims for use of the data. This includes non-profit and educational establishments and commercial research organisations (e.g. biotech and pharmaceutical companies). These organisations may be in the UK or overseas. In the case of international sites and commercial organisations, data will only be shared for subjects who have explicitly consented to this as part of study recruitment informed consent procedures. Data Transfer Agreements (or Material Transfer Agreements if samples are also included) cover all such transfers of data and confer the same standards of care as specified for receipt of data from NHS Digital, meeting all obligations of the relevant data protection legislation. The ultimate flow of data is into publications made publicly available for the benefit of the wider research community. Special care is taken to ensure confidentiality is maintained and re-identification is not possible; data is in aggregate form in publications. The risk of reidentification via data linkage is relevant for any data subsequently shared but is mitigated in a number of ways. Aggregation is widely used for sharing results, in which case individual level data is not available for linkage.
Where aggregation is not possible, studies are of a substantial size for a common disease, so individuals cannot easily be identified unambiguously even based on multiple parameters, e.g. disease status, gender and age. Dates are suppressed (either truncated or converted to time periods/age). Minimal relevant summary data, e.g. disease type, appropriate to the research question addressed is provided, rather than full case histories. Geographical data (e.g. postcode, collected from the subject, not provided by NHS Digital) is only used to contact individuals or to gather other data (e.g. deprivation index, radon exposure) – in which case data for postcode alone are shared with no other identifying information; the subsequent data is linked to individuals within the secure LLP clinical database and only results (not location) shared subsequently. The data from NHS Digital is linked with other information collected from or about the participants including questionnaire responses and information derived from samples of blood, sputum or tissue. The data are then pseudonymised for analysis. National patient opt-outs will be applied to all data released by NHS Digital under this Agreement. The opt-out policy does not require opt-outs to be applied for individuals who gave sufficiently informed consent for their data to be processed but the University of Liverpool has chosen for opt-outs to be applied for the whole cohort having given due consideration to the following factors: i. The cohort comprises a mixture of participants recruited prior to 2003 whose consent in relation to the specific data processing described in this Agreement was not sufficiently informed and for whom support under the NHS Act 2006 section 251 allows the processing of their data without consent, and participants who gave sufficiently informed consent since 2003. The majority of participants were recruited prior to 2003. ii.
The participants are based in geographical areas with comparatively lower uptakes of patient opt-outs. iii. The LLP actively seek to re-consent existing participants using the latest versions of consent materials whenever there is a suitable opportunity and this would create practical challenges in managing which participants are deemed to have given sufficiently informed consent (and therefore are exempt from opt outs for the purpose of the LLP) and which have not. The overhead of managing which participants belong to which group was considered to outweigh the risk of losing consented participants who have registered an opt-out. iv. Consent is permissive but does not oblige the LLP to use participants' data in the programme. The data is received and stored at the University of Liverpool. All data processing of the data supplied by NHS Digital takes place at The University of Liverpool and is carried out by the University of Liverpool Roy Castle Lung Cancer Research Programme (Liverpool Lung Project) staff, holding substantive contracts of employment at the University of Liverpool and having received appropriate Data Protection and Good Clinical Practice (GCP) training. Staff are based within The William Henry Duncan Building with data held on secure servers within the University of Liverpool (in compliance with the IG toolkit). NHS Digital data, prior to processing and transfer to the LLP clinical database, is only accessed on University of Liverpool premises across a secure data network (password protected) from a BitLocker-encrypted virtual server (by approved LLP study staff); the secure server is firewall protected and accessible only from 3 designated PCs (with unique IP addresses). After processing, data is stored on a System Builder database located on a secure data network at the University of Liverpool; this network is only accessible to staff and is password protected. Data is located in folders that are limited to named staff.
The System Builder database has additional access control including different usernames and passwords. All hardware is in secure environments: servers within the Computer Services Department and PCs within a research building with swipe-access control. Procedural control ensures that PCs are never left accessible. All data is stored on servers (rather than individual PCs) and is backed-up routinely with the same level of protection (i.e. the encrypted server has an encrypted back-up). Only data that is anonymised or pseudonymised is shared by e-mail or by portable media, in which case all files are password protected and/or encrypted during transit. The data are used for the overarching long-term objective of building a greater understanding of how to identify individuals at high risk of lung cancer. The data are added to the LLP’s risk model. The exact uses of the data evolve over time as science moves forward. For example, as new scientific techniques emerge, the data are used in different ways but always within the scope of the overarching objective. The data will only be processed for the purposes described in this document. Data received from NHS Digital will not be shared with any third parties.
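
The de-identification steps described in this section (a study ID in place of identifiers, dates replaced by ages or time periods or limited to month + year, and health events recoded into broader groups) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the LLP's actual procedure: the record layout, the function names and the `DISEASE_GROUPS` mapping are all hypothetical.

```python
from datetime import date

# Hypothetical recoding table: specific terms mapped to a broader group.
DISEASE_GROUPS = {
    "emphysema": "lung disease",
    "chronic bronchitis": "lung disease",
}


def whole_years(start, end):
    """Completed years between two dates."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def whole_months(start, end):
    """Completed months between two dates."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - (1 if end.day < start.day else 0)


def coarsen_record(record, study_id):
    """Build a shareable record with no exact dates or direct identifiers."""
    diagnosis = record["diagnosis_date"]
    death = record.get("death_date")
    return {
        "study_id": study_id,
        "age_at_diagnosis": whole_years(record["date_of_birth"], diagnosis),
        "diagnosis_period": f"{diagnosis.year}-{diagnosis.month:02d}",  # month + year only
        "months_diagnosis_to_death": whole_months(diagnosis, death) if death else None,
        "condition_group": DISEASE_GROUPS.get(record["condition"], "other"),
    }
```

In this sketch the link between `study_id` and the individual would be held only within the clinical database, so that results of analysis can be linked back to individuals there while the shared record carries no exact dates.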


Project 3 — DARS-NIC-150521-F2Q1V

Opt outs honoured: No - data flow is not identifiable (Consent (Reasonable Expectation))

Sensitive: Non Sensitive

When: 2018/10 — 2018/12.

Repeats: One-Off

Legal basis: Health and Social Care Act 2012 – s261(2)(c)

Categories: Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Accident and Emergency

Objectives:

Epilepsy is the recurring tendency to have unprovoked seizures. With a prevalence of ~1%, epilepsy is the second most common serious neurological disorder in the UK. As well as having potentially important life implications for patients and families, epilepsy also has important societal impacts. One is the cost of providing emergency care. In the UK, 20% of people with epilepsy (PWE) visit hospital Accident and Emergency Departments (A&Es) each year for seizures. In England alone, there are around 100,000 visits to A&Es each year. The cost of this in 2015/16 was ~£70 million. One reason costs are so high is that half of the PWE visiting A&Es are admitted to hospital; indeed, 85% of admissions for epilepsy occur on such an unplanned basis. Readmissions further drive costs up; ≥60% of PWE re-attend A&E within 12 months. This rate of return is higher than seen for other long-term conditions with episodic relapse, like asthma and diabetes. Seeking emergency care for epilepsy can be appropriate, important, and even life-saving. Evidence from projects such as the research team’s recent UK-wide National Audits of Seizure Management in Hospitals now shows, though, that most persons attending A&E do not attend for such reasons. Instead, most have known, rather than new, epilepsy and present with non-emergency states which do not require the full facilities of an A&E. One of the reasons driving this use is that patients and their family members frequently lack the confidence and knowledge to manage seizures by themselves. The research team, based at University of Liverpool, has developed seizure first aid training for this part of the epilepsy population and has recently completed a pilot randomised trial of it, called the Seizure First Aid Training for Epilepsy (SAFE) trial. The SAFE trial focused on the 60% of this group (and their informal carers) who make multiple attendances in a year and together account for ~90% of all A&E visits made for epilepsy.
The trial compared receipt of the intervention to usual care alone. The trial was completed with NHS ethical approval and HRA approval, was sponsored by the University of Liverpool, was publicly registered (ISRCTN13871327), and was funded by the National Institute for Health Research (Health Services and Delivery Research programme, Project Reference No: 14/19/09). A pilot randomised trial is not designed to prove the superiority of one treatment over another, but rather to try out aspects of the larger trial and address design uncertainties that exist (Whitehead et al. Clin Trials 2014; 38: 130–133). The size of a pilot trial is rarely adequate to conduct statistical hypothesis tests as would be the case for the main definitive trial. Despite this, pilot trials have an important role in health care and benefit the health and social care system, since they help researchers understand how best to complete a full trial so that it is well positioned to generate the scientifically rigorous evidence required to inform care and maximise patient outcomes. Moreover, pilot trials can help society avoid wasting finite resources on trials that are unfeasible or poorly designed. To this end, major funding bodies such as the UK’s National Institute for Health Research (NIHR) and the Medical Research Council expect pilot evidence on the feasibility of a trial before large amounts of money are released for a large trial to be completed. A pilot trial was necessary as a range of uncertainties existed as to how to conduct a definitive trial. Uncertainties pertinent to this request for data from the HES A&E system were: • The absence of an initial estimate of the effect the seizure first aid intervention had on the proposed primary outcome for a definitive trial – namely participants’ subsequent use of A&E – and the lack of an estimate of the annual rate of A&E use in the control arm and its dispersion.
Without this information it is difficult to know what sort of sample size would be required for a definitive trial to ensure it was adequately powered to detect an effect if one existed. Going ahead to main trial without this information would have risked recruiting too few or too many participants. If too few were recruited, the probability of finding a clinically relevant difference would have been low and therefore, the chance of providing an inconclusive result high. Conversely, if too many participants were recruited then resources would be wasted, more patients than necessary could be given a treatment which will later be proven to be inferior; or an effective treatment may be delayed from being identified. • The second uncertainty concerned how best to measure/ capture information on a person’s A&E use in a trial. For example, one could ask participants to self-report on their use, but this presumes all participants are well enough to answer the question and can provide an accurate answer. Memory impairment and mood disturbance are common in epilepsy and may impair recall. Given these uncertainties, and since the HES system provides the only comprehensive record of a person’s use of all NHS A&Es across the country, HES A&E data is required for the participants in the trial relating to the 12 months before and after they entered the trial. All participants in the trial provided explicit consent for their HES A&E data to be obtained. Having access to their HES A&E data would allow the above noted uncertainties to be addressed in the following ways: • By enabling a description of the use of A&E by participants in the two treatment groups before and after entering the trial in order to generate an initial estimate of any change that occurred in A&E use in the two treatment groups and determine the annual rate of ED visits in the control group and its dispersion parameter. 
All this information could then be factored into a sample size calculation for a future definitive RCT. • Participants in the trial were asked to self-report on their use of A&E during the 12 months before and after entering the trial. Having HES A&E data for these participants for the same reference periods would enable a comparison of patients’ self-reported use of A&E against objective data on their A&E use. This comparison would allow measurement of the extent of agreement between the two measurement approaches and inform discussions about how best to measure A&E use within a future definitive trial, and indeed any other similar trials. A multi-centre, external, pilot randomised controlled trial (RCT) was conducted with PWE aged ≥16 years who had visited the A&E of one of three NHS hospital trusts in the North West of England (namely Aintree University Hospital, Royal Liverpool University Hospital and Wirral University Teaching Hospital) for epilepsy on ≥2 occasions in the prior 12 months and who could independently complete questionnaires in English, along with one of their family members or friends who had an informal caring role. Ostensibly eligible patients were identified and invited to participate in the trial by their NHS A&E consultant, who sent them an invitation letter in the post along with a Participant Information Sheet. Persons who were interested in taking part were in turn contacted by a GCP-qualified, postdoctoral study researcher who confirmed patient eligibility, provided information and answered any questions the patient had. The researcher also provided the patient with a further copy of the Participant Information Sheet. Participants had a minimum of 24 hours to decide whether they wanted to take part or not. Fifty-eight participants were recruited between May and December 2016.
Participants provided informed written consent to participate and for the research team to access identifiable data from the HES A&E system on the number of times they had visited an NHS A&E in the 12 months prior to entry into the trial and then during the time period they were enrolled in the trial. Data on participants’ use of A&E before coming into the trial is required to permit adjustment for potential differences in baseline use of A&E between the two trial arms (i.e. those who took the training and those who did not). Participants taking part were put into one of two groups at random by a computer. The first group is called Group A and the second Group B. People who are put in Group A get the Seizure First Aid Training course (treatment) straight away and people in Group B continue to receive their normal medical care (treatment as usual, TAU). The health of the people in the two groups will be compared to see if the Seizure First Aid Training was helpful or not. After the two groups’ health has been compared, people in Group B then get to go on a Seizure First Aid Training course if they want it. Over the course of the trial, patient participants were followed up and each required to complete three sets of questionnaires, either in a face-to-face interview with a research worker (at baseline and at 12-month follow-up) or through the post (at 6-month follow-up). At these assessment points, participants were asked to self-report on the number of times they had visited any NHS A&E. At baseline they reported on A&E use in the previous 12 months. At follow-up they reported on A&E use since their prior assessment. Prior to each assessment point, participants were contacted and asked whether they wanted to continue to participate in the trial or whether they wanted to withdraw their consent. During the course of the trial 5 patient participants formally withdrew and so HES A&E data will not be requested for them.
The project’s sample size does not impact on its ability to achieve its aims. Sample sizes between 24 and 50 have been recommended as ‘adequate’ for pilot trials (e.g., Sim & Lewis, J Clin Epidemiol 2012; 65: 301-8; Julious, Pharm Stat 2005; 4: 287-91).

Expected Benefits:

Whilst the trial will not be statistically powered to detect a clinically meaningful difference in outcome between treatment groups, summary statistics will be calculated to measure the effect of the intervention on the proposed primary and secondary outcome measures (outlined previously) and the precision of such estimates at the post-treatment time points. As such, it will directly inform the methodology employed in the data collection and analysis of a possible future definitive trial. This benefits health and social care by strengthening the evidence base which underpins the data and the results. This output will be measurable based on the methods subsequently employed in the definitive trial. Indirect benefits to health and social care will be achieved through presentation and publication to the research community involved in clinical trials and trial methodology. The output of this trial aims to inform the use of data from electronic HES records in a prospective definitive trial. As a result, RCTs will be able to use electronic medical records where these offer a benefit over self-report methods of data collection. This will result in improved efficiency of RCTs, which are frequently funded through public sources, and improved participant experience. These benefits have significant implications for the lives of patients, and reducing unnecessary emergency admissions is a key factor in helping to relieve financial pressure on healthcare services. Another major social issue is the indirect cost of epilepsy due to lost employment. The health and social costs could be reduced, and quality of life improved, via better outpatient management. However, around 40% of those diagnosed have poorly-controlled epilepsy and continue to have two or more seizures per year, despite antiepileptic drug treatment. 
These findings highlight missed opportunities for epilepsy self-management. Guidelines are clear that, with the correct training, such seizures can be safely managed by patients and their families within the community. Evidence indicates that people with epilepsy that frequently visit the A&E might benefit from a self-management intervention that improves their own and their informal carers’ confidence and ability in managing seizures and empowers them to be able to tell others from their wider support network about first aid. It follows therefore, that indirectly the assessment of data from the electronic HES records in a definitive randomised controlled trial to evaluate the effectiveness of seizure first aid training intervention for people with epilepsy will provide the best possible information in relation to A&E attendance by people following a seizure. This data will subsequently be used to inform service commissioning decisions. Commissioners planning health and social care services need good information about the experience of people with epilepsy and the intervention(s) they receive, as well as the result of that intervention as a means to ensuring that people are getting the services that are right for them. Reducing unnecessary emergency visits to hospital by people with epilepsy is identified as one way that resource limited health services can generate savings. In addition, reducing emergency visits is also important for service users; not least because emergency department visits can be inconvenient, distressing and do not typically lead to extra support.

Outputs:

The data provided by NHS Digital will directly inform the outputs of the NIHR (HS&DR) funded project and allow the research team to achieve several of the key objectives. First and foremost, the data will allow an estimate of the effect of the training intervention. This will help in understanding whether the seizure first aid intervention developed is likely to be beneficial in reducing ED use. Secondly, the data provided by NHS Digital will enable understanding of how data on A&E use captured by the HES system compares to patient self-report. The results from these analyses will provide knowledge on how to conduct the definitive trial if it is deemed appropriate. Findings will be published in the form of a publicly available report to the NIHR and within a peer-reviewed publication. In no publication will the identity of participants be disclosed. Final results from this trial and the associated outputs are expected by December 2018 in line with the project’s completion date. All presented and published findings will be anonymised and compliant with NHS Digital's operating procedures in relation to presentation and publication. Non-identifiable aggregate data will be used in presentations and publications. Outputs will consist of descriptive statistics and statistical measures of agreement between data retrieved from the HES A&E system and participants’ self-report. The HES Analysis Guide rules will be complied with. Only the mean or median number of A&E visits by participants in the two groups and the dispersion parameters shall be described. The change in A&E use between the two groups will be reported and compared. In no instance shall data with cell counts of less than 5, as specified in section 5.1 of the HES Analysis Guide, be published. Moreover, the maximum number of visits by an individual shall not be reported since this is information that can relate to an individual. 
The primary focus here is to ascertain the agreement and additional benefits of data from the HES A&E data set and not to report on specific clinical criteria. It is not the intention to present record level data in any report. In all outputs data will be de-identified and all measures taken to ensure that individuals cannot be identified. For example, information regarding geographic location, timing and gender will be omitted, as well as explicit clinical or personal details. Furthermore, both epilepsy and the clinical event of A&E attendance(s) under assessment are very common. This project will have both direct and indirect outputs. The pilot trial will include the analysis of individual participants’ use of A&E at baseline and over the 12 months of follow-up. This will assist in determining whether incorporating A&E attendance from electronic medical records in place of patient self-report records provides a more rigorous data set. This information will be included in the analysis on completion of the pilot trial. This output will take the form of reports and presentations. This project’s findings will also inform the wider epilepsy research community, contributing to the development and improvement of efficient future trial design. In particular, information from HES data will provide evidence on how accurate people with epilepsy are at self-reporting on previous emergency department use. Using HES data from the electronic system to provide the primary outcome data would extend the timescale of a future trial and increase costs. At present, no evidence exists to inform a future trial about how best to measure A&E use; therefore, the coverage and accuracy of patient self-report will be compared to data from the HES system. This will help determine whether the expense associated with use of the HES system as the primary means of measuring A&E use is warranted. 
These outputs will be disseminated to clinicians and academics involved in the conduct of clinical trials and research, concerning clinical trials methodology, within the epilepsy population. Dissemination of findings will be presented at academic conferences (potentially; 13th European Congress on Epileptology http://epilepsyvienna2018.org/scientific-programme/) and in peer-reviewed journals (potentially; Journal of Neurology, BMJ Open, Epilepsy and Behavior) and will take the form of a narrative assessment of: - the methods and feasibility of access to electronic medical records - the agreement and reliability of data from routine sources - the benefit / limitations of data from electronic medical records
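
The small-number rule stated in the Outputs above (no published cell counts of less than 5) can be illustrated with a minimal sketch. The function name, the threshold argument and the example table are hypothetical, and the sketch follows only the wording of this register entry; the HES Analysis Guide itself may impose further rules that are not modelled here.

```python
from typing import Dict, Optional


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with every cell below `threshold` set to None.

    Assumption: "less than 5" is applied literally, so a count of 5 is kept
    and any count from 0 to 4 is suppressed.
    """
    return {
        cell: (value if value >= threshold else None)
        for cell, value in counts.items()
    }


# Made-up aggregate table, for illustration only (not trial data).
example_table = {"group_a": 12, "group_b": 3, "group_c": 5}
publishable = suppress_small_counts(example_table)
```

A suppressed cell is returned as None so that whoever prepares the output can choose how to display it (for example as an asterisk).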

Processing:

Data management and analysis is to be conducted within the Clinical Trials Research Centre (CTRC) at the University of Liverpool. The specific methodological activities involved in the processing of data are as follows: The NHS Digital HES A&E data will be requested for all trial participants who provided consent and who have not subsequently formally withdrawn from the trial. A request for data will be made for all applicable participants on one occasion only. The time period for which data on participants’ use of A&E is requested is the 12 months prior to them entering the trial and the 12 months following their enrolment. The research team will send the NHS number, the unique (pseudonymised) study ID, and the date of recruitment to the study to NHS Digital. In the data file returned by NHS Digital, the University of Liverpool needs the study ID associated with each A&E visit made, within the time period of interest, by a patient on the University of Liverpool’s list. This is necessary so that data from the HES A&E system can be allocated to the correct participants in the trial and to enable a comparison of the use of A&E by participants in the two trial arms. The date of each A&E visit is also required from NHS Digital so that HES records can be matched accurately against the detail provided by participants. The resulting file generated by NHS Digital would include any A&E visits by patients who participated in the trial that were captured by the HES system and occurred within the relevant time period. The data provided would include the date of the visit. The data file would be securely transferred back to the CTRC at the University of Liverpool, again using NHS Digital’s Secure Electronic File Transfer (SEFT) system. Analyses will not be completed using the identifiable data set and, to ensure that as few people as possible can access this file, a structured process, used previously by the research group when using HES data, will be followed. 
Specifically, having received the data file from NHS Digital, the trial’s postdoctoral research fellow will, within the confines of CTRC, work with the identifiable dataset to create a pseudonymised version. To do this, the research fellow will attribute all A&E visits captured by the HES system to the appropriate participants in the trial. The resulting file will only contain each participant’s Unique Study Number and the number of A&E visits that they made during the relevant time periods. The patient’s NHS number will not be included in this file. The data will then be linked, using the patient’s Unique Study Number, with the main trial database and the self-report data provided by participants in the trial. Following this process, the data set will then be accessible to the study team members involved in the analysis, including the Clinical Investigator and the trial statistician. The primary outcome by which treatment effect will be estimated is defined as the number of epilepsy-related A&E visits made by patient participants over the 12 months following randomisation, as measured by HES data. The number of A&E visits at the end of the 12-month follow-up period will be presented, in addition to the change in the number of A&E visits at the end of the 12 months compared to the number of A&E visits in the 12 months prior to baseline. Results will be presented as means and standard deviations if data are normally distributed and as median, IQR and range if data are skewed. Results will be presented overall and by treatment group. The difference between the treatment group and the TAU control group will be expressed as a mean difference with a 95% confidence interval and statistically tested at a 5% level of significance by an independent t-test if data are normally distributed. In addition, to aid interpretation, 90% and 80% confidence intervals of the mean difference will be reported. 
The difference between the groups will be tested with a Mann-Whitney U test if the data are skewed. To maintain the original blinding of the trial statistician during the SAFE trial, anonymised HES data without any details of intervention will first be analysed and overall results presented. Subsequently, intervention allocations of individuals within the HES data will be made available to the trial statistician, results will be presented by treatment group, and statistical testing will be performed. For the secondary objective, self-reported numbers of A&E visits will be compared to the numbers of epilepsy-related A&E visits calculated from HES data. If possible, Bland-Altman agreement statistics will also be calculated to determine the agreement of the two methods of recording A&E visits. The term “if possible” is used not because the sample size might preclude completion of these analyses, but rather because the researchers have not yet seen the HES A&E data and there is a possibility, albeit unlikely, that these statistics cannot be calculated because the HES data received is in a different format to the self-reported data and so the two cannot be directly compared. All data received will be stored using the University of Liverpool Research Data Management Datastore (https://www.liverpool.ac.uk/csd/records-management/storage-and-disposal/). Data will be stored electronically on University of Liverpool central servers, located in an access-controlled server room and connected to the main University network, located behind a firewall. Physical access is limited to Computer Services Department staff. Data will be encrypted using industry standard techniques meeting the Information Governance Toolkit standard (8HN20). The data will not be transferred to an additional location. The SAFE trial CI will act as data custodian (https://www.liverpool.ac.uk/library/research-data-management/storing-your-research-data/). 
The University of Liverpool ‘Information Security Policy’ and ‘Research Data Management Policy’ provide further information. The pseudonymised dataset will be accessed by specific members of the SAFE trial research team based in the University of Liverpool. All outputs will contain only aggregate data with small numbers suppressed in line with the HES Analysis Guide. All data will be stored and accessed at the University of Liverpool at all times. All personal data in this trial is kept strictly confidential and is being handled, stored and destroyed in accordance with GDPR. Only individuals who are substantively employed by the University of Liverpool and are part of this study will have access to the data. No other collaborator of this study will have access to the data received from NHS Digital. All organisations party to this Agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract - i.e. employees, agents and contractors of the Data Recipient who may have access to that data).
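
Two of the processing steps described above can be sketched in a few lines: reducing the returned visit records to per-participant counts keyed only by study ID, and computing Bland-Altman agreement statistics between self-reported and HES-derived counts. This is an illustrative sketch, not the study team's code. The function names, the record layout, the 365-day approximation of "12 months", the choice to count the recruitment day as "after", and the conventional 1.96 standard deviation limits of agreement are all assumptions made here.

```python
from datetime import date, timedelta
from statistics import mean, stdev
from typing import Dict, Iterable, Sequence, Tuple

WINDOW = timedelta(days=365)  # assumption: "12 months" approximated as 365 days


def count_visits(
    visits: Iterable[Tuple[str, date]], recruited: Dict[str, date]
) -> Dict[str, Dict[str, int]]:
    """Count visits per study ID in the 12 months before and after recruitment.

    `visits` holds (study_id, visit_date) pairs; `recruited` maps study ID to
    recruitment date. The result carries no NHS number, only the study ID.
    """
    counts = {sid: {"before": 0, "after": 0} for sid in recruited}
    for sid, visit_date in visits:
        start = recruited.get(sid)
        if start is None:
            continue  # record for an ID that is not on the participant list
        if start - WINDOW <= visit_date < start:
            counts[sid]["before"] += 1
        elif start <= visit_date < start + WINDOW:
            counts[sid]["after"] += 1  # recruitment day counted as "after"
    return counts


def bland_altman_limits(
    self_report: Sequence[float], hes: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences (self-report minus HES); the
    limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    if len(self_report) != len(hes):
        raise ValueError("paired measurements must have equal length")
    if len(self_report) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [s - h for s, h in zip(self_report, hes)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

Visit counts are discrete and often skewed, so the limits of agreement should be read with caution; the sketch only shows the arithmetic behind the statistic named in the text.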


Project 4 — DARS-NIC-16656-D9B5T

Opt outs honoured: No - data flow is not identifiable

Sensitive: Non Sensitive

When: 2016/09 — 2018/12.

Repeats: Ongoing, One-Off

Legal basis: Health and Social Care Act 2012, Health and Social Care Act 2012 – s261(1) and s261(2)(b)(ii)

Categories: Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Outpatients
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Accident and Emergency

Objectives:

HES data will be used to develop a longitudinal panel of neighbourhood (Lower Super Output Area - LSOA) indicators. These will be used to investigate the impact on health care utilisation of risk factors, policies and interventions. Analysis of this longitudinal panel will: 1. Investigate the impact across England of socioeconomic changes, national health and welfare policy changes, environmental changes and infectious disease trends on healthcare utilisation and whether there are neighbourhood level characteristics that modify these effects. Analysis will investigate inequalities between neighbourhoods in the consequences of these adverse trends and events. Analyses for this Objective will indicate the contextual factors driving adverse health outcomes and health service utilisation at the neighbourhood level. 2. Evaluate the impact of area based local authority and NHS economic, environmental, social, governance and service redesign activities on health outcomes and demand for health and social care services. 3. To develop predictive models of the factors driving adverse health trends and increases in demand for health services at the neighbourhood level, that can then be used by local agencies to better target resources at the root causes of ill-health and health service demand and the neighbourhoods most affected. 4. To develop new approaches for monitoring progress on health inequalities at the neighbourhood level and involving the public in using data to influence local services and policies - supporting Open Data initiatives to promote transparency and accountability.
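
As a rough illustration of what a longitudinal panel of neighbourhood indicators might look like, the sketch below aggregates episode records to one row per LSOA per year and derives a simple rate. The function name, the record layout, the placeholder LSOA labels and the population denominators are hypothetical; the register entry does not specify which indicators are built or how.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (LSOA label, year)


def build_panel(
    episodes: Iterable[Key], population: Dict[Key, int]
) -> List[Dict[str, object]]:
    """Aggregate episodes to one row per (LSOA, year) listed in `population`.

    Each element of `episodes` is the (LSOA, year) of one hospital episode.
    LSOA-years with no episodes are kept with a count of zero so that the
    panel stays balanced over time.
    """
    counts = Counter(episodes)
    panel = []
    for (lsoa, year), pop in sorted(population.items()):
        n = counts.get((lsoa, year), 0)
        panel.append(
            {
                "lsoa": lsoa,
                "year": year,
                "episodes": n,
                "rate_per_1000": 1000 * n / pop,
            }
        )
    return panel


# Made-up inputs with placeholder LSOA labels, for illustration only.
example_episodes = [("LSOA_A", 2015), ("LSOA_A", 2015), ("LSOA_B", 2016)]
example_population = {
    ("LSOA_A", 2015): 1000,
    ("LSOA_A", 2016): 1000,
    ("LSOA_B", 2016): 2000,
}
example_panel = build_panel(example_episodes, example_population)
```

Keeping zero-count rows matters for longitudinal analysis, because a neighbourhood with no episodes in a given year is still an observation.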

Expected Benefits:

Benefits from reviewed journal papers and related analysis. 1. The impact of trends in gastrointestinal infections on health care utilisation. Analysis indicating the impact of gastrointestinal (GI) infection trends on health care utilisation and the extent to which this is mediated by socioeconomic and health service related factors, will indicate how targeted interventions that reduce GI infections and actions that influence the health seeking behaviour of people with GI could reduce healthcare usage. Alongside this analysis Liverpool are working with local public health and environmental health teams to develop targeted interventions to reduce inequalities in the causes and consequences of gastrointestinal infections. This analysis will inform the development of these interventions leading to more effective approaches. For example this could include actions to support parents caring for children with gastrointestinal infections and promoting alternatives to A&E by enhancing support through pharmacies and primary care. 2. The environmental determinants of health care utilisation. This analysis will identify the extent to which environmental factors, such as air pollution, flood risk, housing quality and fuel poverty influence health care utilisation and inequalities in these effects by area deprivation. Previously strategies to manage demand for health care services have focused on service redesign rather than environmental determinants of health. This analysis will be used to develop strategies with local partners to reduce demand for health care by addressing important environmental determinants. The results will indicate the potential savings to the NHS from investment in initiatives to reduce fuel poverty or improve air quality, for example. This will then lead to benefits both through improving health and reducing preventable health care costs. 3. The effect of changes in social care funding and welfare reform on health care utilisation. 
This analysis will indicate the effect of changes in social care funding and welfare reform on health care utilisation and the factors that might mitigate these effects. Funding for social care is currently being reduced relative to demand, and major welfare reforms are being introduced, however, it is not currently known what effect this is having on healthcare utilisation. The analysis will indicate the potential costs to the health service of these policies. It will inform national policy debates about the costs and benefits of different approaches to welfare reform and the allocation of resources for health and social care services. It will help identify the characteristics of local systems that are more resilient to these changes – enabling the development of local health, social care and welfare systems that can better improve health and reduce health inequalities. 4. The health inequalities impact of initiatives to promote neighbourhood resilience. This analysis will indicate the health inequalities impact of a number of local initiatives that aim to promote economic, environmental and social resilience in disadvantaged neighbourhoods in the North West. These include initiatives to improve housing, increase financial security, reduce social isolation and improve public involvement and governance. This will indicate what works and provide evidence for local authorities across the country helping them develop initiatives that promote resilience, improve health and reduce inequalities. 5. What components of resilience have the greatest impact on health. The University have developed a model of resilience with local authorities in the North West that focuses on economic, environmental, social and governance systems. However it is not yet known what the relative impact of these components is on health and health inequalities. 
Analyses will indicate the health gains that could be expected for investments in different components of this model and the interactions between them. This will enable the more efficient use of resources to develop more resilient systems that reduce health inequalities. 6. The impact on health care utilisation of new models of out of hospital treatment and care and community orientated primary care. There are currently a large number of new models of out of hospital treatment and care being developed across the country, particularly as part of the Vanguard programme. New initiatives are often overlaid on top of and interact with existing programmes and wider system changes. The NHS and local authority partners of the NWC CLAHRC have identified this as a priority for the research programme supported by the NWC CLAHRC over the next 3 years. Analyses will identify the components of new models of care along with wider system changes that appear to be effective, both within primary care and at the interface of primary, secondary and social care. Analysis will particularly focus on how these effects differ across socioeconomic groups and interact with the social and environmental determinants of health. This will support the development of out of hospital care that addresses inequalities and improves health whilst reducing healthcare utilisation. For example this could include new approaches for incorporating wider social support in general practice through the third sector or identifying the key components for the effective integration of health and social care teams. 7. Predicting adverse trends in neighbourhood health. Increasingly health and social care systems are using risk prediction and stratification methods to target resources and interventions. These have tended to use individual risk factors and model risk at the individual level. This tends to neglect the impact of environmental and area-based determinants of health outcomes. 
This paper will outline the methods used to develop a risk prediction model that is based on neighbourhood level analysis, incorporating a broader set of individual and environmental determinants than models based solely on individual risk factors. Publishing the methods for producing the model will enable the robust development of a tool that local authorities and NHS organisations can use to target the right actions at the right risk conditions in the right neighbourhoods to most effectively improve health and reduce health service demand (see below). Conferences/presentations. 1. NIHR HPRU annual conference – 2017 This presentation will be used to disseminate the early results from the analysis for Paper 1 to an audience of NHS, Health Protection and Environmental Health practitioners. This will enable them to develop more effective approaches that reduce the impact of gastrointestinal infections in disadvantaged neighbourhoods. 2. European Public Health Association Conference – 2017 This presentation will be used to disseminate and discuss the early results from the analysis for paper 2 to an international audience of public health practitioners, policy makers and academics. This will enable them to make the case for investment in and development of strategies to reduce demand for health care by addressing important environmental determinants of health. It will also stimulate cross-country learning about effective approaches to reduce environmental determinants of health, leading to improved public health policies. 3. Public Health England Annual Conference and Local Government Association Conferences – 2018 These conferences will be used to present early findings from the analysis for papers 4 and 5 to audiences of public health practitioners, other local authority professionals and local government policy makers. 
This will enable them to make evidenced based decisions about how scarce resources are invested locally in actions to improve the social determinants of health. For example this could indicate whether investment in employment services is likely to be more or less effective than investment in services to reduce social isolation and which are likely to be the important components of these initiatives that increase effectiveness. 4. Annual Primary Care Conference - 2018. This conference will be used to present findings from Papers 6 & 7 to an audience of GPs, Commissioners and other health care professionals – demonstrating the impact of new models of out of hospital care that have been developed in the North West. This will enable other regions to learn about what works for which patient groups enabling the sharing of best practice and the improvement of health and social care services. Policy and practice briefings 1. Developing resilient neighbourhoods. This will synthesise the results from the analysis outlined for papers 4 and 5 above with other research being carried out through the NWC CLAHRC on neighbourhood resilience- including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods. It will provide practical advice for local government organisations indicating approaches that are likely to be effective at promoting resilience and addressing the social determinants of health. This will lead to more effective local government policies and activities that deliver greater health benefits than would otherwise be the case. 2. New models of out of hospital treatment and care, what works for whom? This will synthesise the results from the analysis outlined for papers 6 and 7 above with other research being carried out through the NWC CLAHRC on out of hospital care including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods and GP practices. 
It will provide practical advice for NHS and local government organisations indicating approaches to out of hospital care that are likely to be effective at reducing health inequalities and reducing demand for health and social care services. Importantly it will identify which components are likely to be particularly effective in deprived neighbourhoods and which approaches risk widening health inequalities. 3. Using neighbourhood predictive modelling to plan and target prevention. This will provide a practical guide for local government and NHS organisations to use the neighbourhood risk model developed through this project to better target resources and adapt services to local needs. This will lead to benefits through the development of more appropriate local services. Other outputs 1. Construction of longitudinal panel dataset of neighbourhood indicators with linked socioeconomic data. This dataset will be a resource that will be used by a number of research projects within the NWC CLAHRC for the purposes outlined in this application. Statistical code used to develop the indicators will be made available to other researchers and the longitudinal panel dataset could also be made available more broadly for research that benefits health and social care. As outlined above, where possible and following risk assessment and guidance from the HSCIC, these data will be made available as Open Data. The National Institute for Health Research and the Medical Research Council have recognised the need for more research that uses routine datasets such as this to evaluate the impact of public policies as “natural experiments”. This work will provide a major advance in these methods and data resources to support them leading to benefits to patients and the public through the rapid evaluation of public policies that have an impact on health. 2. Predictive modeling tool freely available to local authority and NHS organisations. 
The predictive modeling interface will enable local authorities and NHS organisations to better target resources and adapt services to local needs. This will lead to the more efficient and effective use of resources leading to health benefits for patients and the public. 3. Web based Neighbourhood Resilience Interface developed. The development of this freely available interface will support community groups and residents in disadvantaged neighbourhoods to identify local needs; monitor progress and advocate for change. This will lead to improved and more effective local services, it will support local community groups in making the case for funding in disadvantaged areas leading to increased investment.

Outputs:

Planned journal submissions for publications At least 8 publications in high impact peer reviewed journals are expected from this work. These are outlined below. Paper 1. The impact of gastrointestinal disease trends on health care utilisation and the extent to which these are mediated by socioeconomic and health service related factors. - Lancet Infectious Diseases - January 2018 Paper 2. The environmental determinants of health care utilisation and inequalities in these effects by area deprivation - International Journal of Epidemiology - January 2018 Paper 3. The effect of changes in social care funding and welfare reform on health care utilisation 2010 and 2017; are some places more resilient than others?, British Medical Journal - January 2018 Paper 4. The health inequalities impact of initiatives to promote neighbourhood resilience. American Journal of Public Health - January 2019 Paper 5. What components of resilience have the greatest impact on health - the implications for inequalities. Journal of Epidemiology and Community Health - January 2018 Paper 6. The impact on health care utilisation of new models of out of hospital treatment and care, British Medical Journal - January 2018 Paper 7. The impact on health care utilisation of community orientated primary care, British Medical Journal - January 2018 Paper 8. Predicting adverse trends in neighbourhood health. American Journal of Public Health - April 2019. The findings from the research will be disseminated through the following Conferences Presentations: NIHR HPRU annual conference - 2017 European Public Health Association Conference - 2017 Public Health England Annual Conference - 2018 Local Government Association Conference - 2018 Annual Primary Care Conference - 2018. Policy and Practice Briefing papers The University of Liverpool will produce a series of freely available briefing papers directed at practitioners, commissioners and policy makers in local government and NHS organisations. 1. 
Developing resilient neighbourhoods. 2. New models of out of hospital treatment and care, what works for whom? 3. Using neighbourhood predictive modeling to plan and target prevention. Other Outputs Longitudinal panel dataset of neighbourhood indicators. The initial product of this project will be a longitudinal panel dataset of neighbourhood indicators. This will initially be used by research groups within the NIHR CLAHRC NWC as outlined above. Where possible and following risk assessment and guidance from the HSCIC these data will be made available as Open Data. Where necessary this will involve removing sensitive indicators and aggregating indicators to higher geographies to ensure anonymity is maintained. Open Data available by September 2018. Predictive modeling tool. As outlined in the analysis section for Objective 3, a predictive model will be developed that can be used by local government and NHS organisations to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. An online interface will be developed that enables local authorities to use this model to visualise and identify high-risk neighbourhoods. This will be made freely available for use by local government and NHS organisations. January 2019. Web based Neighbourhood Resilience Interface. As outlined above, web based presentations of the longitudinal panel dataset of neighbourhood indicators will be developed that enable local groups to interact with the data, including mapping data, comparing neighbourhoods and visualising trends over time. This will support community groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability. This will be freely and publicly available. Developed January 2019. 
All outputs will be risk assessed for the potential of re-identification and will only include aggregate data with small numbers suppressed in line with HES analysis guidance.
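
The small-number suppression described above can be illustrated with a minimal sketch. It assumes a simple primary-suppression rule in which any non-zero count at or below a threshold is withheld; the threshold of 5, the function name and the area labels are illustrative assumptions and are not taken from the HES analysis guidance, which should be consulted for the actual rules.

```python
from typing import Dict, Optional


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` in which non-zero values at or below
    `threshold` are replaced by None (i.e. withheld from publication).

    The threshold is an illustrative assumption, not the official rule.
    """
    return {
        area: (None if 0 < n <= threshold else n)
        for area, n in counts.items()
    }


# Hypothetical aggregate counts per area (labels are made up).
example = {"AREA_A": 0, "AREA_B": 3, "AREA_C": 12}
published = suppress_small_counts(example)
```

This sketch covers primary suppression only. It does not address secondary suppression (preventing a withheld value from being recovered from row or column totals), which a real release process would also need to consider.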

Processing:

Step 1. Indicator development. In the first step of data processing indicators will be developed for each Lower Super Output Area (LSOA) in England from 2004-05 to 2017-18. The data request has been limited to these years as this is the minimum number of years that is sufficient to measure change over time within neighbourhoods. This process will involve a number of stages to develop robust indicators which are likely to be sensitive to socioeconomic and environmental change, national social and welfare policy changes and local health and social care redesign initiatives. Initially the University of Liverpool are developing theoretical models for the exposures and interventions being investigated. These outline the likely mechanisms through which these factors are likely to have an impact on hospital activity. As well as developing theoretical models of the impact of national socioeconomic, environmental and policy changes, Liverpool are working with local stakeholders to identify, prioritise and develop models for local NHS and Council initiatives. These will then be used to identify candidate indicators that are likely to be affected by these changes and initiatives. Indicator definitions will be developed and the data quality and precision tested. Categories will be refined and time periods pooled to provide sample sizes within each cell that give estimates that are sufficiently precise and comply with the HSCIC Small Numbers Policy / HES analysis guide. The reliability and validity of indicators will be investigated by testing the association between candidate indicators and other measures of similar constructs from different data sources. In particular, indicators will be compared to measures derived from a household health survey, which has been carried out across neighbourhoods in the North West. Indicators will then be refined in consultation with local NHS and Local Authority stakeholders. 
It is likely that the indicators will include measures of particular groups of morbidities (e.g. chronic conditions, mental health or alcohol related conditions, accidents); some will be age specific (e.g. asthma admissions in children, accidents in children, falls amongst older people); some will be limited to a particular admission type (e.g. emergency admissions for particular chronic conditions); and some will be directly related to processes of care (e.g. delayed discharge, length of stay etc.). Where relevant, indicators will be replicated at higher geographies and by GP practice. Step 2 – Matching and linking LSOA level data. In Step 2 data will be matched at the LSOA level to other national datasets indicating socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and uptake of local authority and NHS initiatives. These datasets only include pseudonymised data and do not include any personal data, and linkage will only occur at the area level, minimising the risks of re-identification due to data linkage. National and local small area datasets that will be used alongside neighbourhood level indicators derived from HES data: National Datasets.

  • Modelled LSOA level prescribing data
  • LSOA population estimates
  • Housing overcrowding data (census)
  • Modelled LSOA air quality indicators for 2001, 2005, 2008 and 2012
  • Crime data by LSOA
  • Economic activity
  • Self-reported health (census)
  • DWP statistics on the number of claimants of welfare benefits by LSOA
  • The number of laboratory reports for gastrointestinal infections by LSOA
  • Flood warning areas mapped to LSOA
  • Density of fast food and alcohol outlets, access to green spaces
  • Housing quality indicators
  • Small area fuel poverty indicators.

Local datasets. 
  • Number of people receiving emergency food from food banks by LSOA (local authority)
  • Number of people attending swimming / gym activities by LSOA (local authority)
  • Number of people receiving social care services by LSOA (local authority)
  • Number of people requesting debt/ financial/housing/welfare advice by LSOA (local authority)
  • Numbers accessing credit unions (local authority)
  • Local authority licensing data (local authority)

This will result in a longitudinal panel dataset of neighbourhood indicators of hospital activity and potential determinants of health and health care use. To achieve Objective 2, LSOAs within this dataset will then be mapped to areas involved in a number of area-based interventions in the North West of England. The Collaboration for Leadership in Applied Health Research North West Coast (CLAHRC NWC) is working with the NHS, Local Government organisations and residents to prioritise existing interventions, to develop and change those based on evidence, and to evaluate their impact on health and health inequalities. These include health and social care service redesign initiatives as well as initiatives that aim to promote the resilience of local economic, social, environmental and governance systems. GP practice codes will also be mapped to groups of GP practices involved in health and social care redesign initiatives that are targeting GP registered populations rather than particular neighbourhoods. These intervention areas will then be matched with both national and regional (NW) control areas with similar characteristics, in order to evaluate the impact of these interventions on health outcomes and health service use. Step 3 Analysis. Objective 1 – Nationwide analysis. Analysis for Objective 1 will use the longitudinal panel dataset for the whole country. 
Longitudinal analysis methods will be used to investigate the association between socioeconomic changes, welfare policy changes, environmental changes and infectious disease trends within neighbourhoods and changes in indicators of health service utilisation. Mediation and interaction analysis will then investigate whether these effects are modified by other neighbourhood characteristics – e.g. area deprivation, characteristics of the physical environment, health and social care services, local governance arrangements. Objective 2 – Evaluations of local initiatives. Analysis for Objective 2 will use the longitudinal panel dataset for local intervention areas alongside data from national and regional matched control areas to evaluate the impact of interventions whilst controlling for the contextual and national trends identified through the analysis for Objective 1. Objective 3 – Predictive models. This analysis will use the findings from Objectives 1 and 2 in multivariable analysis to develop predictive models of the modifiable factors driving adverse health trends and increases in demand for health services at the neighbourhood level. These will be developed to not only identify neighbourhoods at high risk, but also to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. Working with local government and NHS organisations the University of Liverpool will develop and evaluate approaches for the practical application of these predictive models to support the more effective use of local resources. Objective 4 - Community led approaches for monitoring progress on health inequalities at the neighbourhood level. A selection of the indicators from the aggregate longitudinal panel dataset will be developed in order that they can be made publicly available as Open Data (see the controls in place to minimise risks, below). 
Working with a network of community organisations who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), these indicators will be used to test out new community led approaches for monitoring progress on health inequalities at the neighbourhood level. This will involve the development of web based presentations of data that would enable local groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability. Data governance, management and controls in place for data access and procedures to minimise the risk of re-identification. The Integrated Longitudinal Research Resource. The usage of the HES data included in this request and the other small area datasets will be managed through the Integrated Longitudinal Research Resource (ILRR). The ILRR is a data management resource at the University of Liverpool established by the NIHR CLAHRC NWC in collaboration with the NIHR Gastrointestinal Health Protection Research Unit (GIHPRU) and the Consumer Data Research Centre (CDRC). The ILRR includes a dedicated Data Scientist, secure servers and robust policies for data sharing and data usage. The ILRR is overseen by a governance board, which approves access to data for specific usages based on criteria specific to each dataset. The governance board includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU and CDRC, NHS and Local government partners, a public advisor and an NHS information governance expert. Controls in place for managing access to the HES data in this request. Only ILRR data scientists based at the University of Liverpool will have access to the record level HES data included in this request. No third party will have access to the record level data. The HES data included in this request and the panel of aggregate longitudinal neighbourhood indicators derived from that data will be consistently documented, catalogued and coded and stored in a secure SQL server database. 
Only aggregate data with small numbers suppressed in line with the HES analysis guide will be made available to other researchers. This aggregated small area data will still be treated as safeguarded data, with specific data items only being made available to researchers as needed for specific analysis plans, and with data only released after any risks of re-identification have been assessed and mitigated by ILRR data scientists. Access to the aggregated panel dataset of neighbourhood indicators will be limited to research groups that are part of the NIHR NWC CLAHRC (unless data is made available as Open Data – see below). These research groups include academic researchers from Liverpool, Lancaster and Central Lancashire Universities as well as analysts from NHS and Local Government organisations. Each group of researchers will outline a detailed analysis plan relating to each of the Objectives above, describing which aggregate indicators of hospital activity they require access to and which indicators related to socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and those related to local area based local authority and NHS interventions. Each of these detailed analysis plans will be reviewed by the Integrated Longitudinal Research Resource (ILRR) governance board. Data will be released only if it is to be used according to the purposes outlined in this application. Only aggregate data that includes only the variables required for the specific analysis of each group will be released. Each request will be assessed by an experienced Data Scientist to identify if there are any risks of data being re-identified as a result of the linkage with other data sources, and to mitigate these risks. This risk assessment will be based on the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification. 
None of the datasets that will be used to develop linked LSOA indicators include any personal data; therefore the risk of re-identification due to data linkage is low. Open data. As outlined under Objective 4 the aim is to develop a selection of the aggregate indicators derived from HES data so that they could be released as Open Data. The risk of re-identification for each of these indicators will be assessed using the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification and measures taken to ensure the risk of re-identification is low enough to allow public release. For example, this could involve aggregating these indicators to the ward level (average population size 10,000) rather than the LSOA level, or pooling data over a number of years. The HSCIC will be consulted before any indicator is released under the Open Government Licence. These Open Data aggregate indicators will then be used in work with a network of community organisations and members of the public who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), to involve members of the public in identifying local needs, monitoring progress and advocating for change to improve services. The role of the institutions involved in these grants. The University of Liverpool (UoL) will be the sole data controller and data processor for this application and all record level data will be processed at the UoL. Only data scientists based at the UoL and employed by the UoL will have access to the record level data. The ILRR governance board, which includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU, CDRC and local NHS and LA organisations, will oversee procedures and processes for accessing the small area aggregate level data derived from the record level data, and assess and approve requests from research groups to use this data. 
These research groups will only have access to aggregate datasets that have been risk assessed by data scientists at UoL and comply with HES small number analysis guidance. These research groups will include partners who are members of the NIHR CLAHRC NWC collaboration, including researchers from Liverpool, Lancaster and Central Lancashire Universities, as well as analysts from local NHS and Local Government organisations. As is required by the NIHR, the research from this project will be published in peer-reviewed journals that are compliant with the NIHR policy on Open Access.
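
Steps 1 and 2 of the processing described above (deriving area-level indicators from record-level data, then linking them to other area-level datasets) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the record layout, the area codes, the population figures and the `build_panel` helper are all hypothetical, and a crude rate per 1,000 population stands in for whatever indicator definitions are eventually agreed.

```python
from collections import Counter

# Hypothetical record-level rows: one (area_code, financial_year) per admission.
admissions = [
    ("AREA_A", "2015-16"),
    ("AREA_A", "2015-16"),
    ("AREA_B", "2015-16"),
    ("AREA_A", "2016-17"),
]

# Hypothetical area-level dataset (e.g. population estimates),
# keyed by (area_code, financial_year).
population = {
    ("AREA_A", "2015-16"): 1500,
    ("AREA_B", "2015-16"): 1600,
    ("AREA_A", "2016-17"): 1520,
}


def build_panel(records, area_data):
    """Aggregate records to area-year counts and link them, at area level
    only, to another area-level dataset. Returns one row per area-year."""
    counts = Counter(records)
    panel = []
    for key in sorted(area_data):
        area, year = key
        n = counts.get(key, 0)  # areas with no records get a zero count
        pop = area_data[key]
        panel.append({
            "area": area,
            "year": year,
            "admissions": n,
            "population": pop,
            "rate_per_1000": 1000 * n / pop,
        })
    return panel


panel = build_panel(admissions, population)
```

Because the join key is the area and year rather than any person-level identifier, no individual record is ever matched to another dataset. The resulting rows would still need small-number suppression and risk assessment before being shared.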


Project 5 — DARS-NIC-19237-R3T6S

Opt outs honoured: No - consent provided by participants of research study (Reasonable Expectation, Consent (Reasonable Expectation))

Sensitive: Sensitive

When: 2017/03 — 2019/01.

Repeats: Ongoing

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Health and Social Care Act 2012 – s261(7)

Categories: Identifiable

Datasets:

  • MRIS - Flagging Current Status Report
  • MRIS - Cause of Death Report
  • MRIS - Members and Postings Report
  • MRIS - Cohort Event Notification Report

Yielded Benefits:

Some preliminary analysis has been undertaken utilising the NHS Digital data, which has been used to develop a lung cancer CT screen nodule risk model that is in its final stage of development. However, data from NHS Digital has not yet been published, released or reported upon, as the data has not matured sufficiently. Data received to date has been utilised to update the UKLS database to ensure University of Liverpool did not contact deceased individuals.

Objectives:

The overall aim of the trial was to provide data required for an informed decision about the introduction of population screening for lung cancer. This involved establishing the impact of screening on lung cancer mortality, determining the best screening strategy and assessing the physical and psychological consequences and the health implications of screening. An additional objective was to create a resource for future improvements to screening strategies. It was initially anticipated that the pilot study would be followed by a more in-depth extended trial with a larger cohort of people. This did not receive further funding however and data will be limited to the pilot study with a cohort of 4,061 participants, of which recruitment has now ended. Although further data is being requested under this agreement, it will be 'follow up' data for the original cohort only e.g. Cause of Death, Date of Death and Cancer Registry data. The data will be recorded on the United Kingdom Lung Cancer Screening Trial (UKLS) database and pseudonymised data given to the named researchers in order to ascertain any mortality advantage to screening and inform the UK National Screening committee. Any future sharing of record-level data would be subject to an amendment application requiring NHS Digital approval.

Expected Benefits:

ONS Mortality and Cancer Registry Data have been received on a quarterly basis since 2012. This was vital information during the conduct of the trial as any deceased participants, or those diagnosed with lung cancer, were annotated on the database and marked as "Off Study" so that the UKLS project team would not contact them again to arrange repeat scans or complete follow up questionnaires. The trial has now finished but the follow up data on deaths and lung cancer diagnosis is still required. Once the outcome data is available, the success of the screening can be evaluated. This analysis will be provided to the UK National Screening Committee (UKNSC) to help inform decision-making as to whether a lung cancer screening programme should be implemented in the UK. The UKNSC will not be using only the UKLS analysis, but will also receive analysis from a larger lung cancer screening trial run in the Netherlands (NELSON - European Nederlands-Leuvens Screening Onderzoek). The NELSON Trial is due to report soon. A future exercise to combine UKLS and NELSON data is anticipated, but further data sharing will not be undertaken prior to approval from NHS Digital by means of a separate application.

Outputs:

Data already received from NHS Digital has not had any analysis undertaken, nor has any data been published, released or reported upon. Data received to date has been utilised to update the UKLS database to ensure University of Liverpool did not contact deceased individuals. The initial findings/conclusions of the trial data were published in the BMJ-Thorax Online First. This included methods, trial design, recruitment, randomisation, nodule management, number of cancers, treatment, cost effectiveness modelling. The full report of these aspects of the UKLS trial has also been published by the funder, National Institute for Health Research, Health Technology Assessment Programme (NIHR HTA). Both of these are available as open publications. Future specific outputs anticipated June to December 2017 in the form of presentations/publications and peer review journals will be added to the UKLS website. One specific output will be a report to the UK National Screening Committee on the cost effectiveness and mortality benefit of introducing a lung cancer screening programme into the UK. Prior to this report it will be necessary for the statistician to analyse the data on causes of death and lung cancer diagnoses. The UKLS statistician is also designing a risk model to predict lung cancer utilising nodule data from the UKLS study which, if successful, will be submitted for publication. The most appropriate publication will be identified when the analysis is complete. This may include Epidemiological or Radiological publications, such as BMJ-Thorax. Submission to a publication does not guarantee acceptance so it may be submitted to more than one publication before being accepted. Outputs will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide.

Processing:

An updated cohort Excel file (containing details of participants who have given informed consent) will be sent to NHS Digital by the UKLS Project Manager. The updated cohort file is a result of the removal of those participants who have died (as informed by previous data received from NHS Digital). NHS Digital will upload the linked dataset file onto their secure portal and notify the UKLS Lung Cancer IT Technician. The file will be downloaded and saved as a password protected document into a folder. The UKLS Project Manager will update the UKLS database with deaths and cancers notified to ensure no further contact with those individuals is attempted. The Lung Cancer IT Technician will write/run queries to extract selected data from the UKLS ONS/Cancer Registry database. The output will include the pseudonymised unique patient identifier (MPI) in order that it can be linked to subject data held by UKLS. Subject data is data that has been provided by the participants as part of the trial. For those randomised to the CT screening arm, details of CT scan results are held and any treatment received as part of the trial. This data will not be linked with any other patient-level data. Researchers have access to the pseudonymised data for analysis only, which is imported into statistical software, usually SAS, STATA, or Excel. Only substantive employees of the University of Liverpool will process the data and only for the purpose as defined in this agreement. The analysis (as anonymised, aggregate data) will be the subject of publication (see specific outputs), however record level data will be viewed by the named users in this agreement only. The clinical database used within the UKLS has data for 4,061 subjects; all data is held securely (with additional password protection) and accessed only by the named users, in compliance with The University of Liverpool Data Policies. 
Although the database includes NHS numbers, only pseudonymised data will be made available to researchers; subjects will be identified using a pseudonymised unique identifier in any extracted data. Any analysis will be viewed by the named users in this agreement only. Further data sharing beyond the named users may be required in the future; however, this will be requested by means of a further application to NHS Digital.
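
The extraction step described above, in which researchers receive rows keyed by the pseudonymised identifier (MPI) with direct identifiers removed, might look something like the sketch below. The field names, the sample row and the `pseudonymised_extract` helper are hypothetical; the actual UKLS database schema and queries are not described in this entry.

```python
# Hypothetical names of directly identifying fields; the real schema is not
# given in the source.
IDENTIFYING_FIELDS = ("nhs_number", "name", "date_of_birth")


def pseudonymised_extract(rows, identifying_fields=IDENTIFYING_FIELDS):
    """Return copies of `rows` with directly identifying fields removed,
    leaving the pseudonymised identifier and outcome fields in place."""
    return [
        {k: v for k, v in row.items() if k not in identifying_fields}
        for row in rows
    ]


# A made-up database row for illustration only.
rows = [
    {
        "mpi": "P0001",
        "nhs_number": "0000000000",
        "name": "Example",
        "date_of_birth": "1950-01-01",
        "cause_of_death": None,
        "off_study": False,
    },
]
extract = pseudonymised_extract(rows)
```

The source rows are left unchanged, so the identifiable database and the researcher-facing extract remain separate objects.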


Project 6 — DARS-NIC-19805-M6T5R

Opt outs honoured: N

Sensitive: Sensitive, and Non Sensitive

When: 2016/04 (or before) — 2016/08.

Repeats: One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Critical Care
  • Hospital Episode Statistics Outpatients

Objectives:

This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research. Data regarding patients’ primary and secondary care is routinely recorded in electronic medical records by a number of organisations including the HSCIC. Such data retrieved from electronic medical records has demonstrated utility in clinical research. Electronic medical records have an established role in providing the dataset for retrospective, observational clinical and record linkage studies. In addition, in prospective studies, electronic medical records can provide useful, additional data that can inform analyses such as the long term assessment of mortality. Although there is a precedent for the use of data retrieved from electronic medical records in retrospective clinical studies and to a lesser extent in prospective studies, there is limited evidence of the attributes of such data when accessed to measure prospective outcomes as part of a pragmatic Randomised Controlled Trial (RCT). An assessment of data retrieved from electronic medical records in the context of prospective clinical research becomes particularly relevant where such data are now being used to conduct all stages of a RCT, including recruitment, intervention and follow up assessments, despite the feasibility, agreement, additional benefit and efficiency being unclear. This study will assess the feasibility, agreement and additional benefits of data retrieved from electronic medical records in measuring the objectives of a RCT. Subsequently, the efficiency and relative value of accessing data from electronic medical records compared to collecting data using standard RCT methodology will be explored. 
The electronic medical records will be requested from ‘routine data sources’, primarily the HSCIC but also The Secure Anonymised Information Linkage Databank for participants resident in Wales and the General Practitioner for participants resident in the North West of England, accessed through NorthWest eHealth. The study will directly inform the methodology of the NIHR Health Technology Assessment Programme funded RCT Standard and New Antiepileptic Drugs II (SANAD II) (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119). For example, accessing electronic medical records for participants of SANAD II may positively inform the health economic analyses and methods to address missing data. This will subsequently inform the methods to be performed in the final trial analyses on completion of SANAD II in 2018, including the access and implementation of data from electronic medical records. Improving the completeness of SANAD II data and precision of the analyses will positively influence health and social care by maximising the value of data collected and outcomes in this publicly funded RCT. Furthermore, the outcomes of this study will indirectly inform the methodology of similar pragmatic RCTs in the future. The specific objectives in this study where access to electronic medical records held by the HSCIC will be requested are as follows:
1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II:
   a. Assessment of the feasibility of accessing data from routine sources
   b. Assessment of the agreement of data from routine sources
2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of a RCT, SANAD II:
   a. Assessment of clinical efficacy
   b. Assessment of adverse events
   c. Assessment of health economic outcomes
   d. Assessment of the methods of addressing missing RCT data
3. Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II:
   a. Assessment of the efficiency of procedures to access / obtain data
   b. Assessment of the efficiency of procedures to format data
   c. Explore the relative value of accessing data from routine data sources

Expected Benefits:

There are both direct and indirect benefits to healthcare. This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research. The Standard and New Antiepileptic Drugs (SANAD) RCT is a multicentre, pragmatic RCT of worldwide significance, informing the first line use of antiepileptic drugs in clinical practice and prompting a review of national treatment guidelines. The subsequent study SANAD II (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119) is ongoing; it opened recruitment in 2013, will recruit 1510 participants over 5.5 years, and is expected to exert a significant influence on the evidence for the treatment of epilepsy, the most common neurological disease. The study to which this application refers will directly inform the methodology employed in the data collection and analyses of the SANAD II study. The assessment of the additional benefits of data from electronic medical records, particularly with regard to the analysis of health economic outcomes and methods to address missing data, will inform the subsequent methods employed in the analyses of SANAD II. For example, if implementing data from electronic medical records provides greater benefit when addressing missing data, this data will subsequently be requested for all participants of SANAD II. This directly benefits health and social care by informing the methodology to be employed in the data collection and analyses of an NIHR HTA funded RCT, therefore maximising the power of the data and results and the subsequent impact on patient care. This output will be measurable based on the methods subsequently employed in SANAD II. 
Through output via presentation and publication to the research community involved in clinical trials and clinical trials methodology, there will be indirect benefits to health and social care. The output of this study aims to inform the implementation of data from electronic medical records in prospective clinical research including RCTs. Improved knowledge of the attributes, additional benefits and efficiency of data accessed from electronic medical records will inform the design of future RCTs. Resultantly, RCTs will use electronic medical records for the objectives where a benefit is offered, over standard methods of data collection. This will result in improved efficiency (and therefore costs) of RCTs, frequently funded through public sources and improved participant experience. For example, the number of clinical trial follow up appointments may be reduced if data can be adequately collected using electronic medical records. Finally, indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy, although this hypothesis is not being explicitly assessed.

Outputs:

Final results from this study and the associated outputs are expected by study completion (12/2017). All presented or published results will be on a strictly anonymous basis. Non-identifiable aggregate data will be used in presentations and publications with the suppression of small numbers in line with the HES analysis guide when the output involves specific clinical details. The output will consist of descriptive statistics and statistical measures of agreement between data retrieved from electronic medical records, including HES data, and data collected through standard methods during SANAD II. The nature of the sample (60 included participants) results in a possibility that small numbers may be identified for data variables of interest. For example, if 5 participants experience hospital admissions or admission to critical care and data from electronic medical records provides significant benefits over standard RCT methods, this would be important to include in any output. As we are primarily concerned with the agreement and additional benefits of data from electronic medical records rather than specific clinical details, there will be no requirement to include explicit clinical details in any output. In order to present the differences between data from electronic medical records and data recorded through standard methods in SANAD II, there will be a need to highlight the availability of specific data variables. For example, outputs may present that ‘details of MRI scans were available in X number of patients’ rather than ‘X number of patients had an MRI scan demonstrating temporal sclerosis’. The exclusion of specific clinical details at record level, in addition to demographic variables and geographical location for the individuals involved, will ensure participant anonymity is maintained; it is the available data variables and agreement between datasets that will inform the outputs of this study. 
In all output, aggregate data will be de-identified and all measures will be taken to ensure that individuals cannot be identified. For example, during the analysis the additional benefits of assessing IMD by LSOA to inform the health economic analysis will be examined. However, there will be no need to present the LSOA of individual participants in any presentation; only the aggregate results will be presented. Rare events are not expected, but if these occur and there remains any risk of identification, details will be omitted from all presentations and publications and small numbers suppressed in line with the HES analysis guide. This study will have both direct and indirect outputs. In the first instance, the study will inform the analyses to be undertaken in the SANAD II RCT. Specific components will include the analysis of health economic outcomes and optimal methods to address missing RCT data. For example, if incorporating data from electronic medical records in place of traditional methods such as multiple imputation provides a more rigorous dataset, electronic medical records data will subsequently be sought for all participants of SANAD II and included in the analyses on completion of the trial. This output will take the form of a study report and local presentation to the SANAD II study team. This will occur on completion of the study by December 2017. Notably, all members of the team for this study are also involved in SANAD II. This study will also inform the clinical trials community, contributing to the development and improvement of efficient RCT design with the incorporation of data from electronic medical records. The output will be disseminated to clinicians and academics involved in the conduct of clinical trials and research concerning clinical trials methodology. 
Members of the public and non-academics may have access to the output through presentations and publications but there is no planned specific dissemination to these groups, with the exception of the participant study report that will be provided on completion of the study. This is justified as the output will be primarily informing the methodological aspects of clinical trials. Indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy. Although not directly assessing routine clinical practice or the patients’ perspective in this study, a parallel theme funded by the MRC HTMR involves assessing patients’ perspectives with regards to clinical trials methodology, including the development of ‘core outcome sets’ for clinical trials. There are multiple objectives to this project and the dissemination of findings aims to take the following forms:
- Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II:
  o A narrative assessment of the methods and feasibility of access will be presented at academic conferences including the International Clinical Trials Methodology Conference 2017 and Association of British Neurologists Annual Meeting 2017.
  o An assessment of the feasibility, agreement and reliability of data from routine sources will be presented at academic conferences and published in a peer-reviewed clinical journal. The manuscript will initially be submitted to the British Medical Journal during 2016.
- Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, applied to the aims of a RCT, SANAD II:
  o The additional benefit of data from routine sources applied to the assessment of clinical efficacy, adverse events, health economic outcomes and addressing missing RCT data will be presented at academic conferences as above and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals.
- Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II:
  o A narrative assessment of the efficiency of accessing and formatting data from routine sources and a discussion of the relative value of accessing data from routine sources will be presented at academic conferences and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals.
Finally with respect to all study objectives, formal study reports will be submitted to the MRC Hubs for Trials Methodology, the funder of this study, to inform the wider programme of research aiming to develop the use of health informatics, including electronic medical records in prospective medical research.
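The small-number suppression mentioned in the Outputs text could be applied to aggregate counts along the following lines. This is a minimal sketch under stated assumptions, not the study's procedure: the function name `suppress_small_counts`, the threshold of 5, the `*` marker and the choice to leave zero counts visible are all illustrative, and the HES analysis guide remains the authority on the actual rules.

```python
def suppress_small_counts(counts, threshold=5, marker="*"):
    """Mask small non-zero counts in a table of aggregate results.

    counts    -- dict mapping a category label to an integer count
    threshold -- assumed cut-off: counts from 1 to threshold are masked
    marker    -- assumed placeholder shown instead of a masked count
    """
    suppressed = {}
    for label, count in counts.items():
        if 0 < count <= threshold:
            # Small cell: hide the exact value to reduce re-identification risk.
            suppressed[label] = marker
        else:
            # Zero or larger counts are reported as-is in this sketch.
            suppressed[label] = count
    return suppressed


# Hypothetical aggregate table: number of participants for whom a
# given data variable was available in the routine data.
availability = {"MRI scan details": 42, "critical care admission": 3, "none": 0}
masked = suppress_small_counts(availability)
```

Masking a single cell does not protect it if row or column totals are published alongside, so a real output would also need secondary suppression or rounding where totals are shown.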

Processing:

The legal gateway for the flow of data into the HSCIC is informed patient consent. This study is sponsored by the University of Liverpool and has been approved by the North of Scotland Research Ethics Service and Health Research Authority. The specific methodological activities involved in the processing of data are as follows: The SANAD II Data Manager will identify eligible participants by review of data recorded for participants enrolled in SANAD II. Eligible participants will be those aged 16 years and over, with capacity to consent and having completed a minimum of 12 months follow up in SANAD II. Participants’ date of birth, date of enrolment and consent details (to identify those with capacity to consent) will be screened. The names and addresses of eligible individuals will subsequently be retrieved. An invitation pack will be sent via the postal services containing a participant information leaflet, consent form and pre-paid addressed envelope. Informed written consent will be requested for access to identifiable data from electronic medical records for the equivalent time period in SANAD II. Organisations including the HSCIC are specifically named in the consent form. Full, explicit details of the data flows and processing activities are detailed in the consent materials and form. HSCIC feedback has been sought in an earlier application. There are approximately 70 consented participants in this study. Data from consenting participants will be requested from electronic medical records held by specific ‘routine data sources’. The HSCIC HES data will be requested for participants resident in England. Data will also be requested from The Secure Anonymised Information Linkage Databank (SAIL) for participants resident in Wales and the General Practitioners for participants resident in the North West of England. 
The General Practitioners will be approached by the study team and, if permitted, primary care data will be transcribed in the practice by the Principal Investigator. Data from consenting participants’ electronic medical records will be requested from HSCIC on an identifiable, record level basis, with individuals identified by NHS Number. The rationale for this is to allow linking of data regarding an individual from electronic medical records from all routine data sources to the data collected using standard methodology as part of SANAD II, in order to compare the datasets and permit the analyses. Data will be collected for the equivalent time period the individual has been enrolled in SANAD II and will be requested on one occasion only. Data will include medical, demographic and socio-economic variables. The NHS Number (and name and date of birth if required and indicated by HSCIC) to identify the consenting participant will be securely transferred from the Clinical Trials Research Centre, University of Liverpool to the HSCIC. Subsequently, data from participants’ electronic medical records provided by the HSCIC will be securely transferred to the University of Liverpool. In both cases, data will be transferred using the HSCIC Secure File Transfer (SFT) System. The consent materials and form explicitly permit these data flows in this study. Participants’ data from electronic medical records (accessed through HSCIC, SAIL and participants’ GPs) will be securely transferred to the University of Liverpool Clinical Trials Research Centre and linked to the data collected as part of SANAD II in order to permit the intended analyses. The SANAD II Data Manager will perform this linking and will therefore receive and access the data from electronic medical records in the first instance. Following linking, the SANAD II Data Manager will pseudonymise the complete dataset with participants identified only by their Unique Study Number. 
At this stage the dataset will then be accessible to the study team members involved in the analysis. Therefore, all data from electronic medical records and SANAD II data collected using standard methods will be pseudonymised for all members of the team for this study. The SANAD II Data Manager, who must perform the linkage, will have access to consenting participants’ demographic variables and medical data but will not access, and has no requirement to access, participants’ medical data for the purpose of linking. Data regarding individuals received from all sources will be linked. Therefore, secondary care data received from the HSCIC HES datasets will be linked to data collected using standard methods in SANAD II. In addition, for a small subset of participants resident in the North West of England, data retrieved from General Practitioners will be linked to both HES data and data collected during SANAD II. This process is necessary to perform a full assessment of the agreement and additional benefits of routinely recorded data (from all data sources) compared to data collected using standard methods in SANAD II. All pseudonymised study data from electronic medical records and SANAD II will be stored using the University of Liverpool Research Data Management Service’s DataStore (http://www.liv.ac.uk/csd/research-data-management/storage) at all times. Data is stored electronically on University of Liverpool central servers, located in an access controlled server room and connected to the main University network, located behind a firewall. Physical access is limited to Computer Services Department staff. Data will be encrypted using industry standard techniques meeting the Information Governance Toolkit standard (8HN20). Data will not be transferred to an additional location. The PI for this study will act as data custodian. 
The University of Liverpool Information Security Policy and Research Data Management Policy provide further information regarding data security. The pseudonymised dataset will be accessed by specific members of the study team based in and employed by the University of Liverpool. Data will then be analysed to assess the following objectives:
1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II: A narrative assessment of feasibility will be followed with a quantitative assessment of agreement between data from electronic medical records and data collected using standard methods in SANAD II. Agreement will be compared at the individual level. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data. Subsequently, relevant outcomes of the RCT will be examined.
2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of a RCT, SANAD II: An exploratory analysis will assess the additional benefits of accessing data from electronic medical records. The assessment of clinical efficacy, adverse events, health economic outcomes and methods to address missing RCT data will be examined. Where linked data are available, agreement will be compared at the individual level in the first instance. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data.
3. Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II: The relative value of accessing data from different sources will be discussed in the context of the prior analyses, including knowledge of the relationship between datasets and the assessment of methods of addressing missing data. The optimal ‘mix’ of data from routine sources and standard methods will be discussed. The potential impact on the data collection processes if SANAD II were to be repeated will be considered and the quantitative methods that could be used in future research proposed.
All personal data in this study will be kept strictly confidential and will be handled, stored and destroyed in accordance with the Data Protection Act 1998.
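The agreement methods named in the Processing text (Bland-Altman methods for continuous data, kappa statistics for categorical data) can be illustrated with a short sketch. It is an illustration only, not the study's analysis code: the function names, the example variables and the use of 1.96 standard deviations for approximate 95% limits of agreement are assumptions, and the kappa shown is the unweighted Cohen's kappa for two sources.

```python
from collections import Counter
from statistics import mean, stdev


def bland_altman(source_a, source_b):
    """Return (bias, lower limit, upper limit) for paired continuous values.

    The limits of agreement are bias +/- 1.96 * SD of the paired differences.
    """
    if len(source_a) != len(source_b) or len(source_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(source_a, source_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def cohen_kappa(source_a, source_b):
    """Return unweighted Cohen's kappa for paired categorical values."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("need non-empty paired observations")
    n = len(source_a)
    # Observed agreement: proportion of pairs with identical categories.
    observed = sum(a == b for a, b in zip(source_a, source_b)) / n
    # Chance agreement from the marginal category frequencies of each source.
    freq_a, freq_b = Counter(source_a), Counter(source_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1:
        return 1.0  # both sources report one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical paired data for the same participants: values recorded
# through standard trial methods versus values found in routine records.
trial_visits = [10, 12, 14]
routine_visits = [9, 12, 13]
bias, lower, upper = bland_altman(trial_visits, routine_visits)

trial_admitted = ["yes", "yes", "no", "no"]
routine_admitted = ["yes", "no", "no", "no"]
kappa = cohen_kappa(trial_admitted, routine_admitted)
```

With a sample of around 60 to 70 participants, both statistics would carry wide uncertainty, so confidence intervals would normally be reported alongside the point estimates.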


Project 7 — DARS-NIC-311179-R5V5Y

Opt outs honoured: N

Sensitive: Non Sensitive

When: 2016/04 (or before) — 2016/08.

Repeats: One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Outpatients

Objectives:

Over the years, the Roy Castle Lung Cancer Research Programme (RCLCRP) has been at the forefront of ground breaking research in early detection of lung cancer. Lung cancer is the leading cause of cancer-related death in most developed countries, with mortality rates exceeding that of colon, breast and prostate cancer combined (Jemal et al, 2010; Siegel et al, 2011). Given that more than 94% of the patients diagnosed with lung cancer in the UK die of the disease within five years, the primary objective is to detect lung cancer at an earlier, potentially more curable stage (5-year survival rate of stage IA tumour is ~70%). Lung cancer is predominantly a disease of the elderly, with an average age at diagnosis of around 60-70 years, and often presents very late at an advanced stage (Alberg et al, 2007; Dela Cruz et al, 2011). Although the pathogenesis of lung cancer is not yet fully understood, researchers have suggested the potential role of the occurrence of concomitant diseases in the aetiology of lung cancer. Due to increasing longevity and rapidly ageing populations, the number of people with more than one comorbid condition is expected to increase sharply in the coming decades (van den Akker et al, 1998; Yancik et al, 2001). This increase might lead to an increase in the incidence of lung cancer and the comorbidity burden might lead to increased overall and/or lung cancer-specific mortality. To this end, a documentation of previous history of diseases is essential for exploring the impact of comorbidity on lung cancer. A rich source of data for exploring the potential role of comorbidity in lung cancer pathogenesis is the Hospital Episode Statistics (HES). The Liverpool Lung Project (LLP) intend to link details of all admissions, outpatient appointments and accident and emergency (A&E) attendances of all participants in the LLP at NHS hospitals in England to the epidemiology data gathered through a detailed questionnaire for all LLP patients. 
In addition, all information gathered will be linked to the ONS data to study the mortality patterns of all participants in the LLP. The in-house database system will be used to collate all data and the output of the analysis will be documented in scientific literature.

Expected Benefits:

Past Benefits 1. Report for RCLCF This report is a requirement of the funding body to ascertain that adequate outputs are produced from their financial contribution and be accountable to the general public and its trustees as to exactly how and for what purpose voluntary public funding is being utilised in the name of Lung Cancer Research. It is imperative that the RCLCF is satisfied that the RCLCRP is constantly finding and utilising all information available to it to further develop and prove the accuracy of the LLP Risk Model when used as an early detection tool for Lung Cancer. Risk Prediction models incorporating multiple risk factors have been recognised as a method of identifying individuals at high risk of developing lung cancer. Thus accurate selection of high-risk individuals for lung cancer screening requires robust methods for prediction. The LLP has produced a risk model that has been utilised for identifying high risk individuals for screening in the first UK lung screening programme. As early diagnosis can save lives, the LLP have developed a new generation of risk model, the LLPi that may assist in identifying individuals at high risk of developing lung cancer using hospital episodes as surrogates of disease history. Unlike most risk models that are based on (biased) questionnaire data, the LLPi took advantage of the available hospital episode statistics data to corroborate questionnaire data for disease status. This resulted in the LLPi with a good calibration and c-statistic of 0.85 – one of the highest in lung cancer risk modelling. The Cancer Registry, HES and ONS data being made available to the RCLCRP is a fundamental aspect in aiding the development and testing of the LLP and LLPi Risk Model and developing new biomarkers for disease detection and management. 
Such data also allows further co-morbidity links and contributory factors to be investigated, analysed and reported on and thus enables patient recruitment strategy development for future research and further funding to be sought. The findings of the RCLCRP enable its funder, the RCLCF, to develop informed policy, target fundraising and influence UK Public Health by generating information and publicity campaigns to raise awareness of lung cancer. Individuals and public health professionals can make use of this information to take decisions and advise on lifestyle choices relating to an individual’s risk of developing lung cancer, particularly if they have known pre-existing conditions or persist with their current lifestyle.
2. Publications
Publication is fundamental to the provision of evidence-based medicine and the delivery of an effective healthcare system. For example, these publications benefit the wider community of clinicians both when investigating the possible presence of Lung Cancer within patients and potential risk of Lung Cancer developing where certain health risk pre-cursors are present in their history, whether these be actual co-morbidity diseases already present or socio-demographic health patterns. They also demonstrate how the Risk Model could be used as a tool within public health organisations for preventative/deterrent purposes when patients are advised by their clinicians to make improvements to their health or change their habits to potentially improve life expectancy. The publications can also contribute to scientific knowledge on development and validation of biomarkers used to detect or differentiate lung cancer, for example: methods for detecting lung cancer; characterisation of molecular changes in cancer cells; the nature of DNA mutation and methylation as a hallmark of different lung cancer sub-types. 
These results are put into clinically relevant context by the data we obtain relating to other diseases (HES), the incidence of cancer amongst previously healthy recruits (cancer registry) and outcome (ONS). Our publications also highlight the need to develop drugs to improve life expectancy timelines when Lung Cancer is detected. Delineation of different molecular classes of lung cancer is contributing significantly to changes in medical practice, leading to new targeted therapies.
3. Report for EU FP7 Funded Projects (LCAOS & CURELUNG)
The detailed reports of the findings and impact of research to which the RCLCRP contributed are seen by EU Commissioners and Scientific Committees to inform on the effectiveness and impact of the work, appropriate utilisation of funding and further progress required to develop for implementation of the findings. These help to guide future policy decisions on research goals and investment (including the structure of the current funding scheme, Horizon 2020).
Future Benefits
The establishment of the LLP case/control cohort has provided an important resource that is internationally recognised and will continue to provide benefits in the future. The ongoing update of associated medical data will further enhance the utility of this research resource, for example:
1. The molecular biomarker group within the RCLCRP aim to utilise bronchial washings and/or sputum and/or blood to develop molecular assays for early diagnosis of lung cancer. The integration of HES, Cancer Registry and ONS data with the molecular data will allow us to improve the LLP Risk Model tool for application of personalised risk assessment alongside the development of future molecular assays (either targeting those at highest risk, or attenuating the results to account for known confounding factors).
2. Characterisation of risk factors for lung cancer is of considerable health and economic importance, as they can be used to inform prevention, screening and treatment policy. The group will continue to develop the Liverpool Lung Project (LLP) risk model for lung cancer and is identifying epigenetic and genetic biomarkers for early detection and prognosis of lung cancer.
3. The United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model will be used to help refine and improve risk assessment tools, providing more efficient targeting of screening populations and other interventions. The RCLCRP and associated HSCIC data provide an important corollary to the CT screening setting and an opportunity to publish comparative studies to inform the direction of lung cancer early detection.
4. Application of HSCIC data to clinical problems provides an opportunity for training and education of the next generation of scientists and medics. Anonymised HES data will be utilised for the academic training of future PhD students and other affiliated research scientists. This will promote innovation, exploit new technologies and produce world-class scientists that will contribute to the continued development of life science research, which provides an important economic driver and improves healthcare.
5. Specific benefits to the Health and Social Care system include: the use of molecular-epidemiological risk assessments prior to clinical diagnosis and markers of pre-clinical carcinogenesis in patients with a high risk of developing lung cancer will reduce the incidence of clinically detectable lung cancer, given the appropriate intervention strategies. Early detection research provides the most cost-effective strategy for improved mortality, as treatment at an earlier stage not only provides better patient outcome, but is cheaper in the long term.

Outputs:

Past Outputs
1. Annual Report(s) for funding body, e.g. The Roy Castle Lung Cancer Foundation (RCLCF) to identify type of research undertaken, recruitment statistics and specific research developments within the funding period. This report is seen by the RCLCF Executive and Scientific Committee and its Trustees to inform policy and quantify the benefit of future funding of the research programme.
2. The RCLCRP and its collaborators have produced many peer-reviewed publications in a selection of high-ranking journals. As an example, publications during 2013 & 2014 that made use of ONS, MCCR or HES data included:
  • Contribution to a study that examined over 500 lung tumours for DNA methylation and demonstrated a prognostic DNA methylation signature for stage I Non-Small Cell Lung Cancer (NSCLC) (Sandoval et al., Journal of Clinical Oncology 2013 [47])
  • Discovery and validation of a microRNA expression signature that identifies NSCLC (Bediaga et al., British Journal of Cancer 2013 [48]).
  • Examination of the molecular genetic profile of carcinoid cancers, implicating chromatin-remodelling genes (Fernandez-Cuesta et al., Nature Communications 2013 [49]).
  • Aiding the definition of a genomics-based classification of human lung tumours (Clinical Lung Cancer Genome Project & Network Genomic Medicine, Science Translational Medicine, 2013 [50]).
  • LLP Biobank samples helped identify a new tumour suppressor gene for lung cancer (Gkirtzimanaki et al., Proceedings of the National Academy of Sciences USA 2013 [51]).
  • The importance of risk prediction models to lung cancer screening has been highlighted (Field et al., 2013 in Lancet, Lancet Oncology and Journal of Surgical Oncology [48, 52, 53]).
• We have investigated factors associated with dropout in a 5-year follow-up of individuals at high risk of lung cancer in the LLP follow-up cohort (Marcus et al., International Journal of Oncology, 2013 [54]) and looked at the impact of co-morbidity on lung cancer mortality (Marcus et al., Oncology Letters, 2013 [55]). • Genome Wide Association Studies (GWAS) and epidemiology continue to provide a useful insight into lung cancer susceptibility (TRICL, ILCCO & SYNERGY publications): - Lung cancer risk among different professions (Behrens, Occupational and Environmental Medicine, 2013 [56]; Consonni et al., International Journal of Cancer 2014 [57]). - New methods for smoking assessment in lung cancer risk (Vlaanderen et al., American Journal of Epidemiology 2014 [58]). - A pooled analysis of case-control studies conducted between 1985 and 2010 (Olsson et al., Am J Epidemiol 2013 [59]). - SYNERGY – Welding and Lung Cancer in a Pooled Analysis of Case-Control Studies (Kendzia et al., Am J Epidemiol 2013 [60]) - Analysis of the relationship between second hand tobacco smoke and lung cancer histology (Kim et al., International Journal of Cancer 2014 [61]). - Associations of risk variants for other cancers with lung cancer risk (Park et al., Journal of the National Cancer Institute 2014 [62]). • Two publications have utilised anonymous HES data: - Marcus MW, Chen Y, Duffy SW, Field JK. Impact of comorbidity on lung cancer mortality - a report from the Liverpool Lung Project. Oncol Lett. 2015 Apr;9(4):1902-1906. - Marcus MW, Chen Y, Raji OY, Duffy SW, Field JK. LLPi: Liverpool Lung Project Risk Prediction Model for Lung Cancer Incidence. Cancer Prev Res (Phila) 2015 Jun;8:570-5. N.B. References of Publications listed above can be found in SD11 – Publication References. Much of this work has also been presented at major cancer conferences (e.g. NCRI Annual UK Meeting, American Association of Cancer Research Annual Meeting and World Lung Cancer Conference). 3. 
Report on the LCAOS & CURELUNG Projects (EU FP7 Collaborations): LCAOS - development of a Breath Test for the early detection of Lung Cancer; CURELUNG – the (epi)genetics of lung cancer. The selection process used to identify the cohorts for these studies included a knowledge of their health status, Access to HSCIC data for these individuals provided important information of their cancer and respiratory disease history which was utilised when at the sample analysis stage. For example in LCAOS, HES information for a particular patient whose lung capacity levels were low at the time of the sample being taken and enduring breathing difficulties may be shown some 12 months later to have had hospital episode which diagnosed a lung disease which may have been present at the time of the sample being taken. For CURELUNG, respiratory disease status informed risk-stratification analysis; outcome data was used to investigate the possibility of treatment stratification based on DNA methylation. 4. The RCLCRP has established, through the Liverpool Lung Project, one of the largest prospective lung cancer case-control and cohort population in Europe (>11,500 participants) with epidemiological, clinical & outcome data and specimens incorporated into the LLP Biobank. This is a resource that has been and will continue to be utilised for a wide variety of research projects, generating additional investment and providing opportunities for exploitation of results in the form of risk prediction models, biomarkers for cancer detection, characterisation of lung disease and identification of targets for treatment. 5. The RCLCRP was instrumental in initiation of the United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model. Professor Field is the clinical investigator of the UKLS and the trial was run from the University of Liverpool Cancer Trial Unit. Future Outputs 1. Reports: Further reports for grant awarding bodies will be produced. 
This will include reports in support of additional funding applications for further analysis, ensuring maximum utility and benefit from the data provided. 2. Publications: It is anticipated that the analysis from this study will be included in internationally renowned oncology, epidemiology and public health journals (in keeping with our proven publication record, above). Publications will be prepared for 2015, 2016, and 2017. 3. Presentations: In accordance with previous years it is expected that presentations will be given at major cancer conferences. These presentations will provide dissemination of results from ongoing studies of LLP Risk Modelling, Methylation, MicroRNA, Sequencing, etc. Nature of Outputs The LLP project provides detailed clinical outcomes together with the patient’s epidemiological questionnaires, complemented by the excellent HES data; in depth molecular-epidemiological LLP investigations into molecular biomarker groups and DNA sequencing projects. The majority of outputs will contain aggregate data only; very occasionally individual level data will be presented (e.g. patient characteristics for tumour samples analysed), but these will be coded and completely anonymised to prevent identification. No HSCIC linked record level data will be shared directly with commercial companies or third party organisations or included in directly in any outputs. In some instances the data will consist of anonymised, characteristic data linked to a sample shared for research purposes; e.g. it may state that “the sample was from a patient of 60 years old with a diagnosis of COPD present for 10 years who was diagnosed with lung cancer at 65 years and died of heart failure aged 70 and the patient had been hospitalised for COPD on 6 occasions”. All outputs are research outputs, not commercial, although some research is undertaken within a commercial environment (e.g. pharmaceutical or life-science/biomarker companies).

Processing:

All data processing of the original HSCIC dataset will take place at The University of Liverpool and be carried out by the RCLCRP IT staff at The (UoL) APEX Building (3rd Floor). SQL queries will be written to extract selected data from the HES database. IT staff will link the extracted data to subject data held by the Roy Castle Research Programme; any patient identifiable data fields supplied by HSCIC will not be made available to researchers. SQL is used to anonymise the data by linking them to a unique patient identifier (MPI). Anonymised data are then imported into statistical software. The clinical database used within the RCLCRP has data for 14,000+ subjects; all data is held securely (with additional password protection) and accessed only by trained personnel, in compliance with the University of Liverpool Data Policies. These records hold an NHS number and a unique identifier. These identifiers will be used to identify subjects in the HES dataset, but only the local code will be used to identify subjects in any extracted data. Additionally, subsets of the data will be exported, anonymously, and used with statistical software at the University of Liverpool. Data used in the subsets relates to the health status (comorbidities), previous disease history or outcome (death, subsequent disease) of subjects who have provided informed consent and donated samples and/or lifestyle/clinical history to the LLP (RCLCRP). Data on patient identifiers or dates relating to any episode/event are not shared. The most frequent user of the data is the statistician employed on the LLP (RCLCRP) studies at the University of Liverpool, although other university researchers also have access to the anonymous data associated with participants in their studies. However, these researchers only have access to anonymous data extracted previously by the LLP (RCLCRP) personnel as part of approved research studies associated with the LLP (RCLCRP).
The purpose of all uses of the data is the same (the study of lung disease), as set out in the ethically approved study documentation. The HES and ONS datasets will not be shared with a 3rd party; extracted anonymous data will only be released to research collaborators following informed consent and ethics approval, and release will be covered by a Material Transfer Agreement (MTA), in accordance with local and national guidelines. The data is not accessed directly by the external researchers. Provided that subjects have consented to use by external collaborators, specific anonymous data (extracted by the LLP (RCLCRP) IT and statistical staff) may be released to external researchers (typically as part of a larger dataset) following approval of an MTA by the study Sponsor (The University of Liverpool) and approval of the specific collaborative study by the local NRES ethics committee. All researchers using anonymous data belong to recognised research institutions or registered commercial companies covered by an MTA. The recognised research institutions and registered commercial companies (strictly those for which the University of Liverpool RCLCRP has MTAs in place) are listed within SD10 – LLP Collaborators. The purpose for which data will be shared is individual to each MTA and the organisation with which it is in place, and is always for research purposes.
The individual level data that may occasionally be provided to one of these organisations might accompany, for example, a sample of blood or tissue, and state that the sample came from a patient of a particular age who had perhaps had a number of episodes of hospitalisation for, e.g., COPD or another condition. The data may divulge the age in years, the number of hospitalisations for investigations of, e.g., lung disease, or perhaps that the sample subject has a diagnosis of lung or another cancer and the number of years for which the cancer has been present. Death related data would be limited to age at death or survival period from a specific treatment or diagnosis. In short, an anonymised timeline of medical history may be the kind of data shared in association with the human material, but this would be devoid of dates or potential patient identifiers. The high incidence and mortality of lung cancer help ensure that it is very unlikely that anyone would be able to identify an individual from the nature of the data presented, but care is always taken to ensure that this is the case, especially in publications (where data aggregation is the norm). Geographical data (e.g. postcode) are always aggregated, and provider data is not a focus of the research. Data released might include disease or comorbidity status derived from HES, or outcome/death status derived from ONS, along with data about the subject or samples collected by other legal means (with the consent of the subject) such as case note review. However, this is never provided with any personal identifiers or dates attached, so no link to the initial HES/ONS data or to any individual can be made by the researcher using the data supplied. The data format is an encrypted, password protected data file in a recognised database or statistical software file format. Data provided to external collaborators is totally anonymous, and confidentiality is governed by a number of clauses in the MTA. Under no circumstances would any third party organisation or employee (within a UoL MTA) be able to link any identifiable patient data to material or data shared by UoL RCLCRP. Data is not always aggregated, but is sufficiently coded to prevent identification of individuals (data is stripped of personal identifiers before use and in any representation).
This de-identification meets the requirements outlined within the HES Analysis Guide March 2015. Data is often, but not always, aggregated; however, even on the occasions when data is not aggregated, it is still compliant with the March 2015 HES Analysis Guide, in particular Sections 4, 5 and 6. In a similar way to the establishment of a PSEUDO_HESID (as stated within the HES Analysis Guide), the UoL RCLCRP MPI No. is used within the RCLCRP study when the HSCIC data is received by the HSCIC-authorised IT employee and utilised by the statistician. Similarly, when samples or data are shared with any other organisation, this UoL RCLCRP MPI No. provides a link that can only be used by RCLCRP staff to integrate data. Therefore, no patient can be linked to any of the data received other than within the UoL RCLCRP by approved staff operating within the UoL data governance framework. Only those UoL employees listed with HSCIC are able to access the data. At no point is any of the HES, Cancer Registry or ONS data used by UoL RCLCRP employees to demonstrate linked patterns of Hospital Admissions to Cancer rates or death statistics.
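
The linkage and de-identification steps described above can be illustrated with a minimal sketch. It is not the RCLCRP's actual code: the table names (`subjects`, `hes_episodes`), the column names, the example MPI values and the use of SQLite from Python are all assumptions made for illustration, and the real HES schema is not given in this entry. The query matches episodes to consented subjects on NHS number, but returns only the local MPI code with derived, date-free values (a count of admissions and an approximate age in years).

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with hypothetical, made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subjects (mpi TEXT, nhs_number TEXT, birth_year INTEGER);
        CREATE TABLE hes_episodes (
            episode_id INTEGER, nhs_number TEXT,
            admission_date TEXT, diagnosis TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO subjects VALUES (?, ?, ?)",
        [("LLP0001", "9000000001", 1950), ("LLP0002", "9000000002", 1945)],
    )
    conn.executemany(
        "INSERT INTO hes_episodes VALUES (?, ?, ?, ?)",
        [
            (1, "9000000001", "2010-03-01", "J440"),
            (2, "9000000001", "2012-07-15", "J449"),
            (3, "9000000001", "2013-01-02", "C349"),
            (4, "9000000003", "2011-05-05", "J440"),  # not a study subject
        ],
    )
    return conn


def build_anonymised_extract(conn):
    """Return one row per subject: (mpi, COPD admissions, approx. age at first).

    NHS number is used only inside the join. The output carries the local
    MPI code, a count, and an age in whole years (year of admission minus
    year of birth), so no NHS number or event date leaves the query.
    """
    query = """
        SELECT s.mpi,
               COUNT(h.episode_id) AS copd_admissions,
               MIN(CAST(substr(h.admission_date, 1, 4) AS INTEGER))
                   - s.birth_year AS age_at_first_admission
        FROM subjects AS s
        LEFT JOIN hes_episodes AS h
               ON h.nhs_number = s.nhs_number
              AND h.diagnosis LIKE 'J44%'   -- ICD-10 J44: COPD (illustrative)
        GROUP BY s.mpi, s.birth_year
        ORDER BY s.mpi
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_anonymised_extract(make_demo_db()):
        print(row)
```

In this sketch, episodes belonging to people who are not study subjects never appear in the extract, and a subject with no matching episodes is still listed with a count of zero. Only staff holding the `subjects` table could map an MPI code back to an NHS number, which mirrors the role the entry gives to the UoL RCLCRP MPI No.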