NHS Digital Data Release Register - reformatted

Cambridgeshire and Peterborough NHS Foundation Trust

Project 1 — DARS-NIC-170100-T1Q8C

Opt outs honoured: Yes - patient objections upheld (Section 251 NHS Act 2006)

Sensitive: Non Sensitive, and Sensitive

When: 2020/09 — 2020/09.

Repeats: One-Off

Legal basis: Health and Social Care Act 2012 – s261(7)

Categories: Identifiable, Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Outpatients
  • Hospital Episode Statistics Accident and Emergency
  • Civil Registration - Deaths
  • HES:Civil Registration (Deaths) bridge

Objectives:

Cambridgeshire and Peterborough NHS Foundation Trust (CPFT) requires access to Hospital Episode Statistics (HES) and mortality datasets for use in the Lewy-CRATE project. Using the CPFT Research Database (CRATE), the Lewy-CRATE project will identify a cohort of ~1,000 dementia with Lewy bodies (DLB) cases and ~21,000 non-DLB disease dementia controls to allow a detailed examination of their characteristics and outcomes. Following on from early work on predictors of mortality, the research team will examine potentially modifiable factors associated with shorter survival in DLB compared with other dementia subtypes. Through linking the information from the local database within CPFT with the national HES and mortality datasets, the research team will examine the patterns of early predictors, presentations and symptoms associated with DLB, providing information which may facilitate early diagnosis. In conjunction with colleagues at King’s College London (KCL) and South London and Maudsley NHS Foundation Trust (SLaM), this will inform the development and testing of a natural language processing (NLP) app which will then be investigated by the research team to determine whether it aids diagnostic decision making for clinicians in real time. The research team within SLaM have their own separate approvals and applications for completing their part of the project. This agreement relates only to the research team within CPFT. Natural language processing is the application of computational techniques to the analysis of natural language, obtaining and synthesising information from unstructured text. The Lewy-CRATE project is justified for processing data under Article 6(1)(e) (necessary for the performance of a task carried out in the public interest) and Article 9(2)(j) (necessary for reasons of public interest in the area of public health) under the GDPR.
Given the aims of the project and the current high rates of misdiagnosis and incorrect treatment among patients with DLB, it is in the public interest to examine in depth the initial presentation, diagnosis, clinical course and outcomes of individuals with DLB. The study of necessity involves patients with cognitive impairment (who might not therefore be able to consent) or (in many cases, given the length of the study) who are deceased; consent is therefore not possible for all patients. Restricting the study to consenting patients would very severely bias it and prevent its fundamental aims from being met; it would also mean that the aim of investigating causes of the high mortality in DLB could not be achieved. The project has been approved by the REC (Ref: 18/EE/0029) and CAG (Section 251, Ref: 18/CAG/0015) committees for all aims, methods and procedures. The Lewy-CRATE project has two main aims: 1. Using the CRATE research database, the research team will use novel methods of identifying and anonymising case records in order to bring together and investigate the records of up to ~1,000 individuals with DLB and ~21,000 non-DLB disease dementia controls. This will create a naturalistic cohort of patients with a diagnosis of DLB, and of non-DLB disease dementia controls, identified within a secondary care sample, allowing detailed study of their characteristics and outcomes. By linking the information from the local database within CPFT with the national HES datasets (e.g. Outpatient, Admitted Patient Care, Accident and Emergency, mortality data), the research team will find out more about the full course of the disease from the early stages onwards, identify risk factors for disease development, and look at what might be changed to prevent the worse survival outcomes identified in earlier work.
It is expected that those with DLB, compared to non-DLB disease dementia controls, will have a different profile of presentation to both psychiatric and medical secondary care NHS services before dementia is diagnosed. It is also expected that outcomes following diagnosis, including mortality rates, will differ between those with DLB and non-DLB disease dementia, and that the presence of certain symptoms (e.g. fluctuation) will be associated with increased mortality. Furthermore, the research team will be able to identify modifiable factors associated with poor outcomes in DLB. In particular, it is expected that pharmacotherapy affects outcome, with use of cholinesterase inhibitors and memantine associated with decreased mortality and use of antipsychotics associated with increased mortality. 2. The research team in CPFT will work with colleagues from KCL and SLaM to develop and test a natural language processing App to improve early and accurate diagnosis for those with DLB. DLB and non-DLB disease dementia controls identified within both CPFT and SLaM will be examined to determine the presenting patterns and clinical features of DLB. This will inform the development of the natural language processing App, which will process information from routine clinical records to identify potential DLB cases. Data from three HES datasets (Admitted Patient Care, Accident and Emergency, and Outpatients) is requested, along with mortality data, going back to 1997/98 where available. To this day, very little is understood about the presentation, natural history and clinical course of DLB. Previous research has shown that, as well as greater mortality and poorer outcomes, DLB also has a distinct neuropsychological profile compared to Alzheimer’s disease (AD), despite the overlap in symptom profiles. It has been acknowledged that individuals with DLB can have more complex clinical presentations due to symptoms such as visual hallucinations and sleep disorders.
These complex clinical presentations mean that individuals with DLB can initially present to an array of locations within the healthcare system, other than memory clinics. Additionally, the diagnosis of any type of dementia becomes more challenging when moving to the early stages of the disease, at which patients are increasingly presenting. By examining the clinical presentations and secondary care resource use (including numbers and length of admissions, numbers and type of outpatient attendances, previously recorded diagnoses, numbers and types of investigations) of both a DLB and a non-DLB disease dementia control group, before their diagnosis as well as after, the research team will be better able to inform future diagnosis and care for those with DLB. With access to such data, the research team will identify where patients with DLB initially present, the causes of such presentations, and whether they are being captured and diagnosed correctly across different healthcare services. In terms of data minimisation, it is important to note that the research team are not requesting any future refreshes of the databases. Given the aims of the project detailed above, the research team are only requesting historical information. Access to HES-mortality linked data from England for the cohort is also requested. From analyses completed in the pilot study (which informed the development of the current project), it was found that individuals with DLB have significantly poorer survival compared with individuals with non-DLB dementia. In order to confirm and build upon this finding, the research team are requesting access to HES-mortality linked data, as it provides greater information on the deaths of individuals who have attended or have been admitted or treated in hospitals, irrespective of whether they died in hospital or not.
Such information, along with summaries about the cause of death, will allow the team to build upon the survival analysis completed within the pilot study. DLB is the second commonest cause of dementia after Alzheimer’s disease (AD), but accurate recognition and early diagnosis remain sub-optimal. Current estimates indicate that less than 4% of people with dementia are diagnosed with DLB, less than half the number predicted on the basis of pathological and epidemiological studies. For example, a pathological series indicates Lewy body pathology in up to 20% of dementia cases, whilst epidemiological studies have shown prevalence rates of 10%. There are several factors that likely contribute to the under-diagnosis of DLB, including a lack of awareness by clinicians of some of the diagnostic features, a failure to ask about these during patient assessment and a lack of appreciation of the many and varied ways in which DLB can present. Whilst symptoms like visual hallucinations are usually elicited, other core features such as fluctuation and parkinsonism may not be. However, observations may be made in case records that subjects are frequently sleepy or drowsy, or slowed in movements, or have falls or speech which is difficult to follow, which can give important clues as to a possible DLB syndrome. In addition, there may be other symptoms or clinical factors contained within hospital records which should alert clinicians to a DLB diagnosis, but which are currently not known. Improved and early recognition and diagnosis of DLB is an important outcome since accurate recognition is essential to optimise management. DLB specific symptoms including sleep disturbance, parkinsonism, fluctuation, autonomic symptoms and falls, are frequently managed sub-optimally unless DLB is accurately recognised and diagnosed. 
A pilot study previously conducted by the research team using the CRATE research database within CPFT identified all cases of DLB seen within the Trust over eight years (2005-2012, n = 251) and a similar number of matched Alzheimer’s disease cases (AD, n = 222), using text searching and expert clinical review to confirm diagnosis. Survival in DLB was almost half that of the comparator AD cohort despite no differences in age, sex ratio, physical comorbidity or presenting Mini-Mental State Examination (MMSE) score. This pilot study both demonstrates the power of CRATE to identify and study unselected secondary care dementia cases, and points to the need for a larger study to examine further the factors associated with poor outcome and increased mortality among individuals with DLB. The identification of modifiable or potentially modifiable factors associated with poor outcome would enable new therapeutic strategies to be developed and trialled to optimise the management of those with DLB. The overall project is funded by an Alzheimer’s Society Biomedical grant to the current research team and is being conducted in collaboration with a research team within KCL and SLaM. As stated above, this current agreement only refers to the work of the research team within CPFT. KCL/SLaM have their own separate applications and approvals for their aspects of the project. Data from the two different sites will not be linked or processed together. Data disseminated under this agreement will not be shared with SLaM or KCL. Nor will KCL/SLaM share any data disseminated under their agreement/s with CPFT. Decisions around the purpose and processing of data within CPFT will be made by the CPFT research team, and similarly within SLaM the research team within SLaM will decide how their data is processed and used. The results from both sites will inform the development of a natural language processing app for diagnosing DLB.
However, this will only be after data has been aggregated for publication into groups of more than 10. The data subjects for the project comprise two cohorts: one DLB cohort (~1,000) and one non-DLB disease dementia control cohort (~21,000). The data subjects have been identified through the CPFT CRATE database by examining all CPFT Trust subjects over the age of 50 years, through two main methods: 1. By examining the appropriate ICD-10 codes for individuals within the CRATE database, utilising structured diagnostic fields and a General Architecture for Text Engineering (GATE) diagnostic natural language processing app developed within the CRATE database. GATE is a general-purpose natural language processing framework, which supports the rapid development, testing and implementation of applications and allows them to be run over very large amounts of text in a short time (Cunningham et al., 2013). 2. Furthermore, cases were identified by text mining (through SQL searches) within the free text records contained within the CRATE database with diagnostic terms, such as: Dementia with Lewy Bodies; Lewy body dementia; Cognitive Impairment (with and without Mild in front; with Vascular); Dementia (on its own and with Alzheimer’s, Vascular, Frontal, Fronto temporal, Pick’s, Semantic, Mixed); Parkinson’s; and relevant acronyms. Both methods identified a cohort of potential cases for inclusion. These cases were reviewed by clinical experts, who work with patients with DLB and other forms of dementia, to ensure that only subjects with clear and confirmed dementia diagnoses were included. Steps have been taken in order to reduce the amount of data being requested. The cohorts have been filtered according to their location, diagnoses and age. The overall aim of the project is to improve diagnosis and management of individuals with DLB. Only data within HES and ONS datasets that is relevant to the project cohorts is requested.
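The free-text search step described above (SQL searches for diagnostic terms) might look roughly like the following. This is only an illustrative sketch, not the CRATE implementation: the `notes` table, its `research_id` and `note_text` columns, the in-memory SQLite database and the sample notes are all hypothetical, and only two of the listed terms are used.

```python
import sqlite3

# Two of the diagnostic terms listed in the text; the full list is longer.
DIAGNOSTIC_TERMS = ["dementia with lewy bodies", "lewy body dementia"]


def find_candidate_ids(conn, terms):
    """Return research IDs whose free-text notes mention any of the terms."""
    if not terms:
        return []
    # One parameterised LIKE clause per term, combined with OR.
    where = " OR ".join("LOWER(note_text) LIKE ?" for _ in terms)
    sql = (
        "SELECT DISTINCT research_id FROM notes "
        f"WHERE {where} ORDER BY research_id"
    )
    params = [f"%{term.lower()}%" for term in terms]
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical, made-up notes purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (research_id TEXT, note_text TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        ("r1", "Impression: probable Dementia with Lewy Bodies."),
        ("r2", "Review of hypertension; no cognitive concerns."),
        ("r3", "Family history of Lewy body dementia discussed."),
    ],
)
candidates = find_candidate_ids(conn, DIAGNOSTIC_TERMS)
print(candidates)  # ['r1', 'r3']
```

Note that the hypothetical record `r3` matches even though the term refers to a relative. A plain text search can only produce potential cases, which is consistent with the clinical expert review described above.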
As described above, there are two main cohorts: 1) those who the team have identified as having DLB, and 2) the non-DLB disease dementia control group. The data request has been minimised to data only relevant to these two specific cohorts for a specific diagnostic group. In terms of filtering the cohorts for data minimisation, these are individuals identified from CPFT (therefore, narrowed by geography) with a diagnosis of DLB or a non-DLB related dementia (therefore, narrowed by clinical factor) and only those over the age of 50 (therefore, further narrowed by age). In terms of episodes and fields, the research team has worked through the databases and is requesting specific items believed to be the most relevant to the project research aims, therefore further assisting with data minimisation. The items requested fall into the following broad categories: 1) Demographics of the patient; 2) Admission / Length of time / Discharge; 3) Diagnosis / Cause / Procedures; 4) Provider / Referral; 5) Psychiatric (APC only); and 6) Costs. Specific items within these categories which will allow the team to examine the first presentation, clinical course and diagnosis of the cohort have been included. No maternity information has been requested as it is not relevant. The large (21,000) control group is required because it is an amalgamation of the dementia sub-types that have been included in the non-DLB cohort for comparison with the DLB group. These non-DLB dementias include Alzheimer’s disease, vascular dementia, mixed dementia, fronto-temporal dementia and unspecified dementia. Only by including a total of 21,000 non-DLB subjects can it be ensured that there are sufficient numbers within each category to act as controls with sufficient power (based on a 1:5 ratio).
The whole 21,000 non-DLB cohort will not be included in future analyses, which will identify clear comparison groups among the non-DLB cohort for comparison with the DLB cohort, and many of these individual cohorts will be smaller than the 5,000 ceiling for maximum power. A limitation of the pilot study and most previous research within this area is that it only compares DLB cases to Alzheimer’s disease, and does not consider other dementia subtypes. This is unfortunate, as many of the reasons for misdiagnosis within DLB are due to the overlap in DLB symptoms with other dementia sub-types. By using a large non-DLB cohort across a range of dementia subtypes, the team will be better able to identify the early markers of DLB. A second key aim of the project is to examine the clinical course of DLB cases throughout the healthcare system compared to other dementia sub-types (including those with overlapping symptoms): DLB, unlike other dementia subtypes, does not have a clear concept of what the standard clinical course should be. For this purpose, significant control numbers across the different dementia subtypes are required. Records are needed that go as far back in time as possible, as the research team need to examine the earlier history of those who are diagnosed with DLB. This is because the prodrome in DLB, as in AD, may be very long and is likely to include (as with Parkinson’s disease) problems with autonomic features such as urinary disturbance and constipation, sleep problems and sleep disturbance, falls and unexplained losses of consciousness. It has also been suggested that early presentations to hospital, particularly for “unexplained delirium”, may be more common in those who develop DLB. It is therefore very likely that people who develop DLB will have differences both in their early symptom profile and pre-diagnostic medical history compared to those who develop other dementias.
It is necessary to use patient identifiable data (NHS numbers, dates of birth, gender, postcode) in order to link the identified cases from CRATE to their records within NHS Digital (HES databases and linked mortality data). This is necessary to meet one of the main objectives of the research project. However, the research team will not have access to any patient identifiable data. The patient identifiable data will not be used for any other part of the study, as all comparisons and analyses will be completed on pseudonymised data. Linkage using patient identifiable data will be completed by the CPFT Research Database Manager (not a member of the research team and substantively employed by CPFT) through Secure Electronic File Transfer (SEFT). CPFT is the sole data controller, and also processes the data for the project, as all data (local CPFT data and linked HES and mortality data) will be processed, analysed and stored within the CPFT NHS Trust database (CRATE) and all decisions about data storage and access to data are made by CPFT staff with oversight by the CPFT Research Database Oversight Committee. The CPFT Research Database (Clinical Records Anonymisation and Text Extraction: CRATE) is a CPFT database, created, overseen and administered by the Cambridgeshire and Peterborough NHS Foundation Trust. It is not run or ‘owned’ by any other organisations. The database is overseen by a CPFT Research Database Oversight Committee including CPFT patients and doctors. It has been approved by an independent NHS Research Ethics Committee specialising in databases (REC Approval 17/EE/0442) and furthermore approved by CPFT’s Research and Development department and its Board of Directors, for the use of pseudonymised clinical records for research. This has been done to protect the legal and ethical rights of patients: access to data in CRATE is tightly controlled, including both the purpose and means of access.
Responsibility for determining that the purposes of individual applications to use the CRATE database are in line with CPFT policies and procedures lies with the CPFT Research Database Oversight Committee alone. Additionally, the CPFT Research Database Oversight Committee is responsible for ensuring that the means of accessing and processing data are in line with CRATE ethical procedures and CPFT data protection and confidentiality requirements. These include:
  • All applicants must have a contract with CPFT, whether substantive or honorary (e.g. letter of access).
  • All patient level data must remain within the CRATE firewall and is only accessed under information security conditions detailed in CPFT’s ICT Security and other relevant CPFT ICT and Information Governance (IG) policies.
  • Individuals accessing CRATE must complete CPFT data protection and confidentiality training.
  • Access to linked data may be subject to further constraints, e.g. users are not allowed direct access to data, project-level pseudonymisation, etc.
The CPFT Research Database Oversight Committee is a CPFT committee, accountable solely to the CPFT Caldicott Committee. The CPFT Research Database Oversight Committee is not accountable to any groups, committees or individuals from any other organisations (associated with the current project or otherwise). A condition of access to CRATE, as set down by the Oversight Committee, is that all applicants must have a contractual obligation to CPFT, including an honorary contract for applicants substantively employed elsewhere (e.g. research passport for verification of bona fide researcher status and a CPFT letter of access). The purpose of this stipulation is to ensure that anyone accessing and processing data within CRATE does so as a CPFT employee, subject to CPFT policy and practice, and in line with the purposes and means determined and agreed by CPFT, including security and information governance policies.
Members of the current research team hold contracts with CPFT and the University of Cambridge. This is to be expected and similar to many scenarios in NHS Trusts around the UK. CPFT is a major teaching hospital and, like all teaching hospitals, has developed and benefits from a close relationship with a partner university – the University of Cambridge – for both provision of clinical training and academic research purposes. Many individuals employed by the University of Cambridge also hold contracts with CPFT, as they are also CPFT clinicians (e.g. psychiatrists, nurses, etc.), and vice versa (employees of CPFT hold contracts with the University of Cambridge). However, this does not mean that the University of Cambridge is a joint data controller of the CRATE database or the current project, for all the reasons that have been described above. To clarify, it is a requirement, made by CPFT, that employees of the University of Cambridge or anywhere else have to meet exactly the same conditions and requirements as applicants from CPFT when accessing and using the CRATE database. In all cases and regardless of where an applicant is substantively employed, the use of and access to data – the purposes and means – is determined by the CPFT Research Database Oversight Committee. Specifically, for the current project (Lewy-CRATE), the research team has made efforts to ensure that all data processing is completed in the most ethical and secure manner possible. CRATE is a pseudonymised database of CPFT clinical records. It is generated by software called CRATE (Cardinal, 2017, BMC Medical Informatics & Decision Making 17:50; PubMed ID 28441940). Pseudonymisation is by cryptographically secure hashing of the patient ID (CPFT/NHS number). Free text is de-identified by removing known identifiable information (e.g. names, dates of birth, hospital numbers, addresses, telephone numbers, nicknames, names of family members).
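As an illustration of pseudonymisation by cryptographically secure hashing of a patient ID, the sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The choice of HMAC-SHA256, the function name, the secret key and the dummy identifier are all assumptions made for this example; the text above states only that a cryptographically secure hash of the CPFT/NHS number is used.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable research ID from a patient ID using HMAC-SHA256.

    The same patient ID and key always give the same research ID, so
    records can be linked. Without the key, the research ID cannot be
    recomputed from a known patient ID.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and dummy identifier, for demonstration only.
key = b"hypothetical-secret-held-outside-the-research-team"
research_id = pseudonymise("0000000000", key)
print(len(research_id))  # 64 hexadecimal characters
```

A keyed hash is used in this sketch rather than a plain hash because NHS numbers come from a small, enumerable space. An unkeyed hash of every possible number could be precomputed and matched against the research IDs.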
The research team will not (at any stage) have access to personal information. The research team will only process de-identified data. The CPFT Research Database manager (who is employed by CPFT and not a member of the research team) will complete the linkages with NHS Digital and will be the only individual who processes personal information required for the linkages (e.g. NHS numbers, dates of birth, etc.). The research team will access a de-identified version of the data in the CRATE database via an online portal, using two-factor authentication over a virtual private network (VPN) that is exclusively for CPFT employees. No data can be downloaded or removed from the CRATE database. Data will only be published once it has been aggregated to a minimum group size of 10. Furthermore, to date the current project has been reviewed and approved by the following committees to ensure that all procedures and methods are in line with ethical principles, information governance, and data protection and confidentiality:
  1. Research Ethics Committee: East of England - Cambridge Central Research Ethics Committee. Ref: 18/EE/0029
  2. Confidentiality Advisory Group: Section 251 approval for linkages with NHS Digital for HES and mortality data within the CRATE database. Ref: 18/CAG/0015
  3. CPFT Research Database Oversight Committee.
  4. The Guardian of the CPFT Caldicott Committee has reviewed and approved the current project.
KCL and SLaM are wider research partners for the project. This agreement only refers to the work of the research team within CPFT. KCL and SLaM have their own separate approvals and agreements for their aspects of the project. No data (personal or de-identified) will be removed from the CPFT CRATE database. No data (personal or de-identified) will be transferred or linked between the two NHS Trusts or KCL.
Results from the CPFT cohorts will be used to inform the development of the NLP app; however, this will only occur after the data has been aggregated for publication into groups of more than 10. As described in detail above and according to the procedures of CRATE, irrespective of where an applicant is substantively employed, the use of and access to data – the purposes and means – is determined by the CPFT Research Database Oversight Committee. Therefore, any individual (regardless of their substantive employment) who is dealing with the purpose and means of data for the current project will do so as a CPFT employee in line with the guidelines and stipulations of the CPFT Research Database Oversight Committee.

Expected Benefits:

This study will aim (i) to provide a better understanding of how DLB presents in NHS settings; (ii) to determine factors, particularly modifiable factors, associated with good and bad outcomes; and (iii) to assist with the production of a DLB NLP App to facilitate diagnosis. The results will help to inform improved DLB diagnosis and new ways of optimising its management. DLB, although a common dementia, remains challenging to diagnose and little is known about its clinical presentation and outcome in NHS services. A major difficulty in DLB research has been bringing together enough individual information to draw conclusions that apply to the whole population with the condition. Using novel methods of identifying and pseudonymising case records through the CRATE database, the research team will bring together and investigate the case records of a group of ~1,000 individuals with DLB. This will be completed by early 2020. By linking the records of these individuals with the national HES and mortality databases, the project can find out more about the full course of the disease from the early stages onwards, identify risk factors for the development of the disease and look at what might be changed to try to prevent worse outcomes. Dissemination of these findings will be done throughout 2020 and 2021. Making and communicating the right diagnosis in a timely way is key to being able to provide the right support and treatment. Improved and early recognition and diagnosis of DLB is an important outcome since accurate recognition is essential to optimise management. There are specific management pathways for DLB-specific symptoms including sleep disturbance, parkinsonism, fluctuation, autonomic symptoms and falls, features that are infrequently managed optimally unless DLB is accurately recognised and diagnosed.
Despite its frequency, and perhaps related to the problem of under-recognition, very little is still understood about the presentation, natural history and clinical course of DLB, which hinders greater understanding of the disorder and the search for effective treatments. Much of what has been learned has been from relatively small cohorts, which are not necessarily representative of all those with DLB as they have “opted in” to studies, often run by tertiary centres with particular expertise in DLB. For example, nearly all hospital based DLB series report an increased prevalence in males compared to females, a finding not supported either by a recent systematic review of epidemiological studies or by the pilot project conducted by the current research team on unselected hospital cases. The prodrome in DLB, as in AD, may be very long and is likely to include (as with Parkinson’s disease) problems with autonomic features such as urinary disturbance and constipation, sleep problems and sleep disturbance, falls and unexplained losses of consciousness. It has also been suggested that early presentations to hospital, particularly for “unexplained delirium”, may be more common in those who develop DLB. It is therefore very likely that people who develop DLB will have differences both in their early symptom profile and pre-diagnostic medical history compared to those who develop other dementias. To study this, the research team need to examine the earlier history of those who are diagnosed with DLB, and in as unselected a cohort as possible.
So far, by working on the data within CRATE for the DLB cohort, the team have already identified risk factors which are believed to be worthy of future investigation and need to be considered in the diagnosis of DLB: 1) low mood and depression have been identified as some of the most common presenting complaints by DLB cases (the rates are higher than would be expected from previous research), 2) the reported rate of REM sleep disorder (one of the core diagnostic features of DLB) is substantially below what would be expected in such a cohort, and 3) changes in weight and BMI may be a feature on early presentation (this is currently not considered within the DLB diagnostic criteria). These findings are only based on the data that the team have access to within CRATE and have already allowed the team to disseminate results in journal articles (e.g. Journal of Alzheimer’s Disease) and at local and national conferences (e.g. ARUK, International Lewy Body Dementia Conference) throughout 2019. The team is very confident that similar results will be seen within the HES and ONS mortality data provided by NHS Digital. As already mentioned, studies like this one that have such large DLB cohorts are incredibly rare. To the best of the team's knowledge, this project has created one of the largest DLB cohorts in the world. It is believed very strongly that, given the size of the cohort and the information included in the local CRATE database and the national data in NHS Digital, the research team will achieve the outlined benefits and outputs. More information about the symptoms and causes of mortality within DLB will be essential to understand the ways in which individuals with DLB present to the NHS, so that better ways of diagnosing and managing the condition can be developed. Ways of better managing DLB are urgently needed, since the mainstay of current treatment consists of the use of modest symptomatic treatments and dementia support.
Along with the efforts outlined above to communicate with clinical, scientific and public groups, the research team plan to collaborate with other National Institute for Health Research (NIHR) funded units, through a network of researchers called the D-CRIS initiative, to develop and test computer based NLP methods to improve early and accurate diagnosis for those with DLB. With the successful development of an NLP app to better identify people with DLB there is the potential to develop a regularly updated ‘live feed’ of new individuals likely to have this diagnosis. This method could be used in two ways: firstly, to flag to clinicians that a patient under their care may have DLB (and thus improve rates of diagnosis and guide the best management plan); and secondly, to identify individuals who might be eligible for relevant research studies in order to give them the opportunity to participate in these. The development of the NLP app is ongoing and will continue until September 2020. At present, a bespoke DLB NLP algorithm is in development and being tested by colleagues in King’s College London. Furthermore, a journal article based on this DLB NLP algorithm has been submitted for review and should be published within a couple of months, following successful review by the journal. All aspects of this project have the potential to make a positive impact on patient care and DLB research within a short time from publication (throughout 2020 and 2021). The findings from the DLB and non-DLB cohorts will contribute to a growing body of research that aims to improve identification of DLB (and at earlier stages in the disease) and help with developing better services to meet the particular needs of this group. Once the findings have been disseminated throughout 2020 and 2021, clinicians will have access to diagnostic, symptomatic and clinical course information for one of the largest DLB cohorts in the world, aiding improved and timely recognition and diagnosis of DLB.
The findings from the NLP development will allow services both locally and within the D-CRIS network to improve rates of identification of DLB and to flag patients who could receive better personalised care and participate in research studies to benefit potentially not only themselves but the population with DLB as a whole. Overall, the project will have a number of benefits for the lives of people affected by dementia: 1. By learning as much as possible about the presentation and course of DLB in real clinical practice. 2. By finding ways to better identify patients with DLB and identify them earlier in the disease course. 3. By developing a method of identifying people with DLB who might wish to participate in research studies, leading to future improvements in the diagnosis and management of DLB.

Outputs:

The following outputs will be produced throughout 2020 and 2021:

  • Peer reviewed scientific journals
  • Internal and progress reports
  • Conference presentation
  • Publication on websites, media articles/press releases
  • Feedback to patient and public involvement (PPI) groups

All outputs will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide. For further clarity, all outputs will contain only data that has been aggregated into groups of more than 10. In order to communicate the findings of the project to clinical and research communities, the research team will publish the findings in peer reviewed scientific journals (e.g. general medical journals such as The Lancet or BMJ and specialised dementia journals such as Dementia and the British Journal of Psychiatry) and submit contributions to and participate in meetings attended by dementia experts across both scientific and clinical fields (e.g. Alzheimer’s Society UK Conference, International Lewy Body Dementia Conference). Interim (annual) and final reports will be prepared and submitted to the relevant funder (Alzheimer’s Society), REC committee and CAG committee. These will cover all progress to date, planning and implementation, and results and findings throughout the project. The research team will feed back findings more widely to patients and carers by meeting with local (e.g. CPFT dementia group) and national (e.g. Alzheimer’s Society) groups and using publications such as the Alzheimer’s Research UK and the Join Dementia Research newsletters. The research team has regular contact with two Research Network Volunteers from the Alzheimer’s Society throughout the project. Since the beginning of the project, the Research Network Volunteers have provided feedback on ethics applications and project presentations, and two meetings have taken place between the Volunteers and the project research teams (March 2018 and January 2019). 
A third meeting will be organised with the Research Network Volunteers in 2020 in order to disseminate the main findings of the project. The research team will also continue its contact with the Dementia Carer’s Support Service within CPFT and the CPFT Service User and Carer Research Group. This includes working with the User and Carer R&D Manager for CPFT, getting feedback from the group on the project protocol, ethics applications and PPI events. Contact will be maintained with all PPI groups throughout 2020 in order to communicate all results and best methods for communicating to wider dementia and public groups. Furthermore, the team will seek to engage with the media locally and nationally to disseminate key findings and use social media to provide regular updates on progress. CPFT has a research communications lead who works actively with researchers to promote ongoing work. Members of the research team currently provide, and will continue to provide, information on the project in three main places: 1. The main patient-facing website of Cambridgeshire and Peterborough Mental Health Trust; 2. The Alzheimer’s Society website, which is widely used on a regular basis by the families of people with dementia and by some people with dementia; 3. The website of the Lewy Body Society, which again is widely used by the families of people with Lewy body dementia and some people with Lewy body dementia. Currently, the main patient-facing website of Cambridgeshire and Peterborough Mental Health Trust contains detailed information about the study and contact details for how individuals can opt out of the study. In future the research team will publish descriptions of the final outcomes of the project, with links to where additional information can be found. A briefer version of the study outcomes along with a link to the CPFT website has been posted on the Alzheimer’s Society and Lewy Body Society websites. 
These are the two websites most widely used by people with dementia and Lewy body dementia, and their carers. In the future, the research team will prepare briefer versions of the project outcomes and findings to disseminate to these groups. This project will provide detailed information about DLB compared to other non-DLB disease dementia controls, including its early presenting features, prognosis and mortality. These factors will inform the development of diagnostic criteria for early or prodromal DLB, allowing cases of DLB to be identified at the earliest possible opportunity. It will also facilitate recruitment of people with early DLB to clinical trials through better identification. The research team in King’s College London (KCL) and South London and Maudsley NHS Foundation Trust (SLaM) will carry out similar processes for cohort identification and data linkage as outlined above in section ‘5b. Processing activities’ for the CPFT research team. The results of the above listed outputs will be amalgamated from both NHS Trusts to inform the development of a natural language processing (NLP) app. Natural language processing information extraction apps are programs that take unstructured text (only that included within CRATE: no unstructured text is received from NHS Digital) as input and, through analysing both the syntactic locations of words within the text and their underlying semantic meanings, bring back information required by the user. The aim is for the NLP apps to be used in real time, feeding information back to clinicians directly by flagging cases with a high likelihood of a DLB diagnosis. The intention is for results from the current project to be used to develop future projects around the development and implementation of a DLB NLP within e-clinical record databases that will assist with the timely diagnosis of DLB.

Processing:

The linkages with NHS Digital for the project will be completed through the following data flow:

a) The research team as employees of CPFT have identified ~1,000 DLB cases and ~21,000 non-DLB disease dementia control cases within CRATE (which provides access to de-identified, pseudonymised data (REC: 17/EE/0442)). The research team have identified these cases based on their diagnosis of DLB or non-DLB dementia, and only have access to pseudonyms. The research team will pass pseudonyms (RIDs, research identifiers) to the CPFT Research Database Manager.
b) CPFT’s Research Database Manager (an employee of CPFT and not a member of the research team) will re-identify these patients in bulk for the sole purpose of extracting NHS numbers and identifiers (dates of birth, postcode and gender) to send to NHS Digital. The research team will not be given the identifiers.
c) RIDs (pseudonyms), NHS numbers, postcodes, dates of birth and gender for the selected cohorts will be sent from CPFT to NHS Digital by the CPFT Research Database Manager through Secure Electronic File Transfer (SEFT).
d) NHS Digital will send data from the HES datasets and mortality data to CPFT with only the RIDs (pseudonyms).
e) CPFT and NHS Digital data are linked on the shared identifier (RIDs) within CRATE by the CPFT Research Database Manager. The linked data are therefore de-identified by pseudonymisation within CPFT. (*)
f) The research team for the project are given access to the pseudonymised data within a project-specific database within CRATE. They may analyse it only within CRATE. No patient-level data will leave the NHS secure computing environment. (*)
g) Researchers may only publish data aggregated to a minimum group size of 10. (*)

(*) As set out in CPFT’s Research Database protocols; NHS Research Ethics 17/EE/0442.

All outputs, including outputs shared with KCL/SLaM, will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide. 
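As an illustration only, steps e) and g) of the data flow can be sketched in a few lines of Python. This is not the CRATE implementation: the record layout, the field names (rid, cohort, admissions) and both function names are hypothetical, and the sample values are made up. The threshold is left as a parameter because the agreement states it both as a minimum group size of 10 and as groups of more than 10; the default below follows the former.

```python
# Hypothetical records: field names and values are illustrative only and do
# not reflect the CRATE or HES schemas.
local_records = [
    {"rid": "R001", "cohort": "DLB"},
    {"rid": "R002", "cohort": "non-DLB"},
    {"rid": "R003", "cohort": "DLB"},
]
returned_records = [
    {"rid": "R001", "admissions": 3},
    {"rid": "R003", "admissions": 1},
]


def link_on_rid(local, returned):
    """Join two record sets on the shared pseudonym (RID) only.

    No direct identifiers (NHS number, date of birth, postcode) are
    involved, so the linked rows remain pseudonymised.
    """
    returned_by_rid = {row["rid"]: row for row in returned}
    linked = []
    for row in local:
        match = returned_by_rid.get(row["rid"])
        if match is not None:
            linked.append({**row, **match})
    return linked


def suppress_small_counts(counts, min_group_size=10):
    """Replace any aggregate count below min_group_size with None.

    The default of 10 is an assumption taken from step g); adjust it if
    the stricter reading (groups of more than 10) applies.
    """
    return {
        group: (n if n >= min_group_size else None)
        for group, n in counts.items()
    }


linked = link_on_rid(local_records, returned_records)
published = suppress_small_counts({"DLB": 12, "non-DLB": 4})
```

In this sketch a suppressed cell is represented as None, so the small count itself never appears in an output table.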
The CPFT Research Database Manager will use the research identifiers to link the data from the HES datasets and mortality data to the local data available within CPFT about the cohorts. This will be completed using the research identifiers (pseudonyms). As outlined above, identifiable data will be used by the CPFT Research Database Manager to complete the linkages with NHS Digital. The identifiable data will not be used for any other purpose within the project. NHS Digital will receive the identifiable data from the CPFT Research Database; however, they will not send the identifiable data back to the CPFT Research Database Manager. NHS Digital will send back the data from the HES datasets and mortality data with research identifiers (pseudonyms) only. Furthermore, no attempt will be made by the CPFT Research Database Manager or the research team to re-identify the data. The CPFT Research Database Manager will create a project-specific database within CRATE for this linked data using only the research identifiers (pseudonyms). Only members of the CPFT research team will have access to this project-specific database through the CRATE database within the CPFT NHS secure computing environment. All patient-level data will have been pseudonymised via CRATE (REC Approval 17/EE/0442), operating under a REC-approved opt-out model with appropriate information governance, security, audit, and oversight controls. The data will not be made available to any third parties other than those specified except in the form of aggregated outputs with small numbers suppressed in line with the HES Analysis Guide. All processing of data will be completed by members of the research team, as employees of CPFT, in line with the protocols and ethics of the CPFT CRATE database (REC Approval 17/EE/0442). All members of the research team are employees of CPFT either through direct contracts or honorary contracts (e.g. a research passport to verify bona fide researcher status and a letter of access issued by CPFT). This is standard practice within CPFT, controlled and monitored by the CPFT Research & Development department. All members of the research team have completed compulsory CPFT training in data protection and confidentiality.

Lay summary: As the key aims of the current project are to examine the diagnosis, clinical course and outcomes of a large DLB cohort in comparison to a range of non-DLB disease dementia controls, researchers will examine a wide range of variables including (for example): date of first contact with the healthcare system, presenting complaint, number of hospital admissions, treatments received during admissions, date of death and length of time from diagnosis to death. This will use analyses that allow researchers to compare these variables between DLB and non-DLB cohorts (e.g. comparing the frequencies and lengths of events). Furthermore, the team will examine whether the frequencies and lengths of these events affect outcomes for DLB cases compared to non-DLB cases (e.g. Do DLB cases have more admissions to hospital? If so, does the higher number of admissions predict their shorter survival times?). More details of these analyses are included below. Analyses will be completed in line with the stated research aims (detailed examination of initial presentation, diagnosis, symptoms, clinical course and outcomes of the DLB cohort in comparison to other non-DLB disease dementia controls). The research team will use univariate and multivariate methods and regression analyses as appropriate. Building on the work completed within the pilot study, the team will conduct in-depth mortality analyses, with additional predictors included from the HES and mortality data. 
Furthermore, the inclusion of additional predictors and outcomes from these databases across the DLB and non-DLB disease dementia control cohorts will allow the research team to conduct advanced regression analyses (e.g. logistic regression for disease outcomes, Cox regression for the effect of predictors upon the length of disease progression/outcome). For the natural language processing app, diagnostic accuracy will be determined using standard metrics (precision, recall, positive and negative predictive value and overall accuracy) for both development (test) and validation cohorts. This study will build a cohort of ~1,000 DLB and a non-DLB disease dementia control cohort, providing >80% power to detect differences in symptom frequency of 3-6% (the difference detectable depends on baseline frequency) between groups. NHS Digital reminds all organisations party to this agreement of the need to comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract, i.e. employees, agents and contractors of the Data Recipient who may have access to that data).
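
For reference, the diagnostic accuracy metrics named above for the NLP app can be computed from a 2x2 confusion matrix as in the following minimal sketch. The function name and the counts are hypothetical and are not project results; the sketch assumes each case has a reference diagnosis (DLB or not) and a binary flag from the app. Note that precision and positive predictive value are the same quantity, so they are reported under one key.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy metrics from confusion-matrix counts.

    tp: flagged by the app and DLB according to the reference diagnosis
    fp: flagged by the app but not DLB
    fn: not flagged but DLB
    tn: not flagged and not DLB
    """

    def ratio(numerator, denominator):
        # Return None rather than raising when a denominator is zero.
        return numerator / denominator if denominator else None

    return {
        "precision_ppv": ratio(tp, tp + fp),  # precision == positive predictive value
        "recall": ratio(tp, tp + fn),         # also called sensitivity
        "npv": ratio(tn, tn + fn),            # negative predictive value
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=20, fn=10, tn=890)
```

With a rare condition such as DLB among a much larger set of non-DLB controls, overall accuracy can look high even when recall is poor, which is one reason to report the metrics together rather than accuracy alone.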