AnalystX
Available datasets
Health and Care sample, test, fake and synthetic data library.
The TRE service provides researchers with timely and secure access to health and care data. For further information about access, please visit this article.
For current NHS data sets and definitions, please visit this area.
Open-source Synthetic Generator Tools
Synthetic data generation - a must have skill for new data scientists
-
Artificial Data Generator
Pipelines and reusable code for generating anonymous artificial versions of NHS Digital assets in Databricks.
-
NHS England AI lab
The NHS England Transformation Directorate has created a process to help produce synthetic data, exploring how to create mock patient data (synthetic data) from real patient data.
-
Synthetichealth / synthea
Synthea™ is a Synthetic Patient Population Simulator. The goal is to output synthetic, realistic (but not real) patient data and associated health records in a variety of formats.
-
The Synthetic Data Vault
The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets and later generate new Synthetic Data that has the same format and statistical properties as the original dataset.
-
synthpop in R
Synthpop – A great music genre and an aptly named R package for synthesising population data. I recently came across this package while looking for an easy way to synthesise unit record data sets for public release.
-
sms
An R Package for the Construction of Microdata for Geographical Analysis.
-
Faker
Faker is a Python package that generates fake data for you.
Structured activity data and Electronic Healthcare Records
-
Artificial data pilot
Artificial data sets provide users with large volumes of data that share some of the characteristics of real data while protecting patient confidentiality. They are designed to model the structure of real data but are completely artificial – they do not contain any actual patient records. NHS England are piloting this new service with a limited number of artificial data sets.
-
A&E Synthetic Data
Two standard datasets have been used: A&E activity data and Admitted Patient Care data, both of which are taken from SUS data provided by NHS Digital.
-
NHSBSA Open Portal
A list of NHSBSA open data sets.
-
Public health data
Fingertips is a large public health data collection.
-
OpenPrescribing
Explore England's prescribing data.
-
GHDx
Global Burden of Disease Study 2019 (GBD 2019) Data Resources.
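The Artificial data pilot entry above describes data sets that model the structure of real data without containing any actual patient records. The toy sketch below shows one very simple way to get that effect: count how often each value occurs in each column, then draw new rows column by column from those counts. It is not the method used by NHS England or by any tool listed on this page; the function names and the example rows are invented for illustration.

```python
import random
from collections import Counter


def fit_marginals(rows):
    """Count how often each value appears in each column."""
    columns = rows[0].keys()
    return {col: Counter(row[col] for row in rows) for col in columns}


def sample_artificial(marginals, n, seed=0):
    """Draw n artificial rows, sampling every column independently."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        record = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            record[col] = rng.choices(values, weights=weights, k=1)[0]
        out.append(record)
    return out


if __name__ == "__main__":
    # Invented example rows; not taken from any real data set.
    example_rows = [
        {"age_band": "0-17", "arrival_mode": "walk-in"},
        {"age_band": "18-64", "arrival_mode": "ambulance"},
        {"age_band": "18-64", "arrival_mode": "walk-in"},
        {"age_band": "65+", "arrival_mode": "ambulance"},
    ]
    marginals = fit_marginals(example_rows)
    for row in sample_artificial(marginals, 5):
        print(row)
```

Because each column is sampled independently, the output keeps the column names and per-column frequencies but loses any relationship between columns. By chance a sampled row can also match an input row, so this sketch offers no confidentiality guarantee by itself.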
Text
-
MIMIC-III Clinical Database
MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.
-
n2c2 NLP Research Data Sets
Unstructured notes from the Research Patient Data Registry at Partners Healthcare (originally developed during the i2b2 project).
-
Diameter Health notes-gpt2
A repository of findings from GPT-2 synthetic note generation.
-
CDU Data Science team - PX text mining
Text classification of NHS patient feedback.
Images
-
MIMIC-CXR Database
A journal article describing the MIMIC-CXR database was recently published in Scientific Data. The article provides detail regarding the collection, curation, and processing done in order to create the database.
-
Oasis Open Series of Imaging Studies
The Open Access Series of Imaging Studies (OASIS) is a project aimed at making neuroimaging data sets of the brain freely available to the scientific community. By compiling and freely distributing neuroimaging data sets, we hope to facilitate future discoveries in basic and clinical neuroscience.
-
National Cancer Institute
A repository of Cancer Genome Atlas (TCGA) sample data from over 11,000 patients over a 12-year period.
-
National Institutes of Health
NIH Clinical Center releases dataset of 32,000 CT images.
-
National Cancer Institute
The Cancer Data Access System ("CDAS") is a website where you may request data recorded from various research studies. For some studies, you may also request images or biospecimens.
Other open data resources
-
NHSDataDictionaRy
The NHS website is taking an active role in making data available to the public and those interested in improving the NHS.
-
CPRD cardiovascular disease synthetic dataset
CPRD has generated a number of synthetic datasets that can be used for training purposes or to improve algorithms or machine learning workflows.
-
NHS England published data sets
A list of NHS England available data sets.
-
NHSBSA information services
A set of population data sets from the NHS Business Services Authority.
-
Gov.uk open data
Find open data on data.gov.uk.