Synthetic patient data

Motivation

In real-world settings, healthcare data is often incomplete or corrupted.
At the same time, data is the foundation of AI/ML.
Q: Is it still possible to use incomplete data, e.g. for model training?

Data imputation is a standard ML technique that fills in missing values so that incomplete data becomes usable for tractable problem solving (e.g. optimization, analytical methods).
For multi-modal or longitudinal data, this requires advanced statistical models (e.g. deep models).
Current approaches in digital health focus on building synthetic electronic health records (EHR).
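
To make the idea of imputation concrete, here is a minimal sketch of the simplest variant, column-mean imputation. It is not the method these notes will eventually use, and it is far simpler than the deep models mentioned above. The function name impute_column_means and the toy values are assumptions made up for illustration; they are not real patient data.

```python
from statistics import mean


def impute_column_means(rows):
    """Return a copy of rows where each None is replaced by its column's observed mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values cannot be imputed this way; leave it as None.
        col_means.append(mean(observed) if observed else None)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy, made-up values (NOT real patient data): two hypothetical lab measurements per row.
records = [
    [12.0, 4.0],
    [None, 5.0],
    [14.0, None],
]
completed = impute_column_means(records)
print(completed)
```

Mean imputation ignores correlations between variables and across time, which is one reason model-based approaches are preferred for longitudinal data.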

The dataset

I am still looking for a dataset I recently saw in which blood values were modelled over the course of a pregnancy.

The algorithm

Code run-down