Synthetic patient data

Motivation

  • in real-world settings, healthcare data is often incomplete or corrupted
  • at the same time, data is the foundation of AI/ML
  • Q: can incomplete data still be used, e.g. for model training?
  • data imputation is a standard ML technique for filling in missing values so that incomplete data becomes usable for tractable problem solving (e.g. optimization, analytical solutions)
  • for multi-modal or longitudinal data this requires advanced statistical models (e.g. deep models)
  • current approaches in digital health focus on building synthetic electronic health records (EHRs)
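
To make the imputation idea above concrete, here is a minimal sketch of the simplest variant, column-wise mean imputation. It is only an illustration of the general technique, not the method these notes will end up using: the function name impute_column_means and the toy records are hypothetical, and the lab values are made up.

```python
import math
from statistics import fmean


def impute_column_means(rows):
    """Replace missing entries (None or NaN) with the mean of the
    observed values in the same column."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        if not observed:
            # A fully missing column cannot be filled from its own values.
            raise ValueError(f"column {j} has no observed values")
        means.append(fmean(observed))

    return [
        [means[j] if is_missing(value) else float(value)
         for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records with two lab values per patient.
# The numbers are invented for illustration only.
records = [
    [12.0, 30.0],
    [None, 20.0],
    [11.0, None],
]
completed = impute_column_means(records)
print(completed)
```

Mean imputation treats every column independently and ignores time, which is exactly why it falls short for multi-modal or longitudinal data and why more expressive models come into play there.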

The dataset

  • I am still looking for a dataset I saw recently in which blood values were modelled over the course of a pregnancy

The algorithm

Code run-down