Fine-tune LLaVA for a Medical Task

Introduction

  • Vision-language models (VLMs) are useful for a wide variety of tasks
  • in some settings there are good reasons to avoid proprietary models
  • public-facing VLMs
    • appear confident, whether or not they are right
    • closed proprietary models have unknown post-training processes, which hinders research into explainability, generalizability, and effective evaluation (essentially, we are blind in one eye)
  • instead, invest the effort to fine-tune a VLM yourself and calibrate it against your own medical expertise
  • in the healthcare domain it is often more efficient to build a specialist model with very distinct capabilities than to teach one model everything
  • for wide practical adoption, fine-tuning should be a commodity task, easy to do even without an entire team of ML engineers (as is often the case in academic medical centres)
  • let’s walk through an example of how to fine-tune an open-source VLM architecture on custom medical data

LLaVA

alternatively, fine-tune SAM (Segment Anything Model)

https://github.com/microsoft/LLaVA-Med

https://medium.com/ubiai-nlp/how-to-fine-tune-llava-on-your-custom-dataset-aca118a90bc3

The dataset

either TotalSegmentator or MSD (Medical Segmentation Decathlon)

https://github.com/wasserth/TotalSegmentator

http://medicaldecathlon.com/

CHECK WHICH DATASET IS EASIER TO LOAD (THEY ARE BOTH EQUALLY NICE AND WELL KNOWN)
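Whichever dataset we pick, the segmentation labels have to be turned into text before a VLM can learn from them. A minimal sketch of that step, assuming we already have 2D label-map slices; the label IDs and names below are hypothetical placeholders, not the official TotalSegmentator or MSD label scheme:

```python
# Sketch: turn a 2D segmentation label map into a short textual finding,
# which we can later pair with the corresponding image slice.
# LABEL_NAMES is a made-up mapping -- replace with the dataset's own scheme.
LABEL_NAMES = {1: "liver", 2: "right kidney", 3: "left kidney", 4: "spleen"}

def describe_labels(label_slice):
    """Given a 2D label map (list of lists of int IDs), name the
    segmented structures present, ignoring background (0)."""
    present = sorted({v for row in label_slice for v in row} - {0})
    names = [LABEL_NAMES.get(i, f"structure {i}") for i in present]
    if not names:
        return "No segmented structures are visible in this slice."
    return "Visible structures: " + ", ".join(names) + "."

# Toy 4x3 label slice containing liver (1), right kidney (2), spleen (4)
toy_slice = [
    [0, 0, 1, 1],
    [0, 2, 1, 0],
    [0, 2, 0, 4],
]
print(describe_labels(toy_slice))
```

In practice the slices would come from the datasets' NIfTI volumes (e.g. loaded with nibabel and converted to arrays), but the text-generation step itself stays this simple.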

Fine-tuning

https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md

https://github.com/efenocchi/torchtune/blob/feat/deeplake-v4/torchtune/datasets/_utils.py

https://ubiai.tools/how-to-fine-tune-llava-on-your-custom-dataset/
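The custom-data guide linked above expects training samples as a JSON list of records, each with an id, an image file name, and alternating human/gpt conversation turns, with an `<image>` token marking where the image enters the prompt. A small sketch of building such records; the sample id, file name, and question/answer strings are made-up placeholders:

```python
import json

def make_llava_record(sample_id, image_file, question, answer):
    """Build one training record in the conversation JSON format used by
    the LLaVA custom-data guide: id, image file name, and alternating
    human/gpt turns; "<image>" marks where the image is injected."""
    return {
        "id": sample_id,
        "image": image_file,
        "conversations": [
            {"from": "human", "value": "<image>\n" + question},
            {"from": "gpt", "value": answer},
        ],
    }

# Hypothetical example pairing a CT slice with a generated finding
records = [
    make_llava_record(
        "msd_liver_0001_slice_042",
        "msd_liver_0001_slice_042.png",
        "Which abdominal structures are visible in this CT slice?",
        "Visible structures: liver, right kidney, spleen.",
    )
]
print(json.dumps(records, indent=2))
```

Writing this list to a file then gives the `--data_path` input for the fine-tuning scripts in the LLaVA repo.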

Give it a try