Scientific question
Do molecular readouts (expression + simple clinical context) co-vary with imaging-derived features in a way that separates sample groups?
What we would conclude with this
CCA finds linear combinations of each modality whose paired scores are maximally correlated. This is useful for joint visualization and hypothesis generation before heavier multimodal models.
Synthetic data
80 samples; 10 “genes,” 3 imaging metrics, age and sex; a hidden axis drives both blocks. Seed: 42. See demos/scientific-multimodal-biology/data/generate.py.
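For orientation, a dataset of this shape could be generated roughly as follows. This is a minimal sketch and not the contents of data/generate.py: the function name make_synthetic, the binary group label, the loading ranges, and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np


def make_synthetic(n_samples=80, n_genes=10, n_imaging=3, seed=42):
    """Hypothetical generator: one hidden axis drives both blocks."""
    rng = np.random.default_rng(seed)

    # Assumed binary group label; the hidden axis is shifted by group.
    group = rng.integers(0, 2, size=n_samples)
    latent = group + 0.5 * rng.standard_normal(n_samples)

    # Molecular block: genes load on the hidden axis, plus unit noise.
    gene_loadings = rng.uniform(0.5, 1.5, size=n_genes)
    genes = np.outer(latent, gene_loadings) + rng.standard_normal((n_samples, n_genes))

    # Simple clinical context, drawn independently of the hidden axis (assumption).
    age = rng.uniform(30, 80, size=n_samples)
    sex = rng.integers(0, 2, size=n_samples)
    molecular = np.column_stack([genes, age, sex])

    # Imaging block: metrics load on the same hidden axis.
    img_loadings = rng.uniform(0.5, 1.5, size=n_imaging)
    imaging = np.outer(latent, img_loadings) + rng.standard_normal((n_samples, n_imaging))

    return molecular, imaging, group


molecular, imaging, group = make_synthetic()
print(molecular.shape, imaging.shape, group.shape)
```

Fixing the seed makes repeated calls return identical arrays, which is what lets the downstream plots be reproduced exactly.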
Approach
Standardize blocks; two-component CCA (sklearn.cross_decomposition.CCA); scatter canonical scores colored by group.
Key outputs

Reproduce
cd demos/scientific-multimodal-biology
python3 data/generate.py
python3 src/run.py
Dependencies: demos/requirements.txt.