Structure-aware HLA mismatch representations for post-transplant outcome prediction
Huanxuan Li (Shawn) · 2025 · Preprint
Motivation: Donor selection for allogeneic haematopoietic stem cell transplantation (HSCT) relies on categorical HLA match/mismatch counts that treat antigenically distinct alleles as equivalent once they share a mismatch flag. This representation discards the amino-acid-level structural divergence between allele pairs and ignores the competing nature of post-transplant events: graft-versus-host disease (GvHD), relapse, and transplant-related mortality (TRM) preclude one another and must be modelled jointly.
Methods: We introduce CAPA (Computational Architecture for Predicting Alloimmunity), a deep learning framework that encodes each HLA allele as a 1280-dimensional vector using the frozen ESM-2 650M protein language model. Donor–recipient interaction features are extracted by a bidirectional cross-attention network (2 layers, 8 heads, d′ = 128), concatenated with clinical covariates, and passed to a DeepHit head that jointly estimates the discrete-time cumulative incidence functions for all three competing events over a 730-day horizon. Only the interaction network and survival head (~2.8M parameters) are trained; ESM-2 remains frozen.
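The data flow described in the Methods can be illustrated with a minimal NumPy sketch. It is not the CAPA implementation: random arrays stand in for the frozen ESM-2 embeddings, all weights are random, and only a single attention layer is shown, without residual connections, layer normalisation, or training. The number of alleles per person (10), the number of clinical covariates (12), the 10-day time bins, mean pooling, and the extra "no event within the horizon" output slot are all assumptions made for illustration. Only the dimensions 1280, 128, 8 heads, three events, and the 730-day horizon come from the text.

```python
import numpy as np

D_ESM, D_MODEL, N_HEADS = 1280, 128, 8   # sizes stated in the Methods
N_EVENTS, HORIZON_DAYS = 3, 730          # GvHD, relapse, TRM over 730 days
N_BINS = 73                              # assumption: 10-day bins (not stated in the source)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_tokens, context_tokens, w_q, w_k, w_v):
    """Multi-head attention in which one side's alleles query the other side's."""
    n_q, n_c = query_tokens.shape[0], context_tokens.shape[0]
    d_head = D_MODEL // N_HEADS
    # Project, then split the model dimension into heads: (heads, tokens, d_head).
    q = (query_tokens @ w_q).reshape(n_q, N_HEADS, d_head).transpose(1, 0, 2)
    k = (context_tokens @ w_k).reshape(n_c, N_HEADS, d_head).transpose(1, 0, 2)
    v = (context_tokens @ w_v).reshape(n_c, N_HEADS, d_head).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, n_q, n_c)
    out = weights @ v                                              # (heads, n_q, d_head)
    return out.transpose(1, 0, 2).reshape(n_q, D_MODEL)

rng = np.random.default_rng(0)

# Stand-ins for frozen ESM-2 allele embeddings (hypothetical: 10 alleles per person).
donor = rng.normal(size=(10, D_ESM))
recipient = rng.normal(size=(10, D_ESM))

# Random weights; in a trained model these would be learned parameters.
w_in = rng.normal(scale=D_ESM ** -0.5, size=(D_ESM, D_MODEL))
w_q, w_k, w_v = (rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, D_MODEL))
                 for _ in range(3))

d, r = donor @ w_in, recipient @ w_in
# Bidirectional: donor attends to recipient and recipient attends to donor.
donor_to_recipient = cross_attend(d, r, w_q, w_k, w_v).mean(axis=0)
recipient_to_donor = cross_attend(r, d, w_q, w_k, w_v).mean(axis=0)

clinical = rng.normal(size=12)  # hypothetical clinical covariate vector
features = np.concatenate([donor_to_recipient, recipient_to_donor, clinical])

# DeepHit-style head: one softmax over every (event, time-bin) pair, plus an
# assumed extra slot for "no event within the horizon".
w_out = rng.normal(scale=0.1, size=(features.size, N_EVENTS * N_BINS + 1))
probs = softmax(features @ w_out)
pmf = probs[:-1].reshape(N_EVENTS, N_BINS)   # P(event k occurs in bin t)
cif = np.cumsum(pmf, axis=1)                 # cumulative incidence per event
```

Because the softmax is shared across events, the three cumulative incidence curves compete for the same probability mass: their values at the horizon plus the event-free probability sum to one, which is what "jointly estimates" means in this setting.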
Results: On the public UCI Bone Marrow Transplant dataset (n = 187 paediatric patients; 70/15/15% train/validation/test split), we benchmark tabular-feature competing-risks models as reference comparators for CAPA. On the held-out test set (n = 29), the Fine–Gray subdistribution hazard model achieves concordance indices of 0.84 (95% CI 0.69–1.00) for relapse and 0.66 (0.48–0.86) for TRM; cause-specific Cox reaches 0.75 (0.53–1.00) and 0.65 (0.46–0.85), respectively. A flat-feature DeepHit MLP performs below the classical baselines (relapse 0.67, TRM 0.41), consistent with the known difficulty of training deep survival models on small cohorts. The GvHD endpoint could not be evaluated reliably because the test set contains only two GvHD events. Full validation of CAPA's ESM-2 pipeline requires a registry dataset with allele-level HLA typing, which we identify as the primary direction for future work.
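To make the reported metric concrete, the following sketch computes a cause-specific concordance index on toy data. It is one common variant, not necessarily the one used for the figures above: a pair is treated as comparable when the first patient experienced the event of interest strictly before the second patient's observed time, so competing events and censoring are both handled simply through observed times. The function name, the toy arrays, and the event coding (0 = censored, 1 and 2 = two hypothetical causes) are assumptions for illustration.

```python
import math

def cause_specific_c_index(times, events, risks, cause):
    """Fraction of comparable pairs in which the earlier event has the higher risk.

    times  : observed time for each patient
    events : 0 = censored, otherwise an integer event code
    risks  : predicted risk score for `cause` (higher = earlier expected event)
    cause  : event code of interest
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != cause:
            continue  # only patients with the event of interest anchor a pair
        for j in range(n):
            if times[j] > times[i]:  # patient j was still under observation
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable if comparable else math.nan

# Toy data (hypothetical): five patients, two event types.
times = [5, 10, 15, 20, 25]
events = [1, 2, 1, 0, 0]
risks = [0.9, 0.2, 0.4, 0.5, 0.1]

c_cause_1 = cause_specific_c_index(times, events, risks, cause=1)
c_cause_2 = cause_specific_c_index(times, events, risks, cause=2)
```

Only patients who experienced the event of interest anchor a comparable pair, so an endpoint with two test-set events rests on a small number of pairs, and a single reordering can shift the estimate substantially.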
Contents
Introduction
Motivation, competing-risks framing, HLA biology background, and gap in the literature.
Methods
ESM-2 encoding, cross-attention interaction network, DeepHit competing-risks head, UCI BMT dataset description.
Results
C-index and Brier score benchmarks: Fine–Gray, cause-specific Cox, and flat-feature DeepHit on the held-out test set (n = 29).
Discussion
Framing of CAPA as a proposed architecture whose ESM-2 pipeline is not yet validated; limitations (small N, paediatric cohort, no allele-level HLA); future directions.
Supplementary
Full cohort table (n = 187), integrated Brier score (IBS) results, compute/parameter breakdown, hyperparameter sensitivity analysis.
@article{capa2025,
title = {Structure-aware HLA mismatch representations
for post-transplant outcome prediction},
author = {Li, Huanxuan},
year = {2025},
note = {Preprint},
url = {https://github.com/sh4wn27/capa}
}