TCR-BERT: learning the grammar of T-cell receptors for flexible antigen-binding analyses

Published in bioRxiv, 2021

Recommended citation: Wu, K., Yost, K.E., Daniel, B., Belk, J.A., Xia, Y., Egawa, T., Satpathy, A., Chang, H.Y. and Zou, J., 2021. TCR-BERT: learning the grammar of T-cell receptors for flexible antigen-binding analyses. bioRxiv. https://www.biorxiv.org/content/10.1101/2021.11.18.469186v1.full.pdf

T-cell receptors (TCRs) are critical for mediating our immune systems’ ability to recognize and respond to invaders. However, TCR sequences are also extraordinarily diverse across individuals, which makes it challenging to build supervised classifiers that generalize well from one individual to another. We present TCR-BERT, a large language model of TCR sequences. We demonstrate that TCR-BERT achieves state-of-the-art performance and generalizability across a wide range of TCR analyses, and that it may even enable computational design of new TCR sequences targeting specific antigens.

Download paper here