LXMERT

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (ICCV 2021, Oral)

The paper presents a generic interpretability method that covers all Transformer attention variants: pure self-attention, co-attention in bi-modal models, and encoder-decoder attention. It achieves state-of-the-art explainability results on models such as CLIP, DETR, and LXMERT.
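As a rough illustration of the core idea, here is a minimal sketch of a gradient-weighted relevancy update in the spirit of the paper's self-attention rule: the relevancy map starts as the identity and, at each layer, is updated with the head-averaged, positively-clipped product of the attention map and its gradient. The hook names and shapes are assumptions for the sketch, not the repo's actual API; see the paper and code for the full set of update rules (including the bi-modal and encoder-decoder cases).

```python
import torch

def update_relevancy(R, attn, attn_grad):
    """One self-attention relevancy update in the spirit of the paper:
    R <- R + mean_over_heads((grad * attn)^+) @ R.
    attn, attn_grad: (heads, tokens, tokens); R: (tokens, tokens)."""
    cam = (attn_grad * attn).clamp(min=0).mean(dim=0)
    return R + torch.matmul(cam, R)

# Hypothetical usage with random tensors standing in for attention maps
# and their gradients w.r.t. the target score, which would normally be
# captured with forward/backward hooks on each attention layer:
num_tokens, num_heads, num_layers = 8, 4, 6
R = torch.eye(num_tokens)          # relevancy starts as the identity
for _ in range(num_layers):        # one update per attention layer
    attn = torch.rand(num_heads, num_tokens, num_tokens)
    attn_grad = torch.randn(num_heads, num_tokens, num_tokens)
    R = update_relevancy(R, attn, attn_grad)
```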