ViT

[CVPR'23] All Things ViTs: Understanding and Interpreting Attention in Vision (English)

In this half-day CVPR'23 tutorial, we present state-of-the-art work on attention explainability and probing. We also demonstrate how attention mechanisms can be leveraged to guide diffusion models in editing and correcting the images they generate.

Leveraging Attention for Improved Accuracy and Robustness (English)

This talk demonstrates how attention explainability can be used to improve model robustness and accuracy for image classification and generation tasks.

Transformer Interpretability Beyond Attention Visualization

This paper presents an interpretability method for self-attention-based models, specifically Transformer encoders. The method combines Layer-wise Relevance Propagation (LRP) with gradients, and achieves state-of-the-art results for ViT, BERT, and DeiT.