In recent years, Transformers have revolutionized deep learning research across many disciplines, starting with NLP and expanding to vision, speech, and more. At the heart of the Transformer lies a simple and intuitive attention mechanism. In this talk, we will explore the main milestones in Transformer-based research, as well as in Transformer explainability research. We will take a closer look at the attention mechanism and examine its merits and limitations in comparison to convolutional models. Throughout the lecture, we will present examples from some of the most well-known Transformer-based models, such as BERT, CLIP, ViT, DALL-E, and DETR, and discuss the importance and applications of explainability in DNN research, with a particular focus on Transformers.