Attention-based models, and specifically Transformers, have revolutionized text processing and are becoming increasingly popular in computer vision, speech, and multi-modal tasks. In this talk, we will discuss the advantages of such models and the techniques that make training them feasible. We will also explore groundbreaking systems built on attention-based models, such as GPT-3 and DALL-E.