This talk takes a deep dive into Attend-and-Excite. The paper presents a method that guides text-to-image diffusion models to generate all subjects mentioned in the input prompt, mitigating subject neglect (the failure to depict one or more prompted subjects). This is achieved by defining an intuitive loss over the cross-attention maps during inference, without any additional data or fine-tuning.
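To make the idea concrete, here is a minimal sketch of an Attend-and-Excite-style loss. It assumes the cross-attention maps for one denoising step are available as a tensor, and omits details from the paper such as Gaussian smoothing of the maps and iterative latent refinement; the function and argument names are illustrative, not the paper's code.

```python
import torch

def attend_and_excite_loss(attn_maps: torch.Tensor,
                           subject_indices: list[int]) -> torch.Tensor:
    """Encourage every subject token to receive strong cross-attention.

    attn_maps: (num_patches, num_tokens) cross-attention probabilities
               for one denoising step (names are illustrative).
    subject_indices: prompt positions of the subject tokens (e.g. the nouns).
    """
    # Strongest attention each subject token receives over all spatial patches.
    max_per_subject = torch.stack(
        [attn_maps[:, i].max() for i in subject_indices]
    )
    # Penalize the most neglected subject: the loss is small only when
    # *every* subject has at least one strongly attended patch.
    return (1.0 - max_per_subject).max()

# At inference time, the noised latent z_t is nudged down the loss gradient
# before each denoising step (step_size is a hyperparameter), e.g.:
#   z_t = z_t - step_size * torch.autograd.grad(loss, z_t)[0]
```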
In this half-day CVPR'23 tutorial, we present state-of-the-art work on attention explainability and probing. We demonstrate how these mechanisms can be leveraged to guide diffusion models to edit and correct their generated images.
This talk demonstrates how attention explainability can be used to improve model robustness and accuracy for image classification and generation tasks.
This is an introductory talk on DNNs and attention, targeted at deep learning beginners. It was given as part of a volunteering program encouraging women to consider research in the deep learning field.