The Hidden Language of Diffusion Models

This paper presents a novel interpretability method for text-to-image diffusion models. The method uses the model's textual space to explain how diverse images are generated from text prompts. Given a textual concept (e.g., "a president"), the method generates exemplar images from the model and learns to decompose the concept into a small set of interpretable tokens from the model's vocabulary, uncovering intriguing semantic connections, biases, and more.
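
To make the idea of decomposing a concept into a few vocabulary tokens concrete, the following is a minimal toy sketch and not the paper's actual procedure. It assumes the concept is already available as a single embedding vector and fits non-negative weights over vocabulary embeddings with a plain least-squares objective, whereas the method described above learns from exemplar images generated by the diffusion model. The function name `decompose_concept`, the six-word vocabulary, and the random embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def decompose_concept(target, vocab_embeddings, k=3, steps=5000):
    """Toy decomposition (an assumption, not the paper's objective).

    Fits non-negative weights so that a weighted sum of vocabulary
    embeddings approximates `target`, then keeps the k largest weights.
    """
    n_tokens = vocab_embeddings.shape[0]
    weights = np.zeros(n_tokens)
    # Step size 1/L, where L is the squared largest singular value,
    # keeps projected gradient descent stable on this convex problem.
    lr = 1.0 / np.linalg.norm(vocab_embeddings, 2) ** 2
    for _ in range(steps):
        residual = weights @ vocab_embeddings - target
        grad = vocab_embeddings @ residual  # gradient of 0.5 * ||residual||^2
        weights = np.maximum(weights - lr * grad, 0.0)  # project onto weights >= 0
    top = np.argsort(weights)[::-1][:k]  # indices of the k largest weights
    return top, weights[top]


# Hypothetical vocabulary and random embeddings, for illustration only.
rng = np.random.default_rng(0)
vocab = ["leader", "suit", "flag", "tree", "podium", "car"]
embeddings = rng.normal(size=(len(vocab), 32))

# A synthetic "concept" built from two known tokens so the result can be checked.
concept = 0.7 * embeddings[0] + 0.3 * embeddings[1]

indices, top_weights = decompose_concept(concept, embeddings, k=2)
for i, w in zip(indices, top_weights):
    print(vocab[i], round(float(w), 3))
```

Because the synthetic concept is constructed from two tokens, the sketch should recover those two tokens as the highest-weighted ones. In a real setting the target would not be a known mixture, and the weights would have to be learned through the diffusion model itself.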