NeurIPS 2024: FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation

Singapore University of Technology and Design (SUTD)

Abstract

Recently, prompt learning has emerged as the state-of-the-art (SOTA) approach for fair text-to-image (T2I) generation. Specifically, this approach leverages readily available reference images to learn inclusive prompts for each target Sensitive Attribute (tSA), enabling fair image generation. In this work, we first reveal that this prompt learning-based approach results in degraded sample quality. Our analysis shows that the approach's training objective, which aims to align the embedding differences of learned prompts and reference images, can be sub-optimal, distorting the learned prompts and degrading the generated images. To further substantiate this claim, as our major contribution, we take a deep dive into the denoising subnetwork of the T2I model to track the effect of these learned prompts by analyzing the cross-attention maps. In our analysis, we propose a novel prompt-switching analysis, I2H and H2I, as well as a new quantitative characterization of cross-attention maps. Our analysis reveals abnormalities in the early denoising steps that perpetuate improper global structure and degrade the generated samples. Building on these insights, we propose two ideas, (i) Prompt Queuing and (ii) Attention Amplification, to address the quality issue. Extensive experiments on a wide range of tSAs show that our proposed method outperforms the SOTA approach in image generation quality while achieving competitive fairness.
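For intuition, below is a minimal sketch of one way the directional-alignment objective described above can be written: it aligns the text-space direction between a learned prompt and a neutral prompt with the image-space direction between tSA reference images and neutral images. The function name, CLIP-style embedding shapes, and batching are illustrative assumptions, not the exact objective used by the SOTA prompt-learning method.

```python
import torch
import torch.nn.functional as F

def directional_alignment_loss(prompt_emb, neutral_prompt_emb,
                               ref_img_emb, neutral_img_emb):
    """Cosine-align the text-space direction (learned prompt - neutral prompt)
    with the image-space direction (tSA reference images - neutral images).
    All inputs are (batch, dim) CLIP-style embeddings."""
    text_dir = F.normalize(prompt_emb - neutral_prompt_emb, dim=-1)
    img_dir = F.normalize(ref_img_emb - neutral_img_emb, dim=-1)
    # Loss is minimized when the two directions point the same way.
    return 1.0 - (text_dir * img_dir).sum(dim=-1).mean()

# Toy usage: 8 prompt/image embedding pairs of dimension 768
loss = directional_alignment_loss(torch.randn(8, 768), torch.randn(8, 768),
                                  torch.randn(8, 768), torch.randn(8, 768))
```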

Overview

Contributions

We examine SOTA prompt learning methods that utilize directional alignment between prompt embeddings and reference image embeddings for fair T2I generation. Our study reveals two key issues:
  • A moderate fraction of generated samples exhibit degraded quality.
  • The reference image embeddings may be noisy, as they can capture unrelated concepts beyond the target Sensitive Attribute (tSA), leading to sub-optimal learning.
To address this, we propose a novel analysis framework (I2H/H2I) that scrutinizes the cross-attention maps during the denoising process of T2I generation. The analysis highlights abnormalities induced by the learned prompts, particularly in the early denoising steps. Based on these insights, we introduce FairQueue, an improved method that addresses the quality issues while maintaining competitive fairness.
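As a rough illustration of how FairQueue's two ideas could slot into a standard diffusion sampling loop, the sketch below conditions the early denoising steps on the base (neutral) prompt and switches to the learned tSA prompt afterwards (Prompt Queuing), optionally rescaling the tSA token's cross-attention after the switch (Attention Amplification). It assumes a diffusers-style `unet`/`scheduler` interface; `switch_step`, `amplify_fn`, and the loop structure are illustrative assumptions, not the released FairQueue implementation.

```python
import torch

@torch.no_grad()
def fairqueue_sample(unet, scheduler, latents, base_prompt_emb, tsa_prompt_emb,
                     switch_step=10, amplify_fn=None):
    """Conceptual sketch of Prompt Queuing + Attention Amplification.

    Prompt Queuing: early steps use the base prompt so the global structure
    forms properly; later steps use the learned tSA prompt.
    Attention Amplification: after switching, an optional hook (`amplify_fn`)
    rescales cross-attention for the tSA token(s) so the attribute is still
    expressed despite the shorter exposure to the tSA prompt.
    """
    for i, t in enumerate(scheduler.timesteps):
        if i < switch_step:
            cond = base_prompt_emb          # early steps: neutral/base prompt
        else:
            cond = tsa_prompt_emb           # later steps: learned tSA prompt
            if amplify_fn is not None:
                amplify_fn(unet)            # e.g., scale tSA-token attention
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```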

BibTeX


@misc{teo2024fairqueuerethinkingpromptlearning,
      title={FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation}, 
      author={Christopher T. H Teo and Milad Abdollahzadeh and Xinda Ma and Ngai-man Cheung},
      year={2024},
      eprint={2410.18615},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.18615}, 
}