Diffusion Model Patching via
Mixture-of-Prompts

Seokil Ham*, Sangmin Woo*, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim
KAIST, Twelve Labs
*Indicates Equal Contribution
Overview

Overview. We take inspiration from prompt tuning and aim to enhance already converged diffusion models. Our approach incorporates a pool of prompts within the input space, with each prompt learned to excel at certain stages of the denoising process. This is similar to giving a skilled artist an expanded color palette to refine different aspects of their artwork. At every step, a unique blend of prompts (i.e., mixture-of-prompts) is constructed via dynamic gating based on the current noise level. This mechanism is akin to an artist choosing the appropriate color combinations for specific moments. Importantly, our method keeps the diffusion model itself unchanged, and we only use the original training dataset for further training.
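To make the mechanism concrete, below is a minimal PyTorch sketch of how a prompt pool and a noise-level-conditioned gate could be combined into a mixture-of-prompts. The module and parameter names (MixtureOfPrompts, num_prompts, prompt_len, etc.) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MixtureOfPrompts(nn.Module):
    """Sketch: a pool of learnable prompts blended by a timestep-conditioned gate."""

    def __init__(self, num_prompts=4, prompt_len=8, dim=1152, time_dim=1152):
        super().__init__()
        # Learnable prompt pool: num_prompts prompts, each prompt_len tokens of width dim.
        self.prompts = nn.Parameter(0.02 * torch.randn(num_prompts, prompt_len, dim))
        # Lightweight gate mapping the noise-level (timestep) embedding to mixture weights.
        self.gate = nn.Linear(time_dim, num_prompts)

    def forward(self, t_emb):
        # t_emb: (B, time_dim) embedding of the current denoising timestep.
        weights = torch.softmax(self.gate(t_emb), dim=-1)          # (B, num_prompts)
        # Blend the pool per sample: every denoising step gets its own mixture-of-prompts.
        return torch.einsum("bn,nld->bld", weights, self.prompts)  # (B, prompt_len, dim)

Because the gate is conditioned on the timestep embedding, early (high-noise) and late (low-noise) steps can lean on different prompts while sharing the same small pool of parameters.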

Abstract

We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from its dynamic gating mechanism, which selects and combines a subset of learnable prompts at every step of the generative process (i.e., reverse denoising steps). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every step with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the same dataset on which it was originally trained, even in a scenario where significant improvements are typically not expected due to model convergence. Experiments show that DMP significantly enhances the converged FID of DiT-L/2 on FFHQ 256×256 by 10.38%, achieved with only a 1.43% parameter increase and 50K additional training iterations.

Method: Diffusion Model Patching (DMP)

Diffusion Model Patching

DMP framework with DiT. DMP reuses the original dataset used to pre-train DiT.
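The sketch below suggests how the mixture-of-prompts module above could be wired into a frozen DiT-style backbone: the blended prompt tokens are prepended in the input space, and only the prompt pool and gate receive gradients. The attribute names (x_embedder, t_embedder, y_embedder, pos_embed, blocks, final_layer) follow the public DiT reference code, but the wiring here is an assumption for illustration, not the authors' exact implementation.

import torch

def dmp_forward(dit, mop, x, t, y):
    """Hypothetical patched forward pass: the pre-trained DiT stays frozen,
    only the MixtureOfPrompts module `mop` (see sketch above) is trained."""
    for p in dit.parameters():
        p.requires_grad_(False)                     # keep the original model unchanged

    tokens = dit.x_embedder(x) + dit.pos_embed      # (B, N, dim) patchified latent tokens
    t_emb = dit.t_embedder(t)                       # (B, dim) timestep embedding
    c = t_emb + dit.y_embedder(y, dit.training)     # conditioning used by the DiT blocks

    prompt_tokens = mop(t_emb)                      # (B, prompt_len, dim) mixture-of-prompts
    tokens = torch.cat([prompt_tokens, tokens], 1)  # prepend prompts in the input space

    for block in dit.blocks:
        tokens = block(tokens, c)

    tokens = tokens[:, prompt_tokens.size(1):]      # drop prompt positions before decoding
    return dit.final_layer(tokens, c)               # real DiT would unpatchify afterwards

Training then proceeds on the original pre-training dataset with the usual diffusion loss, updating only the prompt pool and the gate.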

Results

Main Table

Patching the pre-trained DiT models with DMP. We compare against two baselines: (1) conventional fine-tuning, which updates the model parameters, and (2) naive prompt tuning. Note that we use the same dataset as in the pre-training. Image resolution is 256 × 256.

Further Training

(Left) Further training of fully converged DiT-L/2. DMP achieves a 10.38% gain in FID-50K with only 50K iterations, while the baselines overfit. (Right) Number of parameters. Default image size is 256×256.

ImageNet 256×256 Results

BibTeX


@article{ham2024diffusion,
  title={Diffusion Model Patching via Mixture-of-Prompts},
  author={Ham, Seokil and Woo, Sangmin and Kim, Jin-Young and Go, Hyojun and Park, Byeongjun and Kim, Changick},
  journal={arXiv preprint arXiv:2405.17825},
  year={2024}
}