Diffusion Model Patching via
Mixture-of-Prompts

Seokil Ham*, Sangmin Woo*, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim
KAIST, Twelve Labs
*Indicates Equal Contribution
Overview

Overview. We take inspiration from prompt tuning and aim to enhance already converged diffusion models. Our approach incorporates a pool of prompts within the input space, with each prompt learned to excel at certain stages of the denoising process. This is similar to giving a skilled artist an expanded color palette to refine different aspects of their artwork. At every step, a unique blend of prompts (i.e., mixture-of-prompts) is constructed via dynamic gating based on the current noise level. This mechanism is akin to an artist choosing the appropriate color combinations for specific moments. Importantly, our method keeps the diffusion model itself unchanged, and we only use the original training dataset for further training.
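To make the mechanism concrete, below is a minimal PyTorch sketch of how a prompt pool and a noise-level-conditioned gate could be combined into a mixture-of-prompts. The module and parameter names (MixtureOfPrompts, num_prompts, prompt_len, etc.) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MixtureOfPrompts(nn.Module):
    """Sketch: a pool of learnable prompts blended by a timestep-conditioned gate."""

    def __init__(self, num_prompts=4, prompt_len=8, dim=1152, time_dim=1152):
        super().__init__()
        # Learnable prompt pool: num_prompts prompts, each prompt_len tokens of width dim.
        self.prompts = nn.Parameter(0.02 * torch.randn(num_prompts, prompt_len, dim))
        # Lightweight gate mapping the noise-level (timestep) embedding to mixture weights.
        self.gate = nn.Linear(time_dim, num_prompts)

    def forward(self, t_emb):
        # t_emb: (B, time_dim) embedding of the current denoising timestep.
        weights = torch.softmax(self.gate(t_emb), dim=-1)          # (B, num_prompts)
        # Blend the pool per sample: every denoising step gets its own mixture-of-prompts.
        return torch.einsum("bn,nld->bld", weights, self.prompts)  # (B, prompt_len, dim)

Because the gate is conditioned on the timestep embedding, early (high-noise) and late (low-noise) steps can lean on different prompts while sharing the same small pool of parameters.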

Abstract

We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from its dynamic gating mechanism, which selects and combines a subset of learnable prompts at every step of the generative process (i.e., reverse denoising steps). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every step with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the same dataset on which it was originally trained, even in a scenario where significant improvements are typically not expected due to model convergence. Experiments show that DMP significantly enhances the converged FID of DiT-L/2 on FFHQ 256×256 by 10.38%, achieved with only a 1.43% parameter increase and 50K additional training iterations.

Method: Diffusion Model Patching (DMP)

Diffusion Model Patching

DMP framework with DiT. DMP reuses the original dataset used to pre-train DiT.
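The sketch below suggests how the mixture-of-prompts module above could be wired into a frozen DiT-style backbone: the blended prompt tokens are prepended in the input space, and only the prompt pool and gate receive gradients. The attribute names (x_embedder, t_embedder, y_embedder, pos_embed, blocks, final_layer) follow the public DiT reference code, but the wiring here is an assumption for illustration, not the authors' exact implementation.

import torch

def dmp_forward(dit, mop, x, t, y):
    """Hypothetical patched forward pass: the pre-trained DiT stays frozen,
    only the MixtureOfPrompts module `mop` (see sketch above) is trained."""
    for p in dit.parameters():
        p.requires_grad_(False)                     # keep the original model unchanged

    tokens = dit.x_embedder(x) + dit.pos_embed      # (B, N, dim) patchified latent tokens
    t_emb = dit.t_embedder(t)                       # (B, dim) timestep embedding
    c = t_emb + dit.y_embedder(y, dit.training)     # conditioning used by the DiT blocks

    prompt_tokens = mop(t_emb)                      # (B, prompt_len, dim) mixture-of-prompts
    tokens = torch.cat([prompt_tokens, tokens], 1)  # prepend prompts in the input space

    for block in dit.blocks:
        tokens = block(tokens, c)

    tokens = tokens[:, prompt_tokens.size(1):]      # drop prompt positions before decoding
    return dit.final_layer(tokens, c)               # real DiT would unpatchify afterwards

Training then proceeds on the original pre-training dataset with the usual diffusion loss, updating only the prompt pool and the gate.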

Results

Main Table

Patching the pre-trained DiT models with DMP. We compare against two baselines: (1) conventional fine-tuning, which updates the model parameters, and (2) naive prompt tuning. Note that we use the same dataset as in the pre-training. Image resolution is 256 × 256.

Further Training

(Left) Further training of fully converged DiT-L/2. DMP achieves a 10.38% gain in FID-50K with only 50K iterations, while the baselines overfit. (Right) Number of parameters. Default image size is 256×256.

ImageNet 256×256 Results

BibTeX


@article{ham2024diffusion,
  title={Diffusion Model Patching via Mixture-of-Prompts},
  author={Ham, Seokil and Woo, Sangmin and Kim, Jin-Young and Go, Hyojun and Park, Byeongjun and Kim, Changick},
  journal={arXiv preprint arXiv:2405.17825},
  year={2024}
}