Abstract
We propose an algorithm to improve multi-concept prompt fidelity in text-to-image diffusion models. We start from a common failure: prompts such as "a cat and a clock" sometimes yield images in which one concept is missing, faint, or awkwardly colliding with another. We hypothesize that this occurs when the model drifts into mixed modes that over-emphasize a single concept pattern learned strongly during training while weakening the others. Instead of retraining, we introduce a corrective sampling strategy that gently suppresses regions where the joint prompt behavior overlaps too strongly with any single concept's dominant pattern, steering generation toward "pure" joint modes in which all concepts coexist with balanced visual presence. We further show that existing multi-concept guidance schemes can operate in unstable weight regimes that amplify imbalance; we characterize favorable regimes and adapt sampling to remain within them. The approach is plug-and-play, requires no model tuning, and complements standard classifier-free guidance. Experiments on diverse multi-concept prompts show consistent gains in concept coverage, relative prominence balance, and robustness, reducing dropped or distorted concepts compared to standard baselines and prior compositional methods. The results suggest that lightweight corrective guidance can substantially mitigate brittle semantic behavior in modern diffusion systems.