Publications | Debottam Dutta

Generative Modeling and Sampling

ICML Workshop

TILT: Test-Time Reward Alignment via Distribution Tilting for Compositional Generation

Debottam Dutta, Jaehoon Hahm, Jianchong Chen, and Romit Roy Choudhury

ICML 2026 Workshop on Structured Probabilistic Inference & Generative Modeling (SPIGM)

PDF

tl;dr: We introduce TILT, a training-free framework that improves compositional text-to-image generation by framing it as a test-time reward alignment problem. Rather than relying on external reward models, we define a "pure-mode" intrinsic reward that favors samples in which all concepts are jointly present, and we show that, in principle, this reward maximizes Conditional Total Correlation.

ICLR

Steer Away From Mode Collisions: Improving Composition In Diffusion Models

Debottam Dutta, Jianchong Chen, Rajalaxmi Rajagopalan, Yu-Lin Wei, and Romit Roy Choudhury

ICLR, 2026

PDF Code Website

tl;dr: While text-to-image models can generate stunning visuals, they frequently fail at multi-concept prompts by dropping or overshadowing weaker elements. We hypothesize this happens because the joint probability distribution overlaps too heavily with single-concept distributions, pulling the generation toward a dominant concept. To resolve this without retraining, our Concept-Contrasting Corrector (CO3) steers sampling away from these overlapping areas and toward "pure" modes where all concepts achieve a balanced visual presence.

ICML

Personalized Image Generation via Human-in-the-loop Bayesian Optimization

Rajalaxmi Rajagopalan, Debottam Dutta, Yu-Lin Wei, and Romit Roy Choudhury

ICML, 2026

PDF Website

tl;dr: MultiBO enables precise image personalization via human-in-the-loop Bayesian optimization. It narrows the generation gap by building on one observation: even when language-based prompting reaches its limits, humans can still visually judge whether a new image x^+ is closer to their imagined target x^* than previous attempts were. Iterative multi-choice user feedback then guides the diffusion model to the exact desired image without retraining.

Preprint

Learning Energy-based Variational Latent Prior for VAEs

Debottam Dutta, Chaitanya Amballa, Zhongweiyang Xu, Yu-Lin Wei, and Romit Roy Choudhury

arXiv, 2025

PDF

tl;dr: To tackle the fundamental "prior hole" problem that severely degrades VAE generation quality, we introduce EVaLP, a flexible energy-based prior designed to align tightly with the aggregate posterior. Our key insight is to use a variational sampler network to approximate the log-normalization constant, which avoids computationally expensive MCMC sampling. By bridging this prior-posterior mismatch, our approach enables both stable model training and fast, high-quality sample generation.

NeurIPS Workshop

Multi-Source Music Generation with Latent Diffusion

Zhongweiyang Xu, Debottam Dutta, Yu-Lin Wei, and Romit Roy Choudhury

Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

PDF Demo

tl;dr: We introduce the Multi-Source Latent Diffusion Model (MSLDM) to resolve the noisy artifacts and poor melodies of waveform-level models by compressing individual instrumental sources into distinct VAE latents. Our key insight is that training a diffusion model on these concatenated "source latents" captures inter-source harmony significantly better than modeling whole music mixtures. The compression and noise robustness of the latents enable flexible, high-quality total and partial generation of mutually coherent tracks.

ICASSP

Estimating Multi-chirp Parameters using Curvature-guided Langevin Monte Carlo

Sattwik Basu, Debottam Dutta, Yu-Lin Wei, and Romit Roy Choudhury

ICASSP, 2025

PDF

tl;dr: Estimating higher-order multi-chirp parameters in low signal-to-noise environments is a challenging non-convex optimization problem in which standard samplers frequently fail to converge. To address this, we propose a Curvature-guided Langevin Monte Carlo (CG-LMC) algorithm that adaptively tunes Gaussian smoothing using the objective function’s average curvature to reliably reach the optimal solution.

Speech/Audio Processing and Digital Health

Speech Dereverberation With Frequency Domain Autoregressive Modeling

Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, and Sriram Ganapathy

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

PDF

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

Debarpan Bhattacharya, Neeraj Kumar Sharma, Debottam Dutta, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, C Chandrakiran, Sahiti Nori, K K Suhail, Sadhana Gonuguntla, and Murali Alagesan

Sci. Data, 2023

PDF

The Second DiCOVA Challenge: Dataset and Performance Analysis for Diagnosis of COVID-19 Using Acoustics

Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, and Sriram Ganapathy

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

PDF

Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals

Debarpan Bhattacharya, Debottam Dutta, Neeraj Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, and Murali Alagesan

Proc. Interspeech, 2022

PDF

Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir Hossein Poorjam, Deepak Mittal, and Maneesh Singh

Proc. Interspeech, 2022

PDF

A Multi-Head Relevance Weighting Framework for Learning Raw Waveform Audio Representations

Debottam Dutta, Purvi Agrawal, and Sriram Ganapathy

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

PDF