A paper was accepted at the NeurIPS 2024 Workshop on AI-Driven Speech, Music, and Sound Generation.