I am a Research Scientist at Adobe Research. I received a PhD in Computer Science and a Master's degree in Music from the University of California, San Diego (UCSD), where I was fortunate to be co-advised by Prof. Shlomo Dubnov and Prof. Taylor Berg-Kirkpatrick.
Before that, I earned my Bachelor's degree in Computer Science at Fudan University, co-advised by Prof. Wei Li and Prof. Gus Xia. I have also interned at Adobe, Mitsubishi Electric Research Laboratories (MERL), Apple, ByteDance AI Lab, and Tencent.
My main interest lies in audio and music representation learning and its application to downstream tasks. This work is interdisciplinary, spanning music, general audio, and computer science. The main research I conducted during my PhD is listed below:
- Audio and Music Information Retrieval
  - Contrastive Language-Audio Pretraining
  - Audio Event, Music, and Speech Source Separation
  - Audio Classification
  - Singing Melody Extraction
  - Music Recommendation
- Music Generative AI
  - Text-to-Music Generation via Latent Diffusion Models
  - Controllable Music Generation via Variational Auto-Encoders
  - Multitrack Music Generation via Transformer Models
Highlighted projects from the above research that I led or served on as a main contributor:
- CLAP: Contrastive Language-Audio Pretraining
- HTS-AT: Hierarchical Token-Semantic Audio Transformer
- MusicLDM: Text-to-Music Generation
- Zero-shot Audio Source Separation
- Music SketchNet: Controllable Monophonic Music Generation
- POP909: A Dataset for Pop Music Arrangement
- TONet: A Singing Melody Extraction Framework
- Choral Music Separation
I maintain the websites of the China Conference on Sound and Music Technology and of New Interfaces for Musical Expression. My alias "Knut" comes from one of my favorite composers, Knut Nystedt.