SLT Conference
I am very honored to attend SLT 2024 hosted in Macao.
There are many fascinating works on discrete tokens. Keynote speaker Prof. Wenwu Wang shared his research on SemanticCodec. It achieves high accuracy on the first layer, making it potentially more suitable for downstream decoder-only tasks where the autoregressive model outputs only the first layer.
My current research bottleneck is how to reconstruct audio from discrete tokens obtained by running K-means on WavLM features. I talked to one faculty member, and he gave me two papers to review: one is BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100K hours of data, and the other is Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. He stated that reconstructing audio from K-means tokens alone is impossible because the discretization process loses too much speaker information; I should focus on using auxiliary information to reconstruct speech in the next stage. I also attended the great talk on Personalized Speech Enhancement by Prof. Minje Kim from UIUC.
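To make the speaker-information-loss point concrete, here is a minimal sketch of the K-means discretization step. It uses random arrays as a stand-in for WavLM hidden states (in practice they would come from a pretrained WavLM layer), and the codebook size of 50 is an arbitrary illustrative choice. The quantization error it measures is exactly the per-frame detail (speaker timbre included) that the token IDs discard.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for WavLM hidden states: (num_frames, feature_dim).
# Real features would come from a chosen layer of a pretrained WavLM.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 768))

# Fit K-means to discretize each frame into a token ID
# (codebook size 50 here, purely for illustration).
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(features)
tokens = kmeans.predict(features)        # one discrete token per frame
centroids = kmeans.cluster_centers_      # continuous codebook vectors

# "Dequantize" by looking up each token's centroid; the residual is the
# information (speaker identity among it) lost by discretization.
recon = centroids[tokens]
mse = float(np.mean((features - recon) ** 2))
print(tokens.shape, mse)
```

Because the residual is nonzero for any finite codebook, a decoder conditioned on the tokens alone cannot recover the original speaker, which motivates adding auxiliary conditioning (e.g. a speaker embedding) in the next stage.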
I also met the famous Professor Hung-yi Lee from National Taiwan University, and I took a photo with him.

(of course, the person on the left is me :) )
This is the group photo of our lab members:

(From left to right: Mingjing Yi (Undergraduate), Beilong Tang (Undergraduate), Ming Li, Zexin Cai (Postdoc at JHU), Qishan Zhang (RA))