We proposed a novel AI framework for real-time multi-speaker recognition and diarization that requires no prior registration and instead learns speaker identities on the fly. We considered the practical problem of online learning with episodically revealed rewards and introduced a solution based on semi-supervised and self-supervised learning methods, implemented in this web-based system.
Instruction: Please turn on microphone access for this demo. You may click on the right answer as feedback to the system, which helps its agent improve future predictions. When a new user is detected, a new user profile is not added until you confirm it by clicking "New Speaker". Refreshing the page clears all user profiles and restarts the learning from scratch. If you have any questions, feel free to email doerlbh@gmail.com with a title starting with "[VoiceID Question]", and we will get back to you shortly. Thank you!
Note: For proper usage, please use Google Chrome (the browser in which this demo has been tested). In early rounds, the agent tends to guess "New Speaker" much more often than the other options because it is the only arm that receives explicit positive feedback from the user, while positive feedback for the other users can only be propagated implicitly by the self-supervision step to help with the semi-supervision.
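To make the feedback loop described above concrete, here is a minimal sketch in Python. It is not the algorithm used by this demo or described in the papers: it swaps in a toy nearest-centroid classifier, and the class name `OnlineSpeakerSketch`, the `threshold` parameter, and the use of plain lists as voice embeddings are all assumptions made for illustration. It shows only the three update paths: an explicit "New Speaker" confirmation creates a profile, an explicit click on a known speaker updates that profile, and a round without feedback falls back to a self-supervised update using the agent's own guess.

```python
import math

NEW_SPEAKER = "New Speaker"


class OnlineSpeakerSketch:
    """Toy nearest-centroid agent with optional user feedback (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical distance cutoff for "known" speakers
        self.centroids = {}         # speaker name -> (mean embedding, sample count)

    def predict(self, x):
        # Choose the closest known profile within the cutoff,
        # otherwise fall back to the "New Speaker" arm.
        best, best_dist = NEW_SPEAKER, self.threshold
        for name, (mean, _) in self.centroids.items():
            d = math.dist(mean, x)
            if d < best_dist:
                best, best_dist = name, d
        return best

    def _update(self, name, x):
        # Incremental running mean of the embeddings assigned to this profile.
        mean, n = self.centroids.get(name, ([0.0] * len(x), 0))
        n += 1
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.centroids[name] = (mean, n)

    def step(self, x, feedback=None, new_name=None):
        guess = self.predict(x)
        if feedback == NEW_SPEAKER:
            # Explicit confirmation: only now is a new profile created.
            name = new_name or f"speaker_{len(self.centroids) + 1}"
            self._update(name, x)
        elif feedback in self.centroids:
            # The user clicked the right answer among the known speakers.
            self._update(feedback, x)
        elif feedback is None and guess != NEW_SPEAKER:
            # No feedback revealed: self-supervised update with the agent's own guess.
            self._update(guess, x)
        return guess


agent = OnlineSpeakerSketch(threshold=0.5)
first = agent.step([0.0, 0.0])                    # no profiles yet, no feedback
agent.step([0.0, 0.0], feedback=NEW_SPEAKER, new_name="alice")
second = agent.step([0.1, 0.0])                   # close to alice, no feedback
third = agent.step([5.0, 5.0])                    # far from every profile
```

In this sketch an unconfirmed "New Speaker" guess never creates a profile, which mirrors the instruction that a profile is added only after the user clicks "New Speaker".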
[1] Lin, B., & Zhang, X. (2020). VoiceID on the fly: Real-time Register-free Online-learning Speaker Recognition. INTERSPEECH 2020.
[2] Lin, B., & Zhang, X. (2021). Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox. ACML 2021.
Please allow microphone access (or play this audio file to see a demo).