University of Washington

Research — UW Reality Lab

Speech Separation

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, operating in the waveform domain, that isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and an angular window size $w$. By exponentially decreasing $w$, we can perform a binary search that localizes and separates all sources in time logarithmic in the angular resolution. Our algorithm also allows for an arbitrary number of potentially moving speakers at test time, including more speakers than were seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
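The coarse-to-fine search over angular windows can be illustrated with a minimal sketch. It is not the authors' code. It assumes a hypothetical `detect(theta, w)` callback that reports whether the region $\theta \pm w/2$ contains a source (in practice this might mean running the separation network for that region and thresholding its output energy). The starting number of regions (`num_initial`) and the stopping width (`target_width`) are also illustrative choices. Each level halves $w$, so reaching a width $w_{\min}$ from $2\pi/N$ takes about $\log_2\big((2\pi/N)/w_{\min}\big)$ levels. Only windows that still contain a source are refined, so with $k$ sources the cost is roughly $2k$ detector calls per level instead of a scan over every fine window.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical interface (an assumption, not the paper's API):
# detect(theta, w) returns True if the region theta +/- w/2 (radians)
# contains at least one source.
Detector = Callable[[float, float], bool]


def localize(detect: Detector, num_initial: int = 4,
             target_width: float = math.radians(5.0)) -> Tuple[List[float], float]:
    """Coarse-to-fine angular search: halve the window around every active region."""
    w = 2 * math.pi / num_initial
    # Tile the full circle with num_initial equal windows and keep the active ones.
    centers = [w * (i + 0.5) for i in range(num_initial)]
    centers = [c for c in centers if detect(c, w)]
    while w > target_width:
        w /= 2
        # An active window of width 2w splits into two halves centered at c -/+ w/2.
        candidates = [c + s * w / 2 for c in centers for s in (-1, 1)]
        centers = [c for c in candidates if detect(c, w)]
    return centers, w


def point_source_detector(source_angles: List[float]) -> Detector:
    """Toy stand-in for the network: sources are ideal points at known angles."""
    def detect(theta: float, w: float) -> bool:
        for a in source_angles:
            # Wrapped angular distance from the window center, in [-pi, pi).
            d = (a - theta + math.pi) % (2 * math.pi) - math.pi
            if abs(d) <= w / 2:
                return True
        return False
    return detect


if __name__ == "__main__":
    sources = [math.radians(30), math.radians(200)]
    centers, w = localize(point_source_detector(sources))
    print([round(math.degrees(c), 2) for c in centers], round(math.degrees(w), 4))
    # -> [29.53, 201.09] 2.8125
```

With the default four 90° starting regions, the two toy sources are narrowed to windows of width 2.8125° after five halvings. The sketch does not merge duplicate detections (a source lying exactly on a window boundary can activate both neighbors) and leaves out the separation output itself.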

The Cone of Silence: Speech Separation by Localization. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 20925-20938, 2020. PDF: https://arxiv.org/pdf/2010.06007.pdf

Classes

List of classes taken

CSE 505: Principles of Programming Languages [Spring 2020 by Prof. Zachary Tatlock]

CSE 517: Natural Language Processing [Winter 2020 by Prof. Noah A. Smith]

CSE 521: Design and Analysis of Algorithms [Autumn 2019 by Prof. Shayan Oveis Gharan]

CSE 550: Computer Systems [Autumn 2020 by Prof. Kurtis Heimerl]

CSE 576: Computer Vision [Spring 2020 by Prof. Steve Seitz]

Last updated: Jan 17, 2022