YOHO: You Only Hear Once
Published:
This post covers the recent work of Satvik Venkatesh on the YOHO paper. In this recently published paper, we present a neural network approach for audio detection. Transition points, the boundaries of sonic objects, are identified directly through the design of the neural network, rather than through the traditional approach of processing audio in blocks and classifying each block. The traditional approach quantizes the classification of the signal to the block size and relies on accurate classification of every time step, which can be problematic in noisy environments. In our approach, the model's prediction is a regression of the transition points themselves, which means the output is much less likely to oscillate between classes from one time step to the next, and the predictions are generally more robust. A rigorous review of this approach in noisy environments was presented in a paper at NeurIPS. The full paper is available here.
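To make the contrast with per-block classification concrete, the sketch below decodes a regression-style output into events for a single class. It is an illustration only, not the implementation from the paper: the output layout (a presence score plus start and end offsets normalized to each coarse time bin), the function name `decode_bins`, and the parameters `bin_duration`, `threshold`, and `merge_gap` are all assumptions made for this example.

```python
from typing import List, Tuple

# One prediction per coarse time bin, for a single class (assumed layout):
# (presence score in [0, 1], start offset in [0, 1], end offset in [0, 1]).
BinPrediction = Tuple[float, float, float]
Event = Tuple[float, float]  # (onset_seconds, offset_seconds)


def decode_bins(
    bins: List[BinPrediction],
    bin_duration: float,
    threshold: float = 0.5,
    merge_gap: float = 0.0,
) -> List[Event]:
    """Turn per-bin (presence, start, end) predictions into a list of events.

    Offsets are fractions of the bin, so the regressed boundaries are not
    restricted to bin edges. Events that touch, or are separated by at most
    merge_gap seconds, are merged into one.
    """
    events: List[Event] = []
    for index, (presence, start, end) in enumerate(bins):
        if presence < threshold:
            continue  # no event predicted in this bin
        onset = (index + start) * bin_duration
        offset = (index + end) * bin_duration
        if events and onset - events[-1][1] <= merge_gap:
            # Continuation of the previous event: extend its offset.
            previous_onset, previous_offset = events[-1]
            events[-1] = (previous_onset, max(previous_offset, offset))
        else:
            events.append((onset, offset))
    return events


# Hypothetical predictions for five bins of 0.5 s each.
example_bins = [
    (0.10, 0.0, 0.0),
    (0.90, 0.5, 1.0),
    (0.80, 0.0, 1.0),
    (0.95, 0.0, 0.25),
    (0.20, 0.0, 0.0),
]
print(decode_bins(example_bins, bin_duration=0.5))
```

With these made-up values, the three active bins merge into a single event whose onset and offset fall inside bins rather than on bin edges, which is the property that per-block classification cannot express.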