This work investigates pre-trained audio representations for few-shot Sound Event Detection. We specifically address the task of few-shot detection of novel acoustic sequences, that is, sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pre-training suitable representations, and methods that transfer them to our few-shot learning scenario. Our experiments evaluate the general-purpose utility of our pre-trained representations on AudioSet, and the utility of the proposed few-shot methods on tasks constructed from real-world acoustic sequences. Our pre-trained embeddings are suitable for the proposed task and enable multiple aspects of our few-shot framework.