Researchers propose LEAF, a frontend for developing AI classification algorithms

In machine learning, mel-filterbanks (fixed, hand-engineered representations of sound) are often used to train algorithms that classify sound. Decades after their design, research shows that mel-filterbanks exhibit desirable mathematical properties for representation learning; in other words, they make for strong audio features. But their design also embeds biases, and those biases can be detrimental for tasks that require fine-grained resolution at high frequencies.
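To make the fixed pipeline concrete, here is a minimal sketch of the traditional approach, assuming the widely used librosa library; the specific values (16 kHz sample rate, 40 filters, 25 ms windows) are illustrative defaults, not drawn from the article.

```python
import numpy as np
import librosa  # assumption: a standard audio library, not named in the article


def mel_features(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Compute log-mel-filterbank features for a mono waveform."""
    mel = librosa.feature.melspectrogram(
        y=waveform,
        sr=sr,
        n_fft=400,       # 25 ms analysis window at 16 kHz
        hop_length=160,  # 10 ms hop
        n_mels=40,       # number of hand-designed triangular mel filters
    )
    # Log compression loosely mimics how the ear perceives loudness.
    return np.log(mel + 1e-6)


# Every value above is fixed by hand; nothing is learned from data,
# which is the rigidity a learnable frontend aims to remove.
one_second = np.random.randn(16000).astype(np.float32)
print(mel_features(one_second).shape)  # (40, 101): filters x frames
```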

In a step toward an AI-forward alternative, researchers at Google developed LEAF, a frontend that breaks down mel-filterbanks into several components (filtering, pooling, and compression/normalization) to create audio classification models with, ostensibly, minimal bias. The researchers claim that LEAF can learn a single set of parameters that outperforms mel-filterbanks, suggesting it can be used for general-purpose audio classification tasks.
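The decomposition can be illustrated with a conceptual sketch in which each of the three stages is a learnable module. This is an assumption-laden toy in PyTorch, not Google's implementation: the module choices, filter counts, and the simple power-law compression standing in for learnable normalization are all illustrative.

```python
import torch
import torch.nn as nn


class LearnableFrontend(nn.Module):
    """Toy three-stage frontend: filtering -> pooling -> compression.

    Every stage is parameterized and trained from data, unlike a fixed
    mel-filterbank. This is a sketch of the idea, not the LEAF code.
    """

    def __init__(self, n_filters: int = 40, kernel_size: int = 401):
        super().__init__()
        # Stage 1: filtering. Learned 1-D convolution kernels replace
        # fixed, hand-designed triangular mel filters.
        self.filtering = nn.Conv1d(1, n_filters, kernel_size,
                                   padding=kernel_size // 2)
        # Stage 2: pooling. A strided depthwise convolution stands in
        # for a learnable lowpass/pooling step that downsamples in time.
        self.pooling = nn.Conv1d(n_filters, n_filters, 160, stride=160,
                                 groups=n_filters)
        # Stage 3: compression/normalization. A per-channel learnable
        # exponent, a crude stand-in for learnable compression.
        self.exponent = nn.Parameter(torch.full((n_filters, 1), 0.5))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples)
        x = self.filtering(waveform) ** 2   # filter energies
        x = torch.relu(self.pooling(x))     # learned temporal pooling
        return (x + 1e-6) ** self.exponent  # learned compression


frontend = LearnableFrontend()
features = frontend(torch.randn(2, 1, 16000))  # two 1-second clips at 16 kHz
print(features.shape)                          # torch.Size([2, 40, 100])
```

Because every stage is a trainable parameter, the whole frontend can be optimized end to end with the downstream classifier, which is the property the paper attributes to LEAF.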

LEAF has real-world implications given that the global sound recognition market was valued at $66.5 million in 2018, according to Grand View Research. Beyond voice and speech recognition, the sense of hearing has become essential in AI; sound lets a system understand context and differentiate among events occurring in an environment. In the case of an intrusion, for example, an event management system with AI-powered sound sensing could turn on the lights and play loud music to deter a breach, along with sending alerts to homeowners. LEAF could make it easier to build those sorts of products without painstakingly handcrafting sound representations.

In experiments, the researchers used LEAF to develop independent single-task supervised models on eight distinct classification problems, including acoustic scene classification, birdsong detection, emotion recognition, speaker identification, musical instrument and pitch detection, keyword spotting, and language identification. They say the models created with LEAF outperformed or matched the accuracy of alternative frontends.

In the near future, the team plans to release the source code for their models and baselines as well as pretrained frontends. “In this work, we argue that a credible alternative to mel-filterbanks for classification should be evaluated across many tasks, and propose the first extensive study of learnable frontends for audio over a wide and diverse range of audio signals, including speech, music, audio events, and animal sounds,” they wrote in a paper describing their work. “By breaking down mel-filterbanks into three components … we propose LEAF, a novel frontend that is fully learnable in all its operations, while being controlled by just a few hundred parameters. [T]hese findings are replicated when training a different model for each individual task. We also confirm these results on a challenging, large-scale benchmark.”
