ASR Model Fine-Tuning Series: How to deal with Noisy Data (literally)

Published on: 2023-11-22

By Emer Butler

Since ASR models are designed to convert spoken language into written text, they often struggle when faced with audio data that has unwanted background noise.

Most audio recorded in natural settings will contain some background noise, such as nearby traffic, wind, white noise, or background chatter.

In this article, we will explore three common techniques for dealing with background noise in audio data, to help you improve the quality of your automatically generated transcripts on our Transcribe-ASR platform or open API.

Noise Reduction

The first step in dealing with unwanted background noise is to remove or suppress it.

Let’s look at three straightforward methods or algorithms to do the job: spectral subtraction, Wiener filtering, and adaptive filtering. Yes, you read that second one right, Wiener filtering.

Spectral Subtraction

The “spectrum” of an audio file describes how its energy is distributed across frequencies. Spectral subtraction works by estimating the spectrum of the background noise, such as traffic or wind, usually from a stretch of the recording where nobody is speaking. It then subtracts that noise estimate from the spectrum of the whole recording, leaving mostly the frequencies that belong to the voice. This helps reduce the impact of the noise on the ASR system’s performance.

Application: Spectral subtraction is best for removing stationary background noise, such as continuous hums, hisses, or consistent broadband noise. It is less effective on non-stationary noise that changes quickly over time. For example, when you have a recording of a person speaking in a quiet room with an air conditioner running in the background, spectral subtraction can help reduce the constant noise of the air conditioner, leaving the speech clearer.

A visual illustration of Spectral subtraction — see the clean audio wave after the subtraction?
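To make the idea concrete, here is a minimal sketch of spectral subtraction in Python using only NumPy. It is an illustration under stated assumptions rather than a production implementation: it assumes the first few frames of the recording contain noise only, and the function name, frame size, and floor value are our own illustrative choices. A 440 Hz tone stands in for speech and white noise stands in for the air conditioner.

```python
import numpy as np

FRAME, HOP = 512, 256  # 50% overlap between frames


def sqrt_hann(size):
    # Periodic square-root Hann window. Used for both analysis and synthesis,
    # the overlapping windows sum to one at 50% overlap.
    n = np.arange(size)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / size))


def spectral_subtraction(noisy, noise_frames=10, floor=0.05):
    window = sqrt_hann(FRAME)
    starts = range(0, len(noisy) - FRAME + 1, HOP)
    spec = np.array([np.fft.rfft(noisy[s:s + FRAME] * window) for s in starts])
    magnitude, phase = np.abs(spec), np.angle(spec)

    # Assumption: the first `noise_frames` frames contain background noise only.
    noise_magnitude = magnitude[:noise_frames].mean(axis=0)

    # Subtract the noise estimate from every frame, keeping a small floor so
    # magnitudes never go negative.
    cleaned_magnitude = np.maximum(magnitude - noise_magnitude, floor * magnitude)
    cleaned_spec = cleaned_magnitude * np.exp(1j * phase)

    # Overlap-add the frames back into a waveform (the noisy phase is reused).
    output = np.zeros(len(noisy))
    for s, frame_spec in zip(starts, cleaned_spec):
        output[s:s + FRAME] += np.fft.irfft(frame_spec, n=FRAME) * window
    return output


# Toy data: a 440 Hz tone stands in for speech, white noise for the hum.
rate = 16000
t = np.arange(2 * rate) / rate
clean = 0.5 * np.sin(2 * np.pi * 440 * t) * ((t >= 0.5) & (t < 1.5))
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(len(t))

denoised = spectral_subtraction(noisy)
print("MSE before:", np.mean((noisy - clean) ** 2))
print("MSE after: ", np.mean((denoised - clean) ** 2))
```

In this sketch the first and last half-frame of the output fade in and out, and any samples past the last full frame are left at zero. On real recordings you would also need a better way to find the noise-only stretch than simply taking the first frames.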

Wiener Filtering

This is another way to clean up noisy audio, this time by turning down the volume of unwanted noise. Imagine you’re recording something, but there’s unwanted noise like background chatter or traffic. Rather than subtracting a noise estimate outright, like spectral subtraction above, Wiener filtering estimates how much of each part of the signal is speech and how much is noise. It then lowers the volume of the frequencies dominated by noise and largely preserves the frequencies dominated by the sound you want to hear.

Application: For example, if you’re recording someone talking in a noisy public place, Wiener filtering can make the talking clearer by reducing the background chatter of a surrounding crowd or the clinking of dishes in a restaurant.

A visual illustration of Wiener Filtering, reducing noisy background volumes
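A frequency-domain version of this idea can be sketched in a few lines of NumPy. This is one simple variant among many, not a definitive Wiener filter: it assumes the first few frames are noise only, estimates a signal-to-noise ratio for each frame and frequency bin, and applies the gain SNR / (1 + SNR). The function name, frame size, and gain floor are illustrative choices, and the toy data uses steady white noise rather than real crowd chatter.

```python
import numpy as np


def wiener_denoise(noisy, frame=512, noise_frames=10, gain_floor=0.05):
    hop = frame // 2
    n = np.arange(frame)
    # Periodic square-root Hann window, applied at analysis and synthesis.
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame))
    starts = np.arange(0, len(noisy) - frame + 1, hop)
    spec = np.array([np.fft.rfft(noisy[s:s + frame] * window) for s in starts])
    power = np.abs(spec) ** 2

    # Assumption: the first `noise_frames` frames contain noise only.
    noise_power = power[:noise_frames].mean(axis=0)

    # Rough SNR estimate per frame and frequency bin, clipped at zero.
    snr = np.maximum(power / (noise_power + 1e-12) - 1.0, 0.0)

    # Wiener-style gain: close to 1 where speech dominates, small where noise does.
    gain = np.maximum(snr / (1.0 + snr), gain_floor)

    # Overlap-add the attenuated frames back into a waveform.
    output = np.zeros(len(noisy))
    for s, frame_spec in zip(starts, spec * gain):
        output[s:s + frame] += np.fft.irfft(frame_spec, n=frame) * window
    return output


# Toy data: a 300 Hz tone stands in for speech, white noise for the background.
rate = 16000
t = np.arange(2 * rate) / rate
clean = 0.5 * np.sin(2 * np.pi * 300 * t) * ((t >= 0.5) & (t < 1.5))
noisy = clean + 0.1 * np.random.default_rng(1).standard_normal(len(t))

denoised = wiener_denoise(noisy)
print("MSE before:", np.mean((noisy - clean) ** 2))
print("MSE after: ", np.mean((denoised - clean) ** 2))
```

Note that the gain never exceeds 1, so the filter only turns sounds down. The gain floor keeps a little of the background in, which often sounds more natural than removing it completely.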

Adaptive Filtering

As the name suggests, this approach involves designing a filter that can adjust its parameters dynamically based on the characteristics of both the background noise and the desired speech signal.

Application: Adaptive filtering is super useful when you have a recording of a person speaking outdoors on a windy day, and the wind noise keeps changing in intensity and direction. Adaptive filtering can adapt to the changing noise conditions and reduce the wind noise while preserving the speech. 

An illustration of Adaptive Filtering, with the darker grey audio wave representing erratic background noise
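One classic form of adaptive filtering is the normalised least mean squares (NLMS) noise canceller, sketched below with NumPy. It rests on an important assumption that will not hold for every recording: a second, noise-only reference signal is available, for example from a microphone pointed away from the speaker. The function name, filter length, and step size are illustrative, and the "wind" here is synthetic noise whose intensity rises and falls over time.

```python
import numpy as np


def nlms_noise_canceller(primary, reference, taps=16, mu=0.1, eps=1e-8):
    """primary = speech + noise; reference = a correlated, noise-only signal."""
    weights = np.zeros(taps)
    cleaned = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]    # newest reference sample first
        noise_estimate = weights @ x               # current guess at the noise
        error = primary[n] - noise_estimate        # what is left is the cleaned sample
        weights += mu * error * x / (x @ x + eps)  # normalised LMS weight update
        cleaned[n] = error
    return cleaned


# Toy data (all synthetic): a 220 Hz tone stands in for speech.
rng = np.random.default_rng(0)
rate = 8000
t = np.arange(2 * rate) / rate
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
gusts = 0.6 + 0.4 * np.sin(2 * np.pi * 0.5 * t)            # slowly changing intensity
reference = gusts * rng.standard_normal(len(t))            # noise-only microphone
wind = np.convolve(reference, [0.6, -0.3, 0.1])[:len(t)]   # noise as heard at the main mic
primary = speech + wind

cleaned = nlms_noise_canceller(primary, reference)
print("MSE before:", np.mean((primary - speech) ** 2))
print("MSE after: ", np.mean((cleaned - speech) ** 2))
```

The filter updates its weights on every sample, which is what lets it follow noise that keeps changing. A larger step size `mu` adapts faster but leaves more residual noise once the filter has settled.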

Dealing with noisy data is crucial for achieving accurate and reliable speech recognition. Techniques like spectral subtraction, Wiener filtering, and adaptive filtering can significantly improve the quality and intelligibility of audio signals before they reach the ASR system. The same families of techniques also appear in related audio tasks such as speech enhancement, echo cancellation, and beamforming. Choose the method that suits your specific needs, and remember that a well-tuned ASR system is essential for converting spoken language into accurate text, even in noisy environments.

Of course, once you’ve corrected for the noise, be sure to check out a useful tool like our Transcribe ASR platform for bulk generating transcripts using the Whisper speech-to-text model by OpenAI. Our platform allows you to readily edit transcripts in bulk, or request the help of our human-in-the-loop data annotators. We help with low-resource languages, and can support you in your AI annotation projects across specialist domains such as healthcare, legal, and finance.

Sign up for a free trial! Or, contact us at [email protected] to discuss how we can assist you.