Sound File Plot

Volume Detection using NAudio

I had a neat little project at work that called for some audio signal processing: I needed to detect when someone stops speaking in recorded MP3 files.

Whenever I think of audio and .NET, I think of two libraries, NAudio and BASS. I knew both only from reading about them and had never had an opportunity to use either.

For this project, I started by looking at both APIs. Both are similar. However, I decided to go with NAudio for two reasons:

  • Documentation and samples are amazing
  • Mark Heath, the maintainer/creator, is very supportive

I started by doing a ton of reading (16 hours straight) on signal processing. Two books I found especially useful are:

I also pored over the sample applications for NAudio, specifically the NAudio WPF demo. There are so many cool things in there. I was able to present my boss with so many different things we could do that I think I gave him a headache.

Ultimately, the only thing I needed to figure out was how to get the samples from a sound file. Samples are the individual amplitude values of the signal, and the peaks among them are where, in this case, someone is speaking. The pipeline model in NAudio is especially attractive. It allows you to add your own sample provider (an ISampleProvider) through which the audio stream passes.
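To illustrate the pipeline idea without pulling in NAudio itself, here is a minimal sketch of a pass-through stage. ISampleSource is a simplified stand-in I made up for this example: it mirrors the shape of the Read method on NAudio's ISampleProvider, but the real interface also exposes a WaveFormat property. ArraySource and PeakTracker are likewise hypothetical names, not NAudio classes.

```csharp
using System;

// Simplified stand-in for NAudio's ISampleProvider (assumption: the real
// interface also has a WaveFormat property, omitted here).
public interface ISampleSource
{
    // Fills buffer with up to count samples; returns how many were written.
    int Read(float[] buffer, int offset, int count);
}

// In-memory source, for demonstration only.
public sealed class ArraySource : ISampleSource
{
    private readonly float[] data;
    private int position;

    public ArraySource(float[] data) { this.data = data; }

    public int Read(float[] buffer, int offset, int count)
    {
        int n = Math.Min(count, data.Length - position);
        Array.Copy(data, position, buffer, offset, n);
        position += n;
        return n;
    }
}

// Pass-through stage: forwards samples unchanged while tracking
// the largest absolute value seen so far.
public sealed class PeakTracker : ISampleSource
{
    private readonly ISampleSource source;

    public float Peak { get; private set; }

    public PeakTracker(ISampleSource source) { this.source = source; }

    public int Read(float[] buffer, int offset, int count)
    {
        int read = source.Read(buffer, offset, count);
        for (int i = 0; i < read; i++)
            Peak = Math.Max(Peak, Math.Abs(buffer[offset + i]));
        return read;
    }
}
```

Whoever sits at the end of the chain just keeps calling Read; each stage in between gets a look at the samples as they flow through.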

Mark provides an excellent example of just what I needed for this project: SampleAggregator. This class takes your sound stream and processes it inside its Read method. As you read the stream from your source, the sample aggregator tracks the lowest and highest values over a defined number of samples:

private void Add(float value)
{
    maxValue = Math.Max(maxValue, value);
    minValue = Math.Min(minValue, value);
    count++; // one more sample in the current window
    if (count >= NotificationCount && NotificationCount > 0)
    {
        if (MaximumCalculated != null)
            MaximumCalculated(this, new MaxSampleEventArgs(minValue, maxValue));
        Reset(); // clear count, min and max for the next window
    }
}
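To make the windowing concrete, here is a small NAudio-free sketch of the same idea: walk a buffer, track the minimum and maximum, and emit one pair per window. WindowedPeaks and MinMaxPerWindow are hypothetical names for this illustration, not part of NAudio. Unlike the real SampleAggregator, which works incrementally across Read calls and raises an event, this version takes the whole array at once and returns a list. Starting each window at 0 is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of NAudio.
public static class WindowedPeaks
{
    // Returns one (Min, Max) pair for every full window of windowSize samples.
    // A trailing partial window is dropped because it never reaches the count.
    public static List<(float Min, float Max)> MinMaxPerWindow(float[] samples, int windowSize)
    {
        if (windowSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(windowSize));

        var result = new List<(float Min, float Max)>();
        float min = 0f, max = 0f; // assumption: each window starts from silence (0)
        int count = 0;

        foreach (float sample in samples)
        {
            max = Math.Max(max, sample);
            min = Math.Min(min, sample);
            count++;

            if (count >= windowSize)
            {
                result.Add((min, max));
                min = max = 0f; // reset for the next window
                count = 0;
            }
        }
        return result;
    }
}
```

With a window of 3, the seven samples 0.1, -0.5, 0.3, 0.2, 0.9, -0.1, 0.4 yield two pairs, (-0.5, 0.3) and (-0.1, 0.9); the leftover 0.4 never fills a window.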

There’s an event you can subscribe to that fires each time the minimum and maximum for a window have been computed. In my case, I just store the peak-to-peak range:

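  protected void OnMaximumCalculated(MaxSampleEventArgs e)
  {
     // MinSample is at or below zero, so this is the peak-to-peak range
     float volumeRange = Math.Abs(e.MinSample) + e.MaxSample;
     volumeSamples.Add(volumeRange); // the list returned by the loading code
  }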

The only remaining part is to load the file and pull it through the SampleAggregator:

       using (AudioFileReader reader = new AudioFileReader(audioFile))
       {
           SampleAggregator aggregator = new SampleAggregator(reader);
           // Raise MaximumCalculated once per 10 ms of audio
           aggregator.NotificationCount = reader.WaveFormat.SampleRate / 100;
           aggregator.MaximumCalculated += (s, a) => OnMaximumCalculated(a);
           float[] buffer = new float[8192];
           // Read returns the number of float samples read (not bytes); 0 means end of file
           while (aggregator.Read(buffer, 0, buffer.Length) > 0)
           {
               // Nothing to do here: the aggregator raises its events as it reads
           }
       }
       return volumeSamples;

It looks just like any other stream read but with an event handler. It feels and looks very familiar. The result is a list of signal peaks which, for my sample file, can be seen in the graph.
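From that list, one simple way to find where speech stops is to look for the first run of quiet windows that lasts long enough. The sketch below is only an illustration under assumptions of my own: SilenceDetector and FindFirstSilence are hypothetical names, and the threshold and minimum silence length are values you would have to tune against your own recordings. With NotificationCount set to SampleRate / 100, each list entry covers 10 ms.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of NAudio.
public static class SilenceDetector
{
    // Returns the start time of the first run of windows whose volume stays
    // below threshold for at least minSilence, or null if there is no such run.
    public static TimeSpan? FindFirstSilence(
        IReadOnlyList<float> volumeSamples,
        float threshold,
        TimeSpan windowLength,
        TimeSpan minSilence)
    {
        if (windowLength <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(windowLength));

        // Number of consecutive quiet windows needed to count as silence
        int needed = (int)Math.Ceiling(
            minSilence.TotalMilliseconds / windowLength.TotalMilliseconds);

        int runStart = -1; // index where the current quiet run began, -1 if none
        for (int i = 0; i < volumeSamples.Count; i++)
        {
            if (volumeSamples[i] < threshold)
            {
                if (runStart < 0) runStart = i;
                if (i - runStart + 1 >= needed)
                    return TimeSpan.FromMilliseconds(
                        runStart * windowLength.TotalMilliseconds);
            }
            else
            {
                runStart = -1; // a loud window breaks the run
            }
        }
        return null;
    }
}
```

Requiring a minimum run length keeps short pauses between words from being mistaken for the end of speech.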

I suggest you take a look at NAudio for your signal processing needs. Some things I’ll be adding:

  • Band-pass filtering to remove noise
  • Voiceprint detection to find the number of people in the room