Or how to implement a sound_like RSpec matcher
The problem I’m trying to solve in this article is comparing two audio files: we’ll figure out how to verify that they sound similar.
I was developing an application that deals with audio processing, and I had to write a test verifying that the output audio file matches one from the fixtures. So I decided to simply compare the audio files byte for byte.
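A minimal sketch of such a naive byte-for-byte check. The helper name same_bytes? is hypothetical, not taken from any library:

```ruby
# Hypothetical helper: two files "match" only if every single byte is identical.
def same_bytes?(path_a, path_b)
  # binread reads raw bytes, with no newline or encoding conversion
  File.binread(path_a) == File.binread(path_b)
end
```

In a spec this reads roughly like expect(same_bytes?('tmp/outcome.mp3', 'spec/fixtures/outcome.mp3')).to be(true), where both paths are placeholders. The check is exact to the byte, with no notion of "sounds the same".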
And it worked!
But soon my colleagues let me know I had broken the build. It turned out that outcome.mp3 generated on their MacBooks didn’t match the one generated on my Linux laptop, even though both sounded exactly the same. Probably we had different codecs.
So I had to come up with a better idea.
Audio fingerprints and Chromaprint
After some investigation I came across the term “audio fingerprint” (or “acoustic fingerprint”), which was exactly what I was looking for. From Wikipedia:
An acoustic fingerprint is a condensed digital summary, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database
It’s used by services like Shazam to identify songs.
So I started looking for open source implementations and found Chromaprint, a C library that calculates audio fingerprints from raw audio data. It seemed simple, had good documentation in its source code, and was easy to get started with.
Integrate Chromaprint with Ruby
According to Chromaprint’s documentation, a raw fingerprint is an array of 32-bit (4-byte) integers. But how do you compare two fingerprints to detect similarity?
The answer is to calculate the Hamming distance between the binary representations of the fingerprints. Again, according to Wikipedia:
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
To calculate the Hamming distance for binary data, we apply XOR to the two values and count the number of 1 bits in the result.
Here is a small example with 2-byte values:
dec      bin
11737    00101101 11011001
27129    01101001 11111001
XOR      01000100 00100000
Hamming distance: 3 (the number of 1 bits in the XOR result)
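The same calculation in Ruby, as a minimal sketch with hypothetical helper names. Chromaprint’s raw fingerprint items are 32-bit, so the XOR result is masked to 32 bits; this keeps the bit count correct even if a value arrives as a negative (signed) integer. Comparing fingerprints of different lengths only over their common prefix is an assumption of this sketch:

```ruby
# Hamming distance between two 32-bit values: XOR them, then count the 1 bits.
# The 0xFFFF_FFFF mask keeps the count correct for negative (signed) inputs,
# which Ruby would otherwise render with a leading minus sign.
def hamming_distance32(a, b)
  ((a ^ b) & 0xFFFF_FFFF).to_s(2).count('1')
end

# Hamming distance between two raw fingerprints (arrays of 32-bit integers).
# Assumption: only positions present in both arrays are compared.
def fingerprint_distance(fp_a, fp_b)
  n = [fp_a.size, fp_b.size].min
  (0...n).sum { |i| hamming_distance32(fp_a[i], fp_b[i]) }
end
```

For the 2-byte example, hamming_distance32(11737, 27129) returns 3.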
Based on this, I implemented an additional method that calculates similarity in the range from 0 to 1.
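One natural way to map the distance onto a 0 to 1 scale is to divide it by the number of compared bits and subtract the result from 1. The sketch below does exactly that. The formula, the helper name and the choice to return 0.0 when there is nothing to compare are assumptions, not necessarily what the original method does:

```ruby
# Hypothetical helper: similarity in [0, 1] between two raw fingerprints.
# 1.0 means every compared bit is equal, 0.0 means every compared bit differs.
# Assumption: only the overlapping prefix of the two arrays is compared.
def fingerprint_similarity(fp_a, fp_b)
  n = [fp_a.size, fp_b.size].min
  return 0.0 if n.zero? # nothing to compare

  differing_bits = (0...n).sum do |i|
    ((fp_a[i] ^ fp_b[i]) & 0xFFFF_FFFF).to_s(2).count('1')
  end
  1.0 - differing_bits.fdiv(n * 32) # 32 bits per fingerprint item
end
```

For example, fingerprint_similarity([11737], [27129]) is 1 - 3/32 = 0.90625. Dividing by the longer length instead would make missing audio at the end lower the score, which may be preferable when truncated output should fail the test.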
Create RSpec matcher
Now I could compare raw audio data, but in the real world we almost always have to deal with compressed audio like mp3 or ogg. WAV files, however, typically store uncompressed (raw PCM) audio data behind a small header. So I could convert compressed audio to WAV, read it to get the raw audio, and calculate fingerprints for comparison. To convert audio I prefer using a command-line tool; it’s pretty powerful.
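A minimal sketch of the conversion step, assuming ffmpeg as the command-line tool (an assumption; any converter that can write 16-bit PCM WAV would do). The helper names are hypothetical. The command is passed to system as an argument array, so no shell is involved and file names with spaces are safe:

```ruby
# Assumption: ffmpeg is installed and on PATH.
# Builds the argv for converting a compressed file to 16-bit PCM WAV.
def wav_conversion_command(input_path, output_path, sample_rate: 44_100, channels: 1)
  [
    'ffmpeg',
    '-y',                    # overwrite the output file without asking
    '-i', input_path,        # compressed source (mp3, ogg, ...)
    '-ac', channels.to_s,    # number of output channels
    '-ar', sample_rate.to_s, # output sample rate in Hz
    '-c:a', 'pcm_s16le',     # uncompressed signed 16-bit little-endian PCM
    output_path
  ]
end

def convert_to_wav(input_path, output_path, **options)
  # exception: true (Ruby 2.6+) raises if ffmpeg fails or cannot be started
  system(*wav_conversion_command(input_path, output_path, **options), exception: true)
  output_path
end
```

Pinning the channel count and sample rate is a design choice: it makes the intermediate WAV files produced on different machines as alike as possible before fingerprinting.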
I should explain that I did all this to avoid dealing with audio codecs from within Ruby, which would have made things much more complicated.
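Reading the samples back out of a PCM WAV file indeed needs no codec: the file is a RIFF header with a "fmt " chunk describing the format, followed by a "data" chunk holding the samples. A minimal pure-Ruby sketch with a hypothetical helper name, assuming plain PCM (format tag 1) at 16 bits per sample. Signed 16-bit samples are what Chromaprint’s C API expects to be fed:

```ruby
# Hypothetical helper: read a 16-bit PCM WAV file without any audio codec.
# Returns [sample_rate, channels, samples]; samples are signed Integers,
# interleaved by channel exactly as they are stored in the file.
def read_pcm_wav(path)
  bytes = File.binread(path)
  unless bytes[0, 4] == 'RIFF' && bytes[8, 4] == 'WAVE'
    raise ArgumentError, "#{path} is not a RIFF/WAVE file"
  end

  offset = 12 # skip "RIFF", the RIFF size field and "WAVE"
  format = nil
  while offset + 8 <= bytes.bytesize
    id = bytes[offset, 4]
    size = bytes[offset + 4, 4].unpack1('V') # chunk size, little-endian uint32
    body = bytes[offset + 8, size]
    case id
    when 'fmt '
      tag, channels, rate, _byte_rate, _block_align, bits = body.unpack('vvVVvv')
      # Assumption: only plain PCM (format tag 1) at 16 bits per sample
      unless tag == 1 && bits == 16
        raise ArgumentError, 'only 16-bit PCM WAV is supported'
      end
      format = [rate, channels]
    when 'data'
      raise ArgumentError, 'data chunk appears before fmt chunk' unless format
      return [*format, body.unpack('s<*')] # signed 16-bit little-endian
    end
    offset += 8 + size + (size % 2) # chunks are padded to an even length
  end
  raise ArgumentError, 'no data chunk found'
end
```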
Finally, I ended up with a sound_like RSpec matcher.
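A minimal sketch of such a matcher. Fingerprinting is injected as a hypothetical fingerprinter: any callable that turns an audio file path into a raw fingerprint, for instance by converting to WAV, reading the samples and running Chromaprint. The class follows RSpec’s plain-object matcher protocol (matches?, failure_message, failure_message_when_negated, description), so it does not itself depend on RSpec. Class, module and parameter names are assumptions, and the 0.95 default mirrors the threshold mentioned below:

```ruby
# Sketch of a sound_like matcher using RSpec's plain-object matcher protocol.
# Assumption: `fingerprinter` is a hypothetical callable mapping an audio file
# path to a raw fingerprint (an Array of 32-bit Integers).
class SoundLikeMatcher
  DEFAULT_THRESHOLD = 0.95

  def initialize(expected_path, fingerprinter:, threshold: DEFAULT_THRESHOLD)
    @expected_path = expected_path
    @fingerprinter = fingerprinter
    @threshold = threshold
  end

  def matches?(actual_path)
    @actual_path = actual_path
    @similarity = similarity(@fingerprinter.call(actual_path),
                             @fingerprinter.call(@expected_path))
    @similarity >= @threshold
  end

  def failure_message
    "expected #{@actual_path} to sound like #{@expected_path} " \
      "(similarity #{@similarity.round(3)} < #{@threshold})"
  end

  def failure_message_when_negated
    "expected #{@actual_path} not to sound like #{@expected_path} " \
      "(similarity #{@similarity.round(3)} >= #{@threshold})"
  end

  def description
    "sound like #{@expected_path}"
  end

  private

  # Fraction of equal bits over the positions both fingerprints share.
  def similarity(fp_a, fp_b)
    n = [fp_a.size, fp_b.size].min
    return 0.0 if n.zero?

    differing = (0...n).sum { |i| ((fp_a[i] ^ fp_b[i]) & 0xFFFF_FFFF).to_s(2).count('1') }
    1.0 - differing.fdiv(n * 32)
  end
end

# Mixin that example groups can include to get a `sound_like(...)` helper.
module AudioMatchers
  class << self
    attr_accessor :fingerprinter # set once, e.g. in spec_helper.rb
  end

  def sound_like(expected_path, threshold: SoundLikeMatcher::DEFAULT_THRESHOLD)
    SoundLikeMatcher.new(expected_path, fingerprinter: AudioMatchers.fingerprinter,
                                        threshold: threshold)
  end
end
```

To wire it up, include the module with RSpec.configure { |config| config.include AudioMatchers }, assign AudioMatchers.fingerprinter once, and write expect('tmp/outcome.mp3').to sound_like('spec/fixtures/outcome.mp3'), where both paths are placeholders.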
Note that I used a threshold of 0.95 because fingerprints quite rarely match 100%.