The MAD Challenge

Challenge offered by ff123 on May 30, 2001

 

17 June, 2002: Challenge has been met at 1.1% significance level by Garf, who performed 122 trials total (!) and got 74 of them correct.

 

April 23, 2002: ABX requirement modified to eliminate "cherry picking," but the bar was otherwise significantly lowered. Now, a confidence level of only 95% is required.

 

MAD stands for MPEG Audio Decoder, whose signature feature is that it supports decoding at 24 bits and also at 16 bits after dithering or noise shaping. The audible benefits of dithering are said to be extra resolution or dynamic range when compared with 16-bit output that is merely truncated or rounded to the nearest 16-bit value. For a peek into what dithering looks like at the least-significant bit, visit my page:

Dithering in MP3 Decoders or visit David Robinson's page, "What do these 24bit decoders sound like?"

Nawhead performed a blind test of MAD (24 bit resolution) vs. the Fraunhofer decoder (16 bit resolution) in Winamp 2.7. The results convinced me that he is hearing a real difference between decoders on certain types of music -- at 24 bits vs. the 16-bit default decoder within Winamp.

MAD also offers an attenuation feature to reduce the audible effects of clipping distortion, which I have tested for myself. I believe this feature actually does work as advertised.

The only feature of MAD which I have not seen demonstrated to my satisfaction is the benefit arising from the dithering to 16 bits. I have tried blind tests of various samples for myself and have so far not been able to demonstrate that I can hear a difference between MAD at 16 bits and the Fraunhofer decoder within Winamp versions 2.666 and above. I administered a blind test on Usenet as well (alt.binaries.sounds.mp3.d), and included Nawhead in that test, but with negative results.

Here is my challenge:


Rationale behind the method chosen for the challenge

A blind test eliminates any unconscious bias that might arise from a sighted test. Requiring repeated, correct identifications between the two decoders greatly reduces the probability that a difference is falsely claimed when the listener is in fact only guessing.

Choose your sample

1. Compare the latest version of the MAD Winamp plugin (as of May 30, 2001, version 0.13.0b) against the default Fraunhofer plugin within Winamp 2.666+.
2. Choose any mp3 sample you wish, of any length, bitrate, and encoder type.
3. The only restrictions on the mp3 sample are that it must not contain any instances of clipping anywhere, and that it must have been ripped from CD without normalizing. One can verify that the sample does not contain clipping by playing it back using the MAD decoder (first disable the automatic attenuation) and then looking at the Attenuate statistics.
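If MAD's Attenuate statistics are not at hand, a rough cross-check is to scan a 16-bit decoded WAV for runs of consecutive full-scale samples. This is a different technique from the one described above and only a heuristic: a single full-scale sample is not proof of clipping, and the function names and the idea of counting runs are assumptions made for this sketch.

```python
import array
import sys
import wave

FULL_SCALE = (32767, -32768)  # extreme values of a signed 16-bit sample


def read_wav16(path):
    """Return all samples of a 16-bit PCM WAV file as ints (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        data = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    return list(data)


def longest_full_scale_run(samples, channels=1):
    """Length of the longest run of consecutive full-scale samples in any channel."""
    longest = 0
    for ch in range(channels):
        run = 0
        for s in samples[ch::channels]:  # de-interleave one channel
            run = run + 1 if s in FULL_SCALE else 0
            longest = max(longest, run)
    return longest
```

A run of several full-scale samples in a row suggests the decode hit the 16-bit limit; a result of 0 or 1 is more reassuring, though it does not replace the check with MAD itself.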

Decode to WAV

4. Decode to wav using the WaveOut plugin of Winamp. I have already verified that what gets written to file is exactly what goes out to the soundcard.
5. When decoding to wav, MAD must have its auto-attenuation feature disabled.
6. Disable all other plugins, such as equalizers and other programs which perform post processing. In Winamp, to turn off the equalization, make sure it's turned off from within the equalization window itself. The button on the main panel is just to open up the equalization window -- it doesn't turn the equalization on or off!

Perform a blind test

7. Using Arny Krueger's PC-ABX program to perform a blind listening test of the decoded wavs, correctly identify X to a confidence of at least 95%. You can use my ABX calculator at http://ff123.net/abx/abx.html to calculate the confidence that any particular result is not the result of chance (p-value < 0.05). For example, if you perform 5 trials and correctly identify X all 5 times, that would give you a p-value of 0.03, or a confidence of 97% that your result was not just by chance. There is one requirement for performing multiple trials, though: you must count all trials! So it's no fair to perform 100 trials, and then "cherry pick" 5 in a row somewhere in those 100 trials to claim that 97% confidence was achieved. My recommendation is that the moment you achieve 95% confidence, you should stop and claim victory.
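The arithmetic behind step 7 can be sketched as follows, assuming the p-value is a one-sided binomial tail with a 50% chance of guessing each trial correctly (the online calculator may compute it differently). The function names abx_p_value and chance_of_run are made up for this illustration. The second function shows why cherry picking is ruled out: it computes how likely a pure guesser is to produce a streak of correct answers somewhere within a long session.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def chance_of_run(trials, run_length):
    """Probability that guessing yields `run_length` correct answers in a row
    somewhere within `trials` trials."""
    # state[i] = probability of being on a streak of i correct answers
    # without having completed a full run yet.
    state = [1.0] + [0.0] * (run_length - 1)
    hit = 0.0
    for _ in range(trials):
        new_state = [0.0] * run_length
        for streak, prob in enumerate(state):
            new_state[0] += prob * 0.5           # wrong guess resets the streak
            if streak + 1 == run_length:
                hit += prob * 0.5                # correct guess completes the run
            else:
                new_state[streak + 1] += prob * 0.5
        state = new_state
    return hit


print(abx_p_value(5, 5))       # 0.03125, the "p-value of 0.03" for 5 of 5
print(chance_of_run(100, 5))   # chance of finding 5 in a row among 100 guesses
```

The second number comes out far above 0.05, which is why a streak picked out of a longer session says little on its own, while all trials counted together do.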

Listening hints:

A. Break up your listening sessions over a period of time so that each one is short, perhaps only 15 minutes at a time. Concentration is hard to maintain in long listening sessions. See how Cameron Bobro successfully performed Ethan Winer's Bit-Depth listening test to distinguish truncation from dither at 16 bits.
B. Train yourself before jumping right into a test at 16 bits. The second half of this page describes some samples I have created to train your ears for what dithering/truncating sounds like at lower bit resolutions. This training may help you out for your listening test at 16 bits.
C. Listen early in the morning or late at night when it's quieter. Early morning has the added benefit that you'll probably be fresh and rested. Reduce any extraneous noise to a minimum.

Contact Me

D. I hang out at the HydrogenAudio forum. Leave a post for me to find there or private message me. If you have a particular sample you used, I will probably want to listen to it and post it to my Test Samples site for others to play with. I'm also interested in what type of soundcard, headphones (or speakers) you used, and amplifier, if applicable. I will post the responses of everybody who passes such a blind test.


Training Sample

David Robinson made a sample available to me. It is a 30 second sample of a Telarc recording of Rachmaninoff's Second Piano Concerto. It has very low background noise and may be a possible candidate in revealing a difference between the MAD decoder and the FhG decoder.

I encoded this sample using Lame 3.88b --abr 256 -q0 and decoded to a 24-bit text file using l3dec. Then I converted to a 24-bit wav which Cool Edit can process using my conv2wav utility. From within Cool Edit, I could then convert down to any bit resolution. For bit resolutions of 13, 14, and 15 bits, I chose to convert in two different ways:

Dithered conversion: dither down from 24 bits using triangular PDF dither with a dither depth of 1 bit and no noise shaping.
Truncated conversion: convert to a lower bit resolution without enabling dithering.
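The two conversions can be sketched in a few lines. This is not Cool Edit's implementation: it assumes that "a dither depth of 1 bit" means triangular noise spanning plus or minus one least-significant bit of the target resolution, and the function names are invented for the example.

```python
import math
import random


def truncate(sample, from_bits=24, to_bits=14):
    """Drop the low-order bits; the result is in units of the target resolution."""
    return sample >> (from_bits - to_bits)


def dither_tpdf(sample, rng, from_bits=24, to_bits=14):
    """Add triangular-PDF noise, then round to the target resolution."""
    step = 1 << (from_bits - to_bits)
    # The sum of two uniform values has a triangular distribution (+/-1 LSB).
    noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    value = math.floor(sample / step + noise + 0.5)
    # Clamp to the signed range of the target resolution.
    lo, hi = -(1 << (to_bits - 1)), (1 << (to_bits - 1)) - 1
    return max(lo, min(hi, value))


rng = random.Random(0)
sample = 256  # a quarter of one 14-bit step, expressed in 24-bit units
print(truncate(sample))  # 0: the low-level detail is simply lost
dithered = [dither_tpdf(sample, rng) for _ in range(20000)]
print(sum(dithered) / len(dithered))  # averages out near the true value
```

Truncation always returns the same level for this input, so the quarter-step detail vanishes. The dithered output jumps between neighbouring levels, but its average tends toward 0.25, which is the sense in which dither trades added hiss for preserved resolution.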

At 16 bits, I used the MAD Winamp plugin version 0.13.0b to produce the dithered conversion and the default Fraunhofer decoder from within Winamp 2.73 to perform a rounded conversion.

Also at 16 bits, I dithered down to 16 bits using Naoki Shibata's ssrc program, version 1.28. This program noise-shapes the dither, so that hiss is pushed out of the areas where human hearing is most sensitive (for example, near 3 kHz). In fact, the shape of the dither spectrum looks like the typical curve for the ear's Absolute Threshold of Hearing (ATH).
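The idea of noise shaping can be illustrated with the simplest possible case, first-order error feedback. This is not the filter ssrc uses: ssrc's ATH-like curve needs a higher-order filter, while this sketch merely tilts the noise toward high frequencies. The function name and parameters are assumptions for the example, and clamping is omitted for brevity.

```python
import math
import random


def requantize_shaped(samples, from_bits=24, to_bits=16, seed=0):
    """Requantize with TPDF dither and first-order error feedback."""
    rng = random.Random(seed)
    step = 1 << (from_bits - to_bits)
    out = []
    error = 0.0  # error made on the previous sample, in target LSBs
    for s in samples:
        wanted = s / step - error  # feed the previous error back in
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = math.floor(wanted + noise + 0.5)
        error = q - wanted
        out.append(q)
    return out


samples = [100] * 5000  # constant low-level input, about 0.39 of a 16-bit step
shaped = requantize_shaped(samples)
drift = sum(shaped) - sum(s / 256 for s in samples)
print(abs(drift) < 1.5)  # True
```

Because each sample's error is subtracted from the next input, the errors telescope: the accumulated error over any stretch stays within about one and a half steps instead of growing. That is the same as saying the noise has little energy at low frequencies, where it would otherwise sit.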

Here is a summary of the files I produced, which are available on my Audio Samples Page.

Filename                           Description
rach_original.flac                 original file from Telarc Sampler disk ("Brief Encounter")
rach2.mp3                          encoded with Lame 3.88b --abr 256 -q0
rach2_13bits_dither.flac           decoded from rach2.mp3 at 24 bits; converted to 13 bits with 1-bit triangular dither, no noise-shaping
rach2_13bits_truncate.flac         decoded from rach2.mp3 at 24 bits; converted to 13 bits with no dither
rach2_14bits_dither.flac           decoded from rach2.mp3 at 24 bits; converted to 14 bits with 1-bit triangular dither, no noise-shaping
rach2_14bits_truncate.flac         decoded from rach2.mp3 at 24 bits; converted to 14 bits with no dither
rach2_15bits_dither.flac           decoded from rach2.mp3 at 24 bits; converted to 15 bits with 1-bit triangular dither, no noise-shaping
rach2_15bits_truncate.flac         decoded from rach2.mp3 at 24 bits; converted to 15 bits with no dither
rach2_16bits_dithermad.flac        decoded from rach2.mp3 using MAD 0.13.0b decoder; dithers to 16 bits
rach2_16bits_roundfhg.flac         decoded from rach2.mp3 using FhG decoder in Winamp 2.73; rounds at 16 bits
rach2_16bits_ditherssrc3_ref.flac  decoded from rach2.mp3 at 24 bits; converted to 16 bits with Naoki Shibata's ssrc version 1.28, using noise-shaped dither type 3 (follows ATH curve)

Note: If you want to compare either the MAD or FhG decodes against a 16-bit reference, you should use rach2_16bits_ditherssrc3_ref.flac, not the original file, rach_original.flac. This eliminates the mp3 encoding itself as a variable.

For your reference, at a volume which does not sound too loud, I can detect the difference between the 16-bit reference and the 14-bit dithered sample in an ABX test (15 correct trials out of 16). I can also detect the difference between the 16-bit reference and the 13-bit truncated sample. However, I am not able to conclusively detect a difference between the 16-bit reference and the 14-bit truncated sample (best ABX run was 13 correct of 16). The difference I heard in each case was a difference in hiss. I could not hear any qualitative difference in the sound of the piano or in the background ambience (apart from the hiss). I used a Soundblaster AWE 64 Gold soundcard with Grado SR325 headphones connected directly to the line-out RCA connectors. During my testing, all fans in my computer were disabled. I believe my detection capability is being limited by the inherent noise of the soundcard.

Addendum: On June 9, 2001, I repeated the listening test, this time with my newly installed M-Audio Audiophile 2496 soundcard. I bypassed the mixer and ran the audio straight out to the RCA jacks, where I connected up my headphones without amplification. Bypassing the mixer means that I'm getting full amplification out of the soundcard, but since this sample is at such a low level, it sounds like normal volume to me. This time (compared with the reference) I easily heard the difference in noise with the 14-bit dithered sample and didn't even bother to ABX. I also detected a difference in hiss using the 15-bit dithered sample (16 of 16 ABX trials) and also now conclusively hear a difference with the 14-bit truncated sample (ABX results: 15 of 16). However, with the 15-bit truncated sample, I failed to detect a difference. At 16 bits, I can't tell the difference between any of the three (reference, MAD, or FhG).


The following are listening comments by Hans Heijden. Comments in bold are mine. Note that there was an earlier version of the samples I made available which were derived from an mp3 made by Fraunhofer's FastEnc codec at CBR 256. The initial comments refer to these samples. Also, subsequent to Hans' listening test, I have changed the reference 16-bit file to use Naoki Shibata's dithering routine instead of Cool Edit Pro's. But since Hans never heard a difference at 16 bits anyway, this is probably moot.

Using Terratec X-fire, mid-q amp and HD580.

Going alphabetically, step by step from 13 bits_truncate to 15 bits_round, I hear the hiss getting less.
The step from x bits_dither to x bits_round decreases hiss a lot, while the decrease from x bits_round to x+1 bits_dither is only just audible.

If I had a less noisy amp I surely could hear further. What is the purpose of this test anyway, just the hiss? I listened at a 'normal' volume. That is enough to hear the amplifier's hiss: I never hear absolute silence, with headphones, from any amp set at normal volume. No difference heard at all among the three 16 bitters.

Very shortly afterward, Hans bought a Terratec EWX 24/96 soundcard.

Now after all it seems most hiss in the test came from the X-fire, not the amp. I can easily hear the 16 bit samples have the least hiss. But now I hear the hiss masked some strange 'swirling' distortion during the quieter moments, like at the start. Guess it's just the 16 bit limit?

It turns out that the 'swirling' distortion was caused by the Fraunhofer FastEnc encoder. Hans found this out after noticing that after I had recreated the samples from an mp3 encoded with Lame 3.88b --abr 256 -q0, the swirling noises at 16 bit resolution were gone! The increase in hiss is probably an indication that it is more faithful to the original, since the Lame encode retains more of the higher frequencies. Here are his comments on the new (current) samples.

The rach2 15 and 16 bit samples do not have that swirling sound I mentioned. But instead they have more hiss. Again the three 16 bit samples sound the same to me. Though I don't know what you did exactly, it must be that the swirling were mp3 artifacts.

Again, of the rach 'generations', I never heard a difference between the three 16 bit samples.

The swirls are well audible in [the original version] 16 bit, and just audible in 15 bit round, at the first note. They appear weaker to me in 15 bit round, not buried in the slightly stronger hiss.

On my old soundcard I did not hear the swirls, then the hiss must have been strong enough to mask them. If you simply amplify the sample by 400%, you'll hear it too probably.


The following are comments by Gian-Carlo Pascutto (Garf, or GCP):

I started with the 15 bit clips, which were not very hard to ABX against each other, after some initial training. Training included, my score was 43/70 (p = 0.035). First ones 15/30 mostly guessing, then locked on to the difference, after that 28/40.

The 16 bit clips were harder. I first tried rach2_16bits_dithermad against the ditherssrc3 clip. I did a long trial, trying to find a good spot to listen for a difference. I locked on to a very slight 'fft' sound somewhere between 7.5 and 10 seconds. I had a long session (again about 70 trials), which got to <3% confidence somewhere along the way, but my ending score was only <16%. (I forgot the exact ABX results), so I guess that doesn't count.

I rested a little, and tried the roundfhg clip. I scored 8/30. I'm not sure what to make of that. It's unusual to score that bad :-/ I gave this clip another try but the result was basically random (I quit at about 15/30).

Next I tried the MAD and the FhG decodes against each other. This seemed to produce a rather audible difference, and I scored 9/10 on my first try. (After what happened with the dithermad vs ditherssrc3 test, I didn't dare test further).

I had another try after that and scored 16/32 exactly, thank you very much.

This was my normal setup (SB PCI128->HD570), with the difference that I attenuated the noise from the computer by putting pillows around it :)

I put the volume quite a bit higher than I normally would. Because the clip was quiet, this was listenable, but I wouldn't use it for normal listening.

After another session:

...Just had a listen on fresh ears (again with the muffled computer and increased volume). I got 49/80.

So

9/10 + 16/32 + 49/80 = 74/122

which according to my own significance calculator is <1.1%.

There really isn't any difference in casual listening. In my test setup, I could hear differences in the background noise structure when listening very closely. IIRC, the MAD decode was a bit 'sharper' but also a bit more noisy.

A real 24-bit audiophile quality card would have helped of course (got to get me one of those someday), although the SB 128 is pretty good. It's cheap and it's got no fancy features, but it's quite linear and the S/N ratio is good.
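Garf's pooled result can be checked by adding up the three sessions and computing the chance of doing at least that well by guessing. This is a minimal sketch assuming a one-sided binomial tail; it is not his significance calculator, so the last digits may differ from the figure he quotes. The name abx_p_value is made up for the example.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


# (correct, trials) for the three MAD vs. FhG sessions reported above
sessions = [(9, 10), (16, 32), (49, 80)]

total_correct = sum(c for c, _ in sessions)
total_trials = sum(n for _, n in sessions)
print(total_correct, total_trials)  # 74 122

for correct, trials in sessions:
    print(correct, trials, abx_p_value(correct, trials))

print(abx_p_value(total_correct, total_trials))  # pooled over all 122 trials
```

Counting every trial, including the 16/32 session that looks like pure guessing, is what the rule against cherry picking demands; the pooled figure is the one that counts for the challenge.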

 
