Added Clipping

Notes by ff123

 

Addendum (7/15/01): Robert Hegemann and David Robinson pointed out to me that added clipping occurs only when decoding, even though the encoder plays a role. If a scheme such as David Robinson's Proposed Standard for Replay Level is implemented in various audio formats, added clipping should cease to be a problem, no matter how the files were encoded.

Is the clipping distortion added by encoding near-full-scale files to mp3 and decoding them again audible, as distinct from any clipping already present in the original? I performed the following listening test to find out. Two explanations are commonly put forth for why encoding/decoding near-full-scale files adds clipping:

(1) Quantization errors may cause the signal to clip, and
(2) The Gibbs phenomenon, which arises when the bandwidth is limited, may cause the signal to clip.

 

For an explanation of (2), refer to David Robinson's post, which I reproduce on my MAD Listening Tests page. Also see his follow-up on the VQF forum, reproduced below. David explains why near-full-scale samples that are not themselves clipped can still clip in CD players. The phenomenon he describes applies to any process that involves filtering (e.g., encoding to mp3):

Re: digital mastering levels - why near full scale unclipped samples can clip in CD players.

It's not because sounds don't get attenuated so much when oversampled - it's because the actual sample points on CD may not represent the analogue peak amplitude. Do these boards support images? Apparently not. I'll give a link then.

http://privatewww.essex.ac.uk/~djmrob/mp3board/norm.gif (Note: link has gone away)

This shows an 11.025kHz tone, sampled at 44.1kHz. Depending on how the sample points line up with the tone (something you have no control over when you're digitising a real signal!), they could fall on the peaks, but they could equally easily fall at almost half the peak amplitude. Now consider a mastering engineer who does a digital normalisation of that signal, pushing those sampled points to peak at full scale. The first set of samples will remain the same. However, the second set of samples will be boosted greatly - those numerically smaller samples will be pushed to full scale.

Now, when you replay this on a CD player that oversamples, the oversampling filter fills in the gaps between the original samples with new samples corresponding to the original waveform. In the first case, this poses no problem. In the second, the new interpolated samples will have amplitudes above digital full scale. Few devices can cope with this, so the oversampled signal clips at full scale, losing the top of the waveform. All this happens even though the actual sample points on the CD didn't clip.

Real signals aren't so contrived, so things aren't as bad as this most of the time! However, by heavily compressing audio signals in the digital domain at 44.1 kHz, you make this clipping on playback more likely.

How is this relevant to audio coding? If you take a signal that is heavily compressed to full scale, and try to filter it (lots of filtering is involved in mp3 encoding and decoding) the resulting signal can easily go over full scale. Here's an example...

Download this file (it's only 80k)

http://ff123.net/samples/noise.wav

Note the peak amplitude. Now filter it - try something like a low pass filter at 18kHz. So you're removing energy, right? Now look at the new peak amplitude - it's higher than before, even though you've removed part of the signal.

A similar thing, to a lesser extent, is causing clipping to occur in mp3 files.
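Robinson's 11.025 kHz example can be checked numerically. The sketch below samples the tone with a 45-degree phase offset so that every sample lands at about 0.707 of the tone's real peak, normalises the samples to full scale, and then reconstructs the waveform between the samples with band-limited (sinc) interpolation, which is what an ideal oversampling filter approximates. The truncated sinc sum, the buffer length, and the helper names are my own simplifications for illustration, not what any real CD player or decoder does.

```python
import math

def sinc(x):
    """Normalized sinc, the impulse response of an ideal band-limiting filter."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Band-limited value of the sampled waveform at fractional sample index t.

    A truncated sinc sum: a simplification of what an oversampling filter approximates.
    """
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# 11.025 kHz tone sampled at 44.1 kHz = 4 samples per cycle. A 45-degree phase
# offset makes every sample land at about 0.707 of the tone's real peak.
N = 4000
raw = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(N)]

# Digital "normalisation": scale so the largest sample sits at full scale (1.0).
scale = 1.0 / max(abs(s) for s in raw)
normalized = [s * scale for s in raw]
sample_peak = max(abs(s) for s in normalized)

# 4x oversampling over a few cycles in the middle of the buffer, away from the
# edges where the truncated sum is least accurate.
mid = N // 2
grid = [mid + k / 4 for k in range(-16, 16)]
true_peak = max(abs(reconstruct(normalized, t)) for t in grid)

print(f"largest sample:     {sample_peak:.3f}")
print(f"reconstructed peak: {true_peak:.3f}")
print(f"over full scale by: {20 * math.log10(true_peak / sample_peak):.2f} dB")
```

No sample exceeds 1.0, yet the waveform those samples describe peaks at about the square root of 2, roughly 3 dB over full scale, which is the case Robinson describes.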


Short Description of Listening Test

Note: when I originally devised this test, I thought that clipping could be caused by the encoding process alone, and I tried to find a way to take the decoder out of the process. However, as David Robinson pointed out, I made a mistake and left the decoder in the loop at the wrong place:

(original WAV -> mp3 -> decode to WAV -> reduce gain of WAV file)

The proper way to eliminate the decoder would be as follows:

(original WAV -> mp3 -> reduce gain of mp3 file -> decode to WAV)

There are utilities that can reduce the gain of mp3 files without decoding them to WAV. I have decided to let the notes stand as they are, however, because they still show that clipping is caused by the encoding/decoding process, even if not by the encoding process alone.
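As far as I know, such utilities work on the compressed data by adjusting the global_gain field that each Layer III granule carries. The decoder scales the spectrum by 2^(global_gain/4) (relative to a fixed offset), so the gain can only be changed in whole steps of about 1.5 dB, and the change can be undone by moving the field back. The sketch below assumes that mechanism and only works out the step arithmetic; it does not parse an mp3 file, and the function names are hypothetical.

```python
import math

# Assumption: the gain tool rewrites the Layer III global_gain field. The decoder
# scales each granule's spectrum by 2 ** (global_gain / 4) (relative to a fixed
# offset), so one step of global_gain changes amplitude by 2 ** (1 / 4).
STEP_DB = 20 * math.log10(2 ** 0.25)  # about 1.505 dB per step

def steps_for_attenuation(attenuation_db):
    """Fewest whole global_gain steps that attenuate by at least attenuation_db."""
    return math.ceil(attenuation_db / STEP_DB)

def attenuation_for_steps(steps):
    """Attenuation in dB produced by lowering global_gain by `steps`."""
    return steps * STEP_DB

# Example: the 2.2 dB of attenuation arrived at later in these notes.
n = steps_for_attenuation(2.2)
print(n, f"{attenuation_for_steps(n):.2f} dB")  # 2 3.01 dB
```

Because the steps are coarse, asking for 2.2 dB actually gives about 3 dB of attenuation.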

For a listening comparison, the files must be at the same volume. The basic idea behind the listening test is to compare two files which have both been normalized -- one before encoding/decoding and the other after encoding/decoding. If the encoding/decoding process does not add any clipping distortion, the two files should be audibly indistinguishable from each other.

Selection of Musical Sample

I chose a heavy metal selection: the "Liberate" track from Slipknot. This track was discussed by two listeners (Nawhead and bAdDuDeX) in alt.binaries.sounds.mp3.d with reference to clipping. I excerpted 5 seconds of this track, and indeed, the volume level appears to be crammed as close to full scale as the mastering engineers could get.

I have made the three .wav files referenced in this test available from my Audio Samples Page, compressed with the lossless compressor FLAC, so you can perform the whole process, or just the listening test, yourself.

liberate.flac: original file
ref.flac: normalized before encoding/decoding
sample.flac: normalized after encoding/decoding

Normalizing before encoding/decoding

To find the level at which no clipping can occur, I encoded to mp3 using mp3enc31 at 128 kbit/s, then played the result back using the MAD decoder (0.12.3 beta 4), which lets you see and adjust the amount of attenuation needed to avoid clipping during playback. It was an iterative process: find out how much attenuation MAD needs (for example, -1.8 dB), normalize the original file by -1.8 dB in Cool Edit (-1.8 dB = 10^(-1.8/20) ≈ 81%), re-encode and run it through MAD again, find that the attenuation still wasn't quite enough, renormalize, and so on.
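The percentages quoted next to the dB figures come from the usual amplitude conversion, gain = 10^(dB/20). A minimal sketch of both directions (the function names are mine):

```python
import math

def db_to_gain(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)

def gain_to_db(gain):
    """Convert a linear amplitude factor to decibels."""
    return 20 * math.log10(gain)

print(f"{db_to_gain(-1.8):.1%}")     # 81.3%, the -1.8 dB example
print(f"{db_to_gain(-2.2):.1%}")     # 77.6%
print(f"{gain_to_db(0.77):.2f} dB")  # -2.27 dB
```

So -2.2 dB and 77% agree to within rounding.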

Cool Edit performs its gain transformations (normalizing) at resolutions greater than 16 bits, and dithers back down to arrive at the final 16-bit result. To make sure that this dithering was not affecting my results, and also to double-check myself, I performed the whole shebang twice, from start to finish.
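To make "work at higher resolution, then dither back down" concrete, here is a sketch of a gain change done in floating point and requantized to 16 bits with triangular (TPDF) dither. This is the generic textbook technique; that Cool Edit does something equivalent is my assumption, and the function names are hypothetical. Because the dither is random, two runs produce slightly different files, which is why repeating the whole process is a sensible check.

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767

def requantize(x, dither=0.0):
    """Round a high-resolution value to a 16-bit integer, clamping at full scale."""
    return max(INT16_MIN, min(INT16_MAX, round(x + dither)))

def scale_with_tpdf_dither(samples, gain, rng=None):
    """Apply `gain` in floating point, then go back to 16 bits with TPDF dither.

    TPDF (triangular) dither is the sum of two independent uniform values of
    +/-0.5 LSB each. Generic technique; Cool Edit's exact algorithm is assumed.
    """
    if rng is None:
        rng = random.Random()
    return [requantize(s * gain, rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))
            for s in samples]

def scale_without_dither(samples, gain):
    """Same gain change with plain rounding, for comparison."""
    return [requantize(s * gain) for s in samples]

pcm = [32767, -32768, 1000, 3, 0]
print(scale_without_dither(pcm, 0.77))
print(scale_with_tpdf_dither(pcm, 0.77, random.Random(1)))
```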

I finally ended up attenuating the clip by 2.2 dB (about 77%) before MAD showed no possible clipping anywhere in the encoded file. After decoding back to .wav format (see the next section), I called this file the reference file (ref.wav).

Encoding/Decoding

I chose mp3enc31 at 128 kbit/s as the encoder. The higher the bitrate, the fewer clipping instances Cool Edit finds in the final decoded file. For the decoder, I chose the ISO-compliant Fraunhofer decoder found in Winamp 2.666. MAD is also ISO-compliant, but bAdDuDeX and Nawhead have reported more clipping with MAD than with the Fraunhofer decoder when using mp3enc (see my MAD Listening Tests page). I used the Disk Writer plugin within Winamp to write the decoded output to a file. I have verified that the Disk Writer in Winamp 2.666 produces output identical to the bitstream produced by the Nullsoft waveOut plugin (see my page Playback of mp3 files in Winamp).

Normalizing after encoding/decoding

The other file (which I called sample.wav) was produced by encoding straight from the original .wav file, with no normalization. I normalized it to 77% only after decoding back to .wav. Note that ref.wav and sample.wav are therefore at the same volume level, which is essential for a comparative listening test.

Differences between ref.wav and sample.wav as shown by Cool Edit

Cool Edit can display the number of possible clipping instances. Comparing ref.wav to sample.wav, the following statistics are shown:

[Figure: Cool Edit clipping statistics for ref.wav and sample.wav]

The Percent Clipped field shows that ref.wav is virtually free of possible clipping instances compared with sample.wav.
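A rough equivalent of the Percent Clipped figure can be computed directly from the 16-bit files. The sketch assumes that a sample counts as (possibly) clipped when it sits at either extreme of the 16-bit range; Cool Edit's own criterion may differ, and the helper names are hypothetical.

```python
import array
import sys
import wave

INT16_MAX = 32767
INT16_MIN = -32768

def read_pcm16(path):
    """Read all samples (channels interleaved) from a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        data = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    return data

def percent_clipped(samples):
    """Share of samples at either extreme of the 16-bit range, in percent.

    Assumption: full-scale samples are treated as possibly clipped.
    """
    if len(samples) == 0:
        return 0.0
    clipped = sum(1 for s in samples if s >= INT16_MAX or s <= INT16_MIN)
    return 100.0 * clipped / len(samples)

# Usage, with the files from this test:
# for name in ("ref.wav", "sample.wav"):
#     print(name, f"{percent_clipped(read_pcm16(name)):.4f}%")
```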

 

Listening Test

I performed an ABX test of sample.wav against ref.wav. I used Grado SR325 headphones. In the right channel about 1 second into the clip, I can hear a distortion which is slightly more noticeable in sample.wav than it is in ref.wav. Granted, it is very subtle, but definitely audible. I attribute it to clipping incurred during the mp3 encoding/decoding process. I scored 15 out of 16 correct identifications of X the first time, and 15 out of 16 correct the second time through (after recreating the whole process from start to finish to double-check myself).
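A score of 15 out of 16 is very unlikely to come from guessing. Assuming each ABX trial is an independent 50/50 guess when the two files really are indistinguishable, the chance of 15 or more correct is a binomial tail. A minimal sketch:

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of getting at least `correct` of `trials` ABX trials right by guessing.

    Each trial is treated as an independent coin flip (p = 0.5) under the
    hypothesis that the two files cannot be told apart.
    """
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

print(f"{abx_p_value(15, 16):.5f}")  # 17/65536, about 0.00026
```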

Addendum: To quell doubts about possible differences caused by Cool Edit's normalization routine, I performed the listening test yet a third time, this time using the normalization routine within Exact Audio Copy. Again, I scored 15 of 16 on an ABX test.

 

Conclusion

Clipping distortion incurred by the encoding/decoding of near-full-scale files is audible. There are several ways to avoid/reduce/eliminate the problem:

1. Reduce the gain of the original WAV file before encoding.

a. The best way to do this is in the encoder, where the audio can be represented with floating-point values instead of a 16-bit quantized stream; an example is the --scale switch in Lame. The tradeoff is that the gain should not be reduced too much: in Lame, at least, encoding quality starts to degrade as the gain is reduced.

b. Less optimal (in theory) is to reduce the gain with an audio editor such as Cool Edit, which can work at 32-bit resolution and dither back down to 16 bits for the final result. I say "in theory" because whether dithering at 16-bit resolution makes an audible difference is debatable.

c. Least optimal (again, in theory) is to reduce the gain without dithering, as the normalizing routines in Exact Audio Copy and AudioCatalyst do.

2. Reduce the gain of the encoded file before decoding or playing back. The advantage of this method is that there are utilities available which can reversibly adjust the gain of an encoded file (at least such utilities exist for the mp3 format) without ever having to decode to WAV first. If David Robinson's Proposed Standard for Replay Level is adopted by players, this would be the ideal solution.

3. Dynamically adjust the gain down if an instance of clipping occurs during decoding. This is what MAD's auto attenuation feature does.
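Approach 3 can be sketched in a few lines. This is a hypothetical illustration of the idea, not MAD's actual code: decoded samples are floats where ±1.0 is digital full scale, and the gain is lowered just enough whenever a sample would otherwise clip.

```python
def auto_attenuate(samples, full_scale=1.0):
    """Hypothetical auto attenuation: lower the gain whenever a sample would clip.

    The gain only ever decreases, so after the first overshoot the rest of the
    track plays at the reduced level.
    """
    gain = 1.0
    out = []
    for s in samples:
        if abs(s) * gain > full_scale:
            gain = full_scale / abs(s)  # just enough to bring this sample to full scale
        out.append(s * gain)
    return out, gain

decoded = [0.5, 1.2, 0.9, -1.5, 0.3]  # decoder output that overshoots full scale
played, final_gain = auto_attenuate(decoded)
print(played, final_gain)
```

The drawback is visible in the example: the first sample plays at full gain before any overshoot has been seen, so the level drops partway through. Knowing the required gain in advance, as in approach 2, avoids that.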

 
