Test For "Ringing"

Notes by ff123
Listening tests by many

 

NOTICE: The listening tests performed with Lame 3.88 are for an alpha version, and are not necessarily representative of the Lame 3.88 beta version.

Addendum, Mar 30, 2001: Added results of EarGuy's Digital Ear

I prepared a short test for high-frequency "ringing" in several mp3 encoders, using a male vocal sample that a Czech member of the Lame mailing list made a year or so ago. This shortened version of the sample was forwarded to me by Hans Heijden. I had asked Hans if he knew of another example of the type of artifact heard by bAdDuDeX, who found it particularly objectionable in music encoded by Lame (see my Ringing in Lame page). bAdDuDeX used the term "ringing" to describe a certain type of artifact, which DualIP ascribes to on-off switching of higher subbands (see text below).

Here were my instructions for the listening test, which I posted to alt.binaries.sounds.mp3.d and to the r3mix.net forum:

There are four mp3 files encoded at 128 kbit/s. The default low-pass filters were enabled. The encoders are:

1. FastEnc (Cool Edit Pro with MP3 ME plugin, CRC disabled)
2. MP3Enc31 (-qual 9)
3. Lame 3.87RH (-h)
4. Lame 3.88CVS 010104 (-h --nspsytune)

The encoded mp3 files can be found on my Audio Samples Page as eb_andul.zip. The original file is eb_andul_short.flac, and has been compressed using the lossless compressor, FLAC. Please rank the samples and tell me what you think sounds best, in order.
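For readers who want to try something similar, the two Lame settings listed above can be written out as command lines. This is only a sketch: the file names are hypothetical, it assumes a `lame` executable on the path and a WAV already decoded from the FLAC file, and `--nspsytune` was an option of the Lame builds of that period, so a current release may treat it differently. The commands are built but not run.

```python
# Sketch only: builds the argument lists for the two Lame encodes in the test.
# File names are hypothetical; "-b 128" sets the 128 kbit/s bitrate.

LAME_SETTINGS = {
    "Lame 3.87RH": ["-h"],
    "Lame 3.88CVS 010104": ["-h", "--nspsytune"],
}


def lame_command(wav_in, mp3_out, extra_flags):
    """Return the lame command line as a list of arguments."""
    return ["lame", "-b", "128", *extra_flags, wav_in, mp3_out]


if __name__ == "__main__":
    for name, flags in LAME_SETTINGS.items():
        # Hypothetical output name derived from the encoder label.
        out_name = name.replace(" ", "_") + ".mp3"
        print(" ".join(lame_command("eb_andul_short.wav", out_name, flags)))
```

Each list could be passed to `subprocess.run` to do the actual encode, given an encoder build that accepts these options.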

Here is a quick summary ranking by listener:

Listener         Ranking (best to worst)
HansHeijden      45, 33, 24, 16
bAdDuDeX         45, 33 = 24, 16
r3mix            45, 24, 16, 33
DualIP           45 = 33 = 16, 24
Naoki Shibata    33 = 45, 24, 16
JohnV            33, 45, 16, 24
2Bdecided        33, 45 = 24 = 16
ff123            33, 16, 24, 45
TLO              24, 16, 33, 45
JuliusBT         16, 33, 24, 45
Speek            no difference
Digital Ear      left ch: 33, 16, 24, 45
                 right ch: 16, 33 24, 45
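One rough way to condense the table above is a mean rank per sample, with tied samples sharing the average of the positions they occupy. This is a sketch and not part of the original test: the rankings are transcribed from the summary, Speek (no difference) and the Digital Ear rows are left out, and a mean rank is a crude summary for so few listeners who plainly disagree.

```python
from collections import defaultdict

# Rankings transcribed from the summary table, best to worst.
# Each inner list is a group of samples the listener rated as tied.
RANKINGS = {
    "HansHeijden": [[45], [33], [24], [16]],
    "bAdDuDeX": [[45], [33, 24], [16]],
    "r3mix": [[45], [24], [16], [33]],
    "DualIP": [[45, 33, 16], [24]],
    "Naoki Shibata": [[33, 45], [24], [16]],
    "JohnV": [[33], [45], [16], [24]],
    "2Bdecided": [[33], [45, 24, 16]],
    "ff123": [[33], [16], [24], [45]],
    "TLO": [[24], [16], [33], [45]],
    "JuliusBT": [[16], [33], [24], [45]],
}


def mean_ranks(rankings):
    """Average rank per sample (1 = best); ties share the mean position."""
    totals = defaultdict(float)
    for groups in rankings.values():
        position = 1
        for group in groups:
            # A tie over positions p .. p+n-1 gives each sample p + (n-1)/2.
            shared = position + (len(group) - 1) / 2
            for sample in group:
                totals[sample] += shared
            position += len(group)
    return {sample: total / len(rankings) for sample, total in totals.items()}


if __name__ == "__main__":
    ranked = sorted(mean_ranks(RANKINGS).items(), key=lambda kv: kv[1])
    for sample, rank in ranked:
        print(f"short{sample}: mean rank {rank:.2f}")
```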

The detailed responses follow:

r3mix:

without having listened to the original, so judging on what sounds best to me:

short16: I don't like this one. Has a sharp tone at the 1.6 second mark.
short24: Has a less obvious sharp tone at the 2.1 second mark.
short33: IMO has an overall lesser quality than the other ones. Harsher sound and a bit more distortion. No real glitches as in 16 and 24.
short45: best quality IMO. sounds less distorted and more uniform to listen to. (hard to explain) - i can enjoy this one without an arising suspicion it's been mpegged. (from what I experience as such after listening >2 years to mp3's)

now I listen to the original:
I have a strange sensation at the 2.25s peak. Little noise there.
ah, I also found a glitch in the short45: at exactly the 1.0s mark in mainly the R channel there is a low-freq glitch.
so, none perfect but if you force me to choose It'd be 45 because it sounds most enjoyable to me.

ok, I must stop now because I'm starting to hear things that aren't there...

45,24,16,33 is the order I'd pick if I needed to listen to the whole album. (taking it's a like those few seconds)

2Bdecided:

16 - picked out from original in blind test - sibilance has slightly wider stereo spread
24 - ditto - sibilance sounds like it's phasing
33 - ditto - sibilance has very slightly wider stereo spread - better than 16 though
45 - ditto - first sibilance has a quiet "blip" behind it.

16 has that classic mp3 128 sound (watery). I like 24 because at least it's different - doesn't automatically make me think "mp3" - more "Lucy in the Sky with Diamonds" (slightly phasey). 33 has a bit of that classic mp3 128 sound, but I think it's probably my favourite. 45 would be on a par with 16 or 33 if it weren't for that little blip, which spoils it.

I don't really like any of them - can't I say the original is my favourite? OK - I'll take 33.

You can't listen too closely to a short sample, because after a while you've heard it so often that you hear things (like Roel said) - however, a whole album of 16 would be obviously mp3, even if no one told you, and you didn't have the original.

btw, the original sounds like it's been overly processed with noise reduction - can't tell if it's Sonic Solution, Cedar, or a cassette played on Dolby C though ;-)

my ranking would be: 33, 45, 24, 16 BUT it's so close that it's really 33 1st, then the others all joint second.

I put 45 second assuming it wouldn't make any further loud blips. However, in reality I guess it would, and, over a whole album I think this would be so irritating as to put it last!

24 would get irritating once you got used to it I think - and 16 is irritating because I _have_ got used to that sound. So 33 is best, and the others are equally bad - sorry!

Naoki Shibata:

16 and 24 : "sh" voices are slightly watery.
45 : I can hear blip at the beginning of first word.

My ranking is :
33 = 45(blip is not considered) > 24 > 16

But the differences are very small.

[in response to bAdDuDeX's comment on background hiss]:

Please check if specifying "--athtype 1" improves the background hiss.

BTW, if you test CBR mode of latest CVS version of lame with --nspsytune, please use -q1 instead of -h.

JuliusBT:

my guess is that 24 is the MP3Enc version since it has that sound.

I could [...] distinguish the original wav from all of them, and I thought that 45 sounded least OK to me. I would still have rated them:

16/33/24 best
45 worst

My judgement is beclouded now, I'd rank 16 as best, [...] 33 [...] is second. Then 24 and at the bottom 45 (not even counting the glitch!)

I could not tell much of a difference between the 4 samples, apart from the 45 one that I thought sounded less perfect than the others.

I could not hear much real high frequency content at all (above 16 kc).

Speek:

I can't tell the difference. For me they are all the same. I don't hear any ringing or other artifacts.

HansHeijden (from email messages to me):

Ranking turns out to be simply: the higher the number in the filename, the better. This based on the amount of ringing heard. However concerning the first 's', 24 and 33 are less distorted. So either 33 or 45 I would prefer...
None of the 4 could confuse me with the original .wav.

[33 is] almost as 'ringing-free' as [45], and superior with the 's' distortion.

DualIP (from a post on alt.binaries.sounds.mp3.d):

ff123 wrote:
>
> I asked Hans to provide me with a sample that had the same type of artifacts
> that he heard in the previous test sample. Since I don't know what that sounds
> like, I can't describe it.

I do hear it on the 33.mp3, and understand the term clearly. It's caused by the on/off switching of higher subbands. Encode some noise stuff to say 48kb/s at 22k sample freq, and this subband switching will be clear to anyone.

ringing:
most noticeable on 33 ,
On 24 I hear it too , but it's "hided" amongst other artefacts
On 16 it's totally hidden in artefacts

overall qual:
16-24 : Equal worse.
33 middle
45 Best.( No ringing , no artefacts , but I do hear loss of dynamics due to encoding at low bitrate: pumping)

*****************

After :
-reading others comments (TLO)
-listening to the samples over and over again
-paying real attention to the original .WAV
-paying less attention to my first time hearing of the ringing
I'd like to re-evaluate!

First of all the .WAV:
-I hear this pumping effect on the .WAV as well from 0:00 till 0:04. It seems like the voice is going through some noise gate , modulating the bandwidth. What I call pumping. Signals , modulating amplitude of other signals. This is common in compressor/expander systems like DolbyC. Also very apparent in VQF.

-The two slishing ssssh sound at 0:04 and 0:06 are very artificial. It sounds to me like these open up the noise gate for full-bandwidth and , even worse , starts some echo/reverb , that only works in this high frequency region. Very unnatural since normal echo/reverb works as a low pass filter, and here you only hear the high-freq. part of the sssh decaying

Previously I didn't realize these "flaws" in the original .wav, and this obscured my previous judgement.

The ringing is in the echo/reverb of this sssh sound. Seems like the worst encodes make some staircase decay instead of exponential decay out of it. (subbands on/off switching)

Because 45 sounds dull , it gets rid of some of the HF flange-alike "ssssh echo", improving sound quality at first sight.

New judgement:
24 Without a doubt worst ,most artefacts.

33 , 45 , 16 , 99 come close , have different flaws making comparison hard. In any order

16: Wider bandwidth at the expense of most obvious ringing:
33: Less ringing , but I do hear other common artefacts
45: Artefact free but low amplitude HF gets filtered away (=dull), speeding up ssssh decay
99 Somewhere in between 16-33-45 , good compromise

TLO (from a post on alt.binaries.sounds.mp3.d):

A deceptively simple clip that presents clear challenges once scrutinized. Clicks suggest a vinyl source (or CD mastered from LP, do you know if that's true?). Wide separation of acoustic piano in one channel, acoustic guitar in the other, and reverb on the sibilant vocal in both promise a good test of 128.

Listening carefully I hear details I'd guess are going to cause some problems: At some points, the guitarist's hand on the wound strings creates a clear scraping sound in the right channel, the first at appx .7 secs (string muting almost like a breath intake) and a pair of subtler ones around 4.7 and 5.3 for example.

The vocal effect isn't going to be encoder friendly either -- sounds like a good analog plate reverb. The piano is dry and compressed so it shouldn't be an issue (as it can often be at stereo and fuller frequency). Traditional "70s big studio" Cat Stevens production aesthetic here.

Ranked and superlatively speaking:

01. Eb_andul_short16.mp3 - Overbright, some artifacts. Second
02. Eb_andul_short24.mp3 - Good brightness, similar where not subtler artifacts. Best sounding. First
03. Eb_andul_short33.mp3 - Softer artifacts possibly but upper midrange tonality may be diffused. Third
04. eb_andul_short45.mp3 - Dullest sounding yet most artifacts, the sound at .7 becomes a doubled blurp. Bad. Dead last

Given similarities in the first 2 and maybe 3 I'm going to go out on a limb and ID the encoders. These are:

Eb_andul_short16.mp3 = LAME
Eb_andul_short24.mp3 = MP3 Me variant of FastEnc
Eb_andul_short33.mp3 = LAME with -nspsytune
eb_andul_short45.mp3 = MP3Enc

All have artifacts better described as a watery chirping rather than ringing. They may be softer on 33 but no more so than an EQ would obscure.

The dullness and blatant artifacts in 45 leave little doubt that is MP3Enc: I can hear that garbage from a mile away. 16 and 24 are quite close. 33 I might suspect as nspsytune as it's the odd man out in terms of a slight upper mid smearing or blurriness. Out of these, whatever made 24 is what I'd hope the track would be encoded with, but 16 is alright if I *had* to take a 128.

***

I hear upwards of 18 kHz. I hear the higher-pitched watery chirping being called "ringing". I hear artifacts. And in simple point of fact I hear the most artifacts in [45]. Flat out, more artifacts.

bAdDuDeX (from a post on alt.binaries.sounds.mp3.d):

16 - Vocals sound too bright, the hiss in the background is totally masked, and there's ringing everywhere.
24 - Somewhat more artifacts than 33 but hiss in the background isn't totally masked. Maybe even a little less masked than 45...
33 - Hiss is totally masked but second least amount of artifacts.
45 - The hiss isn't totally masked and it has the least amount of artifacts.

In case you don't know, the hiss in the background is a GOOD thing. It shows that the encoder assumes it to be audible and thus doesn't mask it out.

ff123 wrote:
> Is the background hiss present in the original? Loss of hiss is an
> interesting artifact, and it wasn't mentioned by the others, except
> for JuliusBT, who saw some extra noise on 24 and 45 using his
> difference signal graphs. He didn't comment on the effect it had on
> the sound, though.

Yes, it's in the original. Must have been introduced in the recording process. "Loss of hiss" isn't an artifact in itself, heh. That type of thing pertains to any quiet noises in the background on music, not just hiss. It shows that LAME masks more than the FhG encoders do. If I were to go just by artifact rate then I would have put 33 over 24 (instead of a tie). But you have to factor extra masking into the picture too.

Easily first place: 45
Second: Tie between 24 and 33
Easily third: 16

ff123 wrote:
> Naoki Shibata recommended the following [to reduce masking of hiss]:
>
> -q1 --nspsytune --athtype 1, using a recent CVS version. I have
> encoded such a file using CVS version dated 1-22-01. You can find it
> at my Yahoo briefcase as eb_andul_short99.mp3
>
> The effect of athtype 1 is to increase the frequency response during
> quiet portions (lowering the masking). The -q1 replaces -h in recent
> CVS versions. Hans Heijden hears an improvement since the Jan 4 CVS
> version, although he can't hear the background hiss. What do you hear?

Just did a quick test and now it has even more hiss than the MP3Enc version. It still has more artifacts though. 99 still doesn't have as much hiss as the WAV but it's closer than 45 now.

[...] when using --nspsytune LAME doesn't really have any ringing on this sample. The main thing is that the vocals don't sound as clean as 45. They sound....compressed or something. 45 has much more natural sounding vocals. 24 sounded even more compressed with the vocals (which is a big problem with FastEnc, I hear it all the time). 16 just sounded horrible all-around. That's definitely a file I would delete.

[On the term "compressed"]:
Well, "compressed" isn't really an artifact in itself. I just used that term because I couldn't think of any other way to describe what I was hearing. The vocals with the other encoders just sound far different than the WAV.

[responding to TLO's comment on what "ringing" sounds like]:
The ringing I hear doesn't sound ANYTHING like watery chirping. You must be thinking of a different artifact.

JohnV:

Ok, I'll give my view, although I already know which sample is which, so I consider myself already biased.

In my opinion 24 is the worst. This is the sample that in my opinion has the most quite audible artifacts, especially with 's' vocals. But also the quality in whole is the lowest in my opinion.

16 is the second worst. I don't like the 's' vocals and I hear some high frequency vibrations especially with 's' echoes.

45 sounds somewhat dull and the blip in the beginning before that "spischnich"-word ruins it. It sounds somewhat "thicker" or duller than the original or other samples.

33 In my opinion this is overall the most solid quality sample among these, surely not anywhere near transparent though. For example the 's' just before 4 second position is not very sharp. Not good but uniform quality is what makes this the best, in my opinion.

Huh, I can't believe somebody actually thinks 24 sounds best. It's quite badly audibly distorted all the way. I would bet that if the sample was in english language, 24 and 16 would be clearly identified the worst samples.

It took me some time (5-6 times each) before I could really hear the majority of distortions. The pronunciation is quite odd, and that at least in my case was a bit problematic.

ff123:

16: I noticed a spreading of the sibilance in the first "s" plus a very small noise around the same time (ABX = 16 of 16)
24: Quite noticeable blooming of the sibilance in the first "s," as if the out-of-phase content had been increased. (ABX = 16 of 16)
33: Harder for me to distinguish from original; sibilance in the first "s" is not as sharp (kind of sounds like pre-echo to me); perhaps a little distorted in the area of the first "s" (ABX = 14 of 16)
45: Obvious artifact in right channel near first "s." From the descriptions I had read, I thought it was going to be a "dropout." I rate this worst because the artifact is most noticeable to me out of all the artifacts I heard (didn't bother to ABX).

Ratings from best to worst: 33, 16, 24, 45
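To put the ABX scores above in perspective, the chance of reaching a given score by guessing alone follows from the binomial distribution with p = 0.5 per trial. The sketch below is an illustration added here, not part of the original test procedure, and the function name is made up for the example.

```python
from math import comb  # requires Python 3.8 or later


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials` ABX
    trials when every answer is a coin flip (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for score in (16, 14):
        print(f"{score} of 16: p = {abx_p_value(score, 16):.6f}")
```

For 16 of 16 this gives 1/65536, and for 14 of 16 it gives 137/65536 (about 0.002), so neither score is plausibly the result of guessing.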

I had the d-ear listen to the Eb_andul clips for both the left and right channel. I placed the average distortion on a normalized line from left to right with the leftmost being the best and the rightmost being the worst (of the test set). Here are the results:

Which encoder is which? Answers are here.
