Audioskeptic's blog: 2012

Sunday, April 15, 2012

Those of you in Europe may wish to check out this page: http://www.signalprocessingsociety.org/community/lectures/upcoming-lectures/ I won't be updating for a few days as you can tell from the schedule above.

Thursday, April 12, 2012

Loudness and Intensity - What's the difference?

What’s the Big Deal about Loudness and Intensity?

Language exists to communicate. If we don’t attach the same meanings to the words, leaving out philosophical issues for now, communication is difficult; hence, a few definitions and words to help us in the discussions.

Many people confuse loudness and intensity and assume they are the same or similar enough that they can be treated interchangeably. This isn’t true. They are different and understanding why they are different and how to use them is one of the keys to understanding what we hear. This understanding will also help explain some of the reasons why different level compression methods sound so different.

What is Intensity?

Definition: Intensity is the external measured level of a sound.*

There are many measures of intensity. For now, we will use the common psychoacoustic (not mechanical engineering) definition of sound pressure level (SPL). SPL, usually measured in dB, is a measure of acoustic energy in the atmosphere and is defined as 20*log10(actual pressure/reference pressure). There are assumptions built into SPL. Let’s keep it simple for now and ignore them. Note that several reference pressures are used for SPL in different fields; we will use the one commonly used in psychoacoustics, not the ones used in mechanical engineering or sonar.
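As a minimal sketch of the SPL definition above: the function below applies 20*log10(actual pressure/reference pressure) to an RMS pressure in pascals. The reference value of 20 micropascals is an assumption on my part (it is the conventional reference for sound in air; the text above does not state a number), and the function name is purely illustrative.

```python
import math

# Assumption: 20 micropascals, the conventional reference pressure for
# sound in air. The article does not state the value explicitly.
REFERENCE_PRESSURE_PA = 20e-6


def spl_db(pressure_pa, reference_pa=REFERENCE_PRESSURE_PA):
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    if pressure_pa <= 0 or reference_pa <= 0:
        raise ValueError("pressures must be positive")
    return 20.0 * math.log10(pressure_pa / reference_pa)


if __name__ == "__main__":
    for p in (20e-6, 0.02, 1.0):
        print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```

With this reference, a pressure equal to the reference gives 0 dB, and every factor of 10 in pressure adds 20 dB.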

dB in electronics has a different kind of reference than dB in acoustics and psychoacoustics. In electronics, we do not know the meaning of 0dB in acoustic terms because we do not know the efficiency and frequency response of the rest of the system. (Electronics reminder: dB is always relative, so in electronics we must assign an arbitrary 0dB. Additionally, we don’t know, in general, how that corresponds to actual intensity in the atmosphere; we all have “volume” controls on our equipment. Speaking of which, “volume” is neither horse nor mule, but rather a poorly defined attempt at one, the other, or both, depending on who contoured the control’s taper.) Therefore, electronics uses dBm, dBv, and other defined reference levels to provide a way to understand the intensity of the electrical analog of a signal. There are assumptions built into these level measures as well: constant impedance, for instance. Again, let us keep it simple for now.

dB in digital terms is also different, in that it typically uses 0dB for a reference maximum level rather than a low or minimum level, so levels in dB are zero or negative.
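To illustrate the digital convention, here is a small sketch that expresses a sample magnitude relative to full scale. The name `dbfs` and the default full-scale value of 1.0 (floating-point audio) are assumptions chosen for the example; for 16-bit integer samples one would pass 32768 instead.

```python
import math


def dbfs(sample_value, full_scale=1.0):
    """Level of a sample magnitude relative to digital full scale, in dB."""
    magnitude = abs(sample_value)
    if magnitude == 0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(magnitude / full_scale)


if __name__ == "__main__":
    print(f"full scale: {dbfs(1.0):.2f} dB")
    print(f"half scale: {dbfs(0.5):.2f} dB")
    print(f"16-bit sample 16384: {dbfs(16384, full_scale=32768):.2f} dB")
```

Anything at or below full scale comes out as zero or negative, which is the behavior described above.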

What is Loudness?

Definition: Loudness is the internal, subjective experience of how loud a signal is.

The term loudness dates back at least to Fletcher, if not beyond. Loudness is not intensity. Subjective experience, perceptual experience, and individual auditory periphery all affect perceived loudness. Loudness is not a measured level in the atmosphere.

Loudness and intensity can be mostly related by a complex calculation. Keep in mind, however, that every listener is a bit different, hearing injuries affect loudness in many ways, and intensity does not equal loudness. The relationship is complex, and could constitute an entire tutorial in and of itself.

In the worst cases, intensity is a very poor substitute for loudness, and vice versa.

What is Intensity For?

Intensity is an objective measure. It measures the actual fluctuations in air pressure in the atmosphere that constitute sound. Use intensity when you need to know the actual fluctuations in air pressure. For example, when you want to know how much noise your car makes, when you need to know how to achieve a certain sound pressure level in a concert hall, or how much force a sonic boom exerts.

Do not use intensity when you want to know how loud a sound is to a listener.

What is Loudness For?

Loudness is a perceptual, or sensation, concept. It describes the experience the listener has. Use loudness when you are trying to estimate psychoacoustic parameters, want to know why somebody is shouting “turn that **** thing down!” when the intensity isn’t that high, want to know why somebody is shouting “turn up the sound” when the intensity is already excessive, or want to match loudness, not levels, across audio selections, either full-bandwidth or from remotes/phones.

An Occasionally Useful Approximation, the Power Law Relationship

What can we say about the relationship between loudness and intensity?

When the spectrum (frequency content) of a signal is unchanged, loudness is approximately proportional to the 1/3.5 power of the signal power, or the 1/1.75 power of amplitude. Let’s call this the power law relationship. For signals above the threshold of hearing, this approximation works well. For certain special signals, or signals that have a lot of energy just below or at the threshold of hearing, this approximation does not work as well.
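The power law relationship can be written as a few one-line functions. This is only a sketch of the approximation stated above, valid for an unchanged spectrum well above threshold; the function names are illustrative.

```python
# Exponent applied to signal power, taken from the text (1/3.5).
LOUDNESS_EXPONENT = 1.0 / 3.5


def loudness_ratio_from_power(power_ratio):
    """Approximate loudness change for a given change in signal power."""
    return power_ratio ** LOUDNESS_EXPONENT


def loudness_ratio_from_amplitude(amplitude_ratio):
    """Same relationship in terms of amplitude (exponent 1/1.75)."""
    return amplitude_ratio ** (2.0 * LOUDNESS_EXPONENT)


def loudness_ratio_from_db(gain_db):
    """Approximate loudness change for a gain in dB, spectrum unchanged."""
    return loudness_ratio_from_power(10.0 ** (gain_db / 10.0))


if __name__ == "__main__":
    print(f"power x2   -> loudness x{loudness_ratio_from_power(2.0):.3f}")
    print(f"gain +10dB -> loudness x{loudness_ratio_from_db(10.0):.3f}")
```

A 10dB gain comes out at a little under a factor of 2 in loudness, which matches the commonly cited rule of thumb.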

As discussed in Why We Hear What We Hear, Part 3, the cochlea has a mechanical filter that does frequency analysis. ERBs are the bandwidths of these filters. This mechanical filter is a set of continuously distributed, heavily overlapping filters, not one set of 30-ish adjacent filters. Some points to keep in mind:

ERBs are a way to understand frequency in perceptual terms, and constitute the best current measure of the filter bandwidths in the cochlea.
ERB bandwidths are approximately 70Hz at low frequencies, and about ¼ octave at higher frequencies.
Signals in the same band convert intensity to loudness with the power law relationship. The loudness in a given band is referred to as the partial loudness corresponding to that frequency.
The loudness of a signal is the sum of the partial loudnesses.
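The bandwidth rule of thumb in the list above (about 70Hz at low frequencies, about ¼ octave higher up) can be sketched as follows. Joining the two regimes with a hard `max()` is my own simplifying assumption for illustration, as is measuring the ¼ octave symmetrically around the center frequency; this is not a published ERB formula.

```python
def approx_erb_bandwidth_hz(center_hz):
    """Rough ERB width: about 70 Hz at low frequencies, 1/4 octave above.

    Assumption: the two regimes are joined with a hard max(); the real
    transition is smooth.
    """
    # Width of a 1/4-octave band centered (geometrically) on center_hz.
    quarter_octave_hz = center_hz * (2.0 ** 0.125 - 2.0 ** -0.125)
    return max(70.0, quarter_octave_hz)


if __name__ == "__main__":
    for f in (100, 400, 1000, 4000):
        print(f"{f:5d} Hz -> about {approx_erb_bandwidth_hz(f):.0f} Hz wide")
```

Under these assumptions the two regimes cross over at roughly 400Hz.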

This explains why two signals at frequencies reasonably removed from each other, each of which has ½ the energy of the original, will together have 2 * (1/2)^(1/3.5), or about 1.64 times, the loudness of one of the signals presented at energy 1. Doubling the energy of a signal without changing the spectrum will increase the loudness by a factor of about 1.22. In contrast, if we double the energy of the signal by adding as much energy at a frequency where energy was not initially present on the cochlea (i.e. well outside of an ERB), the loudness doubles. Doubling the loudness is roughly equivalent to increasing the intensity by 10dB without spectral modification.
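The arithmetic in the paragraph above can be checked with a short sketch that sums partial loudnesses. It assumes each listed energy sits in its own, well-separated ERB and well above threshold, and it uses the power law exponent from the text; `total_loudness` is an illustrative name, not an established model.

```python
def total_loudness(band_energies, exponent=1.0 / 3.5):
    """Sum of partial loudnesses, one per well-separated ERB.

    Each band's energy is compressed by the power law, then the
    compressed values are added. Loudness is relative to a single band
    holding energy 1.
    """
    return sum(energy ** exponent for energy in band_energies if energy > 0)


if __name__ == "__main__":
    print(f"one band, energy 1        : {total_loudness([1.0]):.2f}")
    print(f"two bands, energy 1/2 each: {total_loudness([0.5, 0.5]):.2f}")
    print(f"one band, energy 2        : {total_loudness([2.0]):.2f}")
    print(f"two bands, energy 1 each  : {total_loudness([1.0, 1.0]):.2f}")
```

The last two cases carry the same total energy, yet the sketch gives them quite different loudness.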

If we take the same amount of energy, and spread it out from a tone into 2, 3, 4, or more ERBs, the loudness will increase as the energy is spread out, as long as all of the signal spectrum remains above the absolute threshold of hearing.

Understanding loudness behavior in regard to bandwidth is critical. Here are some examples as a reminder.

If we double the energy of a sine wave, its loudness rises by a factor of approximately 2^(1/3.5), or about 1.22.
If we make a new signal by adding a second sine wave with the same energy to the first sine wave, at a frequency well removed from it (in terms of ERBs), the loudness doubles.
These two signals have the same intensity, yet their loudness relative to the original sine wave is about 1.22 vs. 2.

Here’s a graphical example. In this example, the vertical axis is the relative loudness, with the single-band loudness set to 1 for simplicity. The curve shows the relative loudness when the same amount of energy is split over n bands, from 1 to 25. The numbers for over 15 bands are probably an overestimate, but that’s signal dependent.

As we can see in this graph, when the bandwidth of a signal grows, even though the energy of the signal remains the same, the loudness will grow. When the number of bands involved becomes large enough that adjacent bands become involved (as it must above 14 or 15 bands), the data in this plot is somewhat of an overstatement. In general, when energy is spread into an adjacent band, the effect on loudness will be somewhat less than when energy is added to a band distant in frequency from the original signal.
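The curve described above can be approximated numerically. The sketch below splits a fixed energy equally over n bands and applies the power law to each; it assumes all bands are well separated, which is exactly the idealization that overstates the result for large n. It also computes the same-spectrum energy increase that would produce an equal loudness change. Both function names are illustrative.

```python
def spread_loudness(n_bands, exponent=1.0 / 3.5):
    """Relative loudness when energy 1 is split equally over n_bands.

    Assumes every band is in its own, non-adjacent ERB (an idealization).
    """
    return n_bands * (1.0 / n_bands) ** exponent


def equivalent_energy_gain(loudness_ratio, exponent=1.0 / 3.5):
    """Energy increase needed for the same loudness change, spectrum fixed."""
    return loudness_ratio ** (1.0 / exponent)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 15, 25):
        loudness = spread_loudness(n)
        gain = equivalent_energy_gain(loudness)
        print(f"{n:2d} bands: loudness x{loudness:5.2f}, "
              f"same as energy x{gain:7.1f} in one band")
```

At 25 bands the idealized loudness is close to 10 times the single-band value, equivalent to an energy increase of about 3000 for an unchanged spectrum.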

Two other things to observe about this example: first, the loudness corresponding to a given signal energy can vary by about a factor of 10, which corresponds to an increase in energy (for an unchanged spectrum) of a factor of about 3000. Clearly, increasing bandwidth is a powerful method for increasing loudness without increasing signal energy. Second, the graph is only an approximation. It is approximate because:

Effects due to different listeners will change it
Effects regarding absolute threshold will change it
The power law is merely approximate in the first place
The distribution of energy (adjacent bands or far away) will affect the outcome

Nonetheless, the point holds that loudness is not always very strongly correlated to intensity.

In summary, dB SPL is a measure of the intensity of a signal. Loudness can be discussed in dB equivalent or related terms, but in general, dB is not a measure of loudness, and should not be presented as such. What dB equivalent means is that the loudness of a signal is equal to the loudness of some other, unchanging reference signal at the specified SPL for the reference signal.

dB does not measure loudness, but there are some commonly cited approximations, such as that doubling loudness without changing spectrum requires a gain increase of 10dB or so, as established by a set of experiments done by Fletcher, after Bell’s original experiments. Fletcher’s work has since been confirmed over and over by many other experimenters. The designs of Fletcher’s experiments, Stevens’ experiments, and their successors are complex and interesting but beyond the scope of this article.

There is a Loudness Button on my Machine

There is a “Loudness” button on many receivers, amplifiers, and other audio equipment. Because of the well-known reduced sensitivity of the ear at low frequencies, some manufacturers choose to add a bass boost, sometimes variable from the front panel, and call it a “loudness” control. This bass boost is linear and time-invariant. The best we can say about Loudness controls is that they make the sound louder. They cannot fully compensate for changes in sensation level without being signal dependent and time varying.

Unfortunately, we can boost something with a fixed curve only if the signal has a fixed level and a fixed spectrum, and music has neither. Processing for loudness restoration is not linear time invariant (time invariant frequency shaping); it’s not even close to being linear time invariant.

Take a look at this graph of loudness level contours versus intensity levels. The crowding of the contours at low frequencies shows the effects of loudness growth at low frequencies that does not scale with dB. A real loudness restorer that worked for different presentation levels would have to be time-varying and signal dependent. It’s not that easy.

(Image from http://en.wikipedia.org/wiki/File:Lindos1.svg)

Common Questions and Moving Forward

Be careful to use the terms loudness and intensity as distinct terms and to bear in mind how loosely the two are related. When we use nonlinear processing such as level compression that spreads frequency content, we must realize that the loudness may rise faster than expected.

Does that mean that there is such a thing as loudness “enhancement”? Well, leaving aside the metaphysical question of exactly what an enhancement is, yes. If a nonlinearity spreads the spectrum of a signal, it is likely to become louder. Some examples of loudness “enhancement” would include

LP distortion grows with level. That means that as level grows, the signal bandwidth (including the distortion) increases. An increase in intensity is over-represented by the increase in loudness.

This can create an illusion of “more dynamic range”.
It can also be very annoying.

Tape distortion grows with level. It behaves differently from LPs, but with the same result, at usual saturation levels.
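To see how a level-dependent nonlinearity spreads a spectrum, here is a minimal sketch that passes a sine wave through `tanh` and measures how much third harmonic appears relative to the fundamental. The `tanh` curve is only a generic stand-in for a saturating medium; it is an assumption for illustration and not a model of LP or tape behavior.

```python
import math


def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic of a signal spanning exactly one period."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def third_harmonic_ratio(drive, n=256):
    """3rd-harmonic amplitude over fundamental after tanh saturation."""
    sine = [drive * math.sin(2 * math.pi * i / n) for i in range(n)]
    saturated = [math.tanh(s) for s in sine]  # generic soft saturation
    return harmonic_amplitude(saturated, 3) / harmonic_amplitude(saturated, 1)


if __name__ == "__main__":
    for drive in (0.1, 0.5, 1.0, 3.0):
        print(f"drive {drive:3.1f}: 3rd harmonic / fundamental = "
              f"{third_harmonic_ratio(drive):.4f}")
```

As the drive level rises, a growing share of the output lands at a frequency that was not in the input, which is the bandwidth growth the examples above describe.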

What about “Make it LOUD” sorts of processing, such as used in certain kinds of level compression processing for radio broadcast or some overly loud CDs? Oh, yes, they work. They certainly do “make it louder”, indeed.

It is a bit of an art to make a signal have a peak to RMS ratio similar to that of a sine wave.
They spread the spectrum very broadly.
You can create your own opinion about how they sound.
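For reference, the peak to RMS ratio mentioned above (often called crest factor) is easy to compute. The sketch below measures it in dB for any list of samples; the function name is illustrative.

```python
import math


def crest_factor_db(samples):
    """Peak to RMS ratio of a block of samples, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


if __name__ == "__main__":
    n = 1000
    sine = [math.sin(2 * math.pi * i / n) for i in range(n)]
    square = [1.0 if i < n // 2 else -1.0 for i in range(n)]
    print(f"sine wave  : {crest_factor_db(sine):.2f} dB")
    print(f"square wave: {crest_factor_db(square):.2f} dB")
```

A sine wave comes out at about 3dB and a square wave at 0dB, so a program signal squeezed down to a sine-like ratio has very little peak headroom left above its average level.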

* You will notice that I am treating intensity informally above. This is deliberate: intensity does have a formal definition in acoustics, and a somewhat different meaning in psychoacoustics, at least in some quarters. In the case of hearing, the eardrum responds to pressure, and the head interacts with the volume velocities of the air around it to convert some amount of volume velocity to pressure at higher frequencies. This is how head related transfer functions (HRTFs) come into being, but that's another article for another day. More specifically, in this article, intensity refers to the actual differential pressure across the eardrum, that being what the head, pinna, etc., convert an acoustic sound field into.

Friday, March 23, 2012

Why We Hear What We Hear, Part 4 (end)

What This Means about What You Hear

The plasticity of forming auditory features and auditory objects has some very strong implications. If you listen for different things, you will hear different things. This is not illusion, it is not confusion, it is not deception, it is not hallucination; it is simply how your brain works. This has a particularly important implication for audio enthusiasts: expectation will always cause you to hear things differently. The effect of expectation is not always positive or negative (for example, you may not hear what you expect to hear), but expectation cannot, not now, not ever, be consciously filtered out of your hearing experience.

Expectation is why audio testing needs to be what’s called “double blind”. Double blind does not mean that you close your eyes or wear a blindfold; it has nothing to do with vision. Blind means that when you are trying to identify something you do not know which of several (usually two) signals it is, even though you can, or should be able to, switch to each of the known signals at will for reference and comparison. Double means that any experimenter you interact with in any fashion also doesn’t know which you’re hearing. Lots of work has shown that cues from an experimenter will lead you to answer differently. Interestingly, you may not agree with the experimenter in a single-blind test; you may choose the answer opposite to the one the experimenter prefers, but your performance will still be affected.
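One way to keep both listener and experimenter blind is to let software choose the unknown signal for each trial and reveal the key only afterwards. The sketch below is a hypothetical illustration of that idea, not a description of any particular test protocol: it draws a hidden schedule, scores the answers, and reports how likely that score would be from pure guessing between two signals. All names are illustrative.

```python
import random
from math import comb


def hidden_schedule(n_trials, seed=None):
    """Randomly pick which signal ("A" or "B") is the unknown in each trial.

    Generated by software so that no person needs to know it during the test.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(schedule, answers):
    """Number of trials where the listener's answer matched the hidden key."""
    return sum(1 for truth, guess in zip(schedule, answers) if truth == guess)


def probability_by_guessing(correct, trials):
    """Chance of at least `correct` right answers from coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


if __name__ == "__main__":
    print(f"12 or more of 16 by luck alone: {probability_by_guessing(12, 16):.4f}")
```

A small probability suggests the listener heard a real difference; a large one means the result is indistinguishable from guessing.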

This can be demonstrated by the old “backward masking” demonstration that was originally brought up in a different context, where it was alleged that there was satanic content in part of the song “Stairway to Heaven” when it was played in reverse. Interestingly enough, when the reversed song is played to an unsuspecting audience (this has happened quite a few times in lectures), the audience hears nothing, or a very few random syllables. When, however, the “words” are presented, the audience hears the words clearly and correctly, even though the actual sound presented to the ear is not changed. This is a direct result of how we understand speech: when expecting speech with some particular content, we guide our feature extraction and object resolution to find the parts of our audio surroundings that carry the information, and in this case we can do so even when there is no such information. This kind of effect is echoed in “EVP” arguments, wherein sounds, whispers, etc., are supposedly heard in recordings of radio static or other random noise.

In summary, the processing done in the brain is exceptionally plastic, and can be guided by a variety of things. The result of all that processing is what we actually, consciously hear.

If you listen to something differently (for different features or objects)

You will remember different things
This is not an illusion

If you have reason to assume things may be different

You will most likely listen differently
Therefore, you will remember different things

In short, if you want to be sure of what you heard due to only the auditory stimuli, you need to arrange a blind test of an appropriate nature. Such tests are neither easy nor simple to arrange, but are the best, and perhaps only way to avoid inadvertent self-deception.

Why We Hear What We Hear, Part 3

Four Key Points

Here are four key points about the auditory periphery.

The auditory periphery analyzes all signals in a time/frequency tiling called ERBs or Barks.
Due to the mechanics of the cochlea, first arrivals have very strong, seemingly disproportionate influence on what you actually hear. This is actually useful in the real world.
Signals inside an ERB mutually compress.
Signals outside an ERB do not mutually compress.

The first arrival information in a given ERB, where the first arrival is the first signal in approximately the last 200 milliseconds, is emphasized by the mechanics of the cochlea. The compression of signals inside an ERB starts to take effect about 1 millisecond after the arrival of the sound, so the first part of a sound sends more information to the brain. This turns out to be very useful to us in distinguishing both perceived direction and diffuse sensation.

The partial loudnesses from the cochlea are integrated somewhere at the very edge of the CNS such that some memory of the past is maintained for up to 200 milliseconds. Level-roving experiments show that when delays approaching 200 milliseconds exist between two sources, the ability to discern fine differences in loudness or timbre is reduced. It is well established that you need very quick, click-less switching between signals when trying to detect very small differences between them; otherwise you lose part of your ability to distinguish loudness differences.

Final Steps

There are two steps remaining in what you hear, both of them executed by the brain in a fashion I do not care to even guess about. The first step is analysis of the somewhat-integrated partial loudnesses into what I call “auditory features”. There is a great deal of data loss at this juncture: about 1/1000th of the information present at the auditory nerve remains after feature reduction, the rest being integrated into the information that remains or discarded. At this level, feature analysis is extremely plastic and can be guided by learning, experience, cognition, reflex, visual stimuli, state of mind, comfort, and all other factors. The features last a few seconds in short-term memory.

In the second step, the information from feature analysis is again reduced in size by about 100 to 1000 times, and turned into what I refer to as “auditory objects”. These are things that one can consciously describe, control, interpret, etc. Words are examples of things made of successive auditory objects. This process is as plastic as plastic can be: you can redirect yourself cognitively, or be directed by visual stimuli, guesses, and unconscious stimuli, including randomness. It is the final step before auditory input can be converted to long-term memory. Interestingly, this process can promote a feature to an object if you consciously focus on that feature. It is this process that is most affected by every other stimulus, including those generated internally. It is at these last two points, where short-term loudness is reduced to features and then objects, that we integrate the results from our senses and our knowledge, regardless of our intent or situation.

Thursday, March 22, 2012

Why We Hear What We Hear, Part 2

An Introduction to the Auditory System

The auditory system consists, in the way it is usually examined, of two parts, the periphery (head, ear, cochlea), and the brain, or Central Nervous system (CNS). The flow of information is substantially one-sided, with the ear providing a lot of information to the brain, and the brain providing very little feedback (some loudness issues, body and head movement) to the periphery, relatively speaking.

What is the Periphery?

It is all of the hearing apparatus that is outside of the brain. I am not, for this purpose, including body and skin sensation, which operate mostly at higher sound levels than we should be listening to. I’ll break this into three distinct sections:

HRTF’s, including ear canal (outer and middle ear functions)
Cochlear analysis (inner ear)
Reduction of sound into partial loudnesses as a function of time (inner ear)

First, the physical shape of the head, ear, body, and surrounding environment create what’s called a Head Related Transfer Function (HRTF) for each ear. You can think of the HRTF as a frequency response that varies with distance and angle between the sound source and the head, providing an Interaural Level Difference (ILD) between the two ears. The HRTFs also create the Interaural Time Delay (ITD) information. The ITD and ILD together are the information gathered by physical acoustics for directional processing by the CNS, as I will explain later.

All the effects of the outer and middle ear are being simplified and lumped into the HRTF for this tutorial. Conventionally in the literature, the functions of the outer and middle ear are separated from each other.

What do the ear canal and middle ear contribute to the HRTF? The ear canal influences the HRTF by its resonance, which is a function of its length and width. The eardrum and 3 bones of the middle ear influence the HRTF by changing the impedance of the system and rejecting “near DC” components of sound. In other words, the middle ear is similar to a transformer, modifying the relationship between force and distance as well as filtering out very low frequencies like changes in barometric pressure. The middle ear’s rejection of near-DC signals is absolutely essential; otherwise a passing weather front would amount to a 150dB sound level, intolerably loud and damaging to our hearing.
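To put the 150dB figure in physical terms, the sketch below inverts the SPL formula to get the pressure change that corresponds to a given level. It assumes the conventional 20 micropascal reference for sound in air, which is my assumption rather than something stated in this post; the function name is illustrative.

```python
# Assumption: conventional 20 micropascal reference pressure for air.
REFERENCE_PRESSURE_PA = 20e-6


def pressure_from_spl(spl_db, reference_pa=REFERENCE_PRESSURE_PA):
    """Pressure in pascals corresponding to a sound pressure level in dB."""
    return reference_pa * 10.0 ** (spl_db / 20.0)


if __name__ == "__main__":
    pascals = pressure_from_spl(150.0)
    print(f"150 dB SPL is about {pascals:.0f} Pa "
          f"({pascals / 100.0:.1f} hPa)")
```

Under that assumption, 150dB corresponds to roughly 630 Pa, or a few hectopascals, the unit in which barometric pressure is usually reported.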

Second, the cochlea filters the time signal into many overlapping bands. The cochlea is a complicated organ that performs a mechanical filtering of the signal entering the ear. It filters the incoming signal into heavily overlapping critical bands or Equivalent Rectangular Bandwidths (ERBs). This filtering gives us our frequency sensitivity above about 100Hz.

Third, the signal in each ERB is compressed to give us a reduction of sound into partial loudnesses as a function of time. This ties changes in atmospheric pressure to sensation. A Sound Pressure Level (SPL) is a physical quantity, a measured change in physical atmospheric pressure. A partial loudness is a measure of the sensation level detected by an inner hair cell, a perceptual quantity. An intensity increase of 10dB creates a change in sensation level of about a factor of 2. A signal in one ERB does not compress the signals that do not enter into the same filter, so we are left with an interesting effect: signals are compressed inside an ERB, but partial loudnesses from 2 ERBs add! The sum of all the partial loudnesses is in fact what we think of as loudness.

Please notice that I’ve used two words in a very specific way. To be clear,

loudness is the level you hear, the sensation level that gets shipped to your CNS
intensity is the measured signal level in the atmosphere, the SPL

The two do not track very well, even with knowledge of the compression mentioned above. The only time they track each other is when the spectra of the two signals being compared have the same shape and only the total energy differs. A measure of SPL is not enough to determine loudness by itself.

For further information about loudness, see the loudness tutorial of April 2006 at the Pacific Northwest AES web site (Loudness Tutorial PowerPoint presentations without sound, with audio). There are a variety of slide decks there on the subjects of audio and hearing, which you may find interesting.

What is the Central Nervous System?

The central nervous system (CNS) consists of the end of the auditory nerve where it connects to the brain and then the brain. It carries out, in some fashion that we do not well understand, the following operations:

Reduction from partial loudness to auditory features
Reduction of auditory features to auditory objects
Storage in short-term and long-term memory

There are some well-known issues with the CNS. In particular, it’s very flexible in the way it interprets input. This is sometimes referred to as “plastic”. What plastic means is you can change what you listen to, what you look at, what you smell or feel, and the CNS, by design, combines information from all sensory modalities. It does this all of the time under all circumstances. Everywhere. All the time.

Plasticity, by itself, creates a problem with isolating what’s going on in any one bit of audio, be it due to equipment, performance, or what-have-you, because the mere knowledge of which of a set of things you are listening to will change the way you think about it, and hence what you pay attention to. This is not a question of “Heisenberg” kinds of uncertainty; rather, it is simply the case that you notice the things you focus on.

What information gets to the CNS and when does it get there? Anything detected by the auditory periphery, the visual periphery, knowledge of what button you pushed on the amplifier, the color of the speaker grill cloth, and so on gets to the CNS. However, what is specifically important for auditory sensation is the information from the auditory periphery.

Here, we are not going to consider body sensation (although it is certainly germane at low frequencies at high levels, or at ultrasonic frequencies at extreme levels). We will also leave out extremely intense LF and VHF signals, which can be detected by other means; these are extreme conditions and should not generally be experienced by a listener.

To summarize, the CNS and its connection to the audio periphery can be illustrated as

Why We Hear What We Hear, Part 1

This overview explains why we hear what we hear and how you can manipulate it to get different results from multiple listenings.

Background and Notes about This Tutorial

This tutorial contains information garnered from 26 years as a research scientist at Bell Labs in Acoustic Research at Murray Hill, and its lineal descendants, from a variety of papers, from learning while I was an Audio Architect at Microsoft, and when I was Chief Scientist for Neural Audio. It contains ideas gathered from a variety of papers and experiments, done by many people, over a long period of time.

The information presented here is a work in progress, as research in both the hearing periphery, by which I mean the ear, up to and including the cochlea, and the Central Nervous System (CNS), that is the brain, continues today.

I will present a high-level overview of a number of very broad subjects. These are phenomena that are observed by researchers in both the hearing and cognitive psychology communities. Each of these phenomena warrants in-depth study, some of which will be covered by future tutorials.

Keep in mind, this information

is not inviolate
is a discussion of phenomena
describes mechanisms that are, in most cases, unknown once one gets beyond the basilar membrane
will be revised by further research as time goes on; there will always be revisions