Thursday, March 22, 2012

Why We Hear What We Hear, Part 2

An Introduction to the Auditory System

The auditory system consists, in the way it is usually examined, of two parts: the periphery (head, ear, cochlea) and the brain, or central nervous system (CNS). The flow of information is substantially one-sided, with the ear providing a great deal of information to the brain, and the brain providing relatively little feedback (some loudness regulation, body and head movement) to the periphery.

What is the Periphery?

It is all of the hearing apparatus that lies outside of the brain. I am not, for this purpose, including body and skin sensation, which operate mostly at higher sound levels than we should be listening to. I’ll break the periphery into three distinct sections:

  • HRTFs, including the ear canal (outer and middle ear functions)
  • Cochlear analysis (inner ear)
  • Reduction of sound into partial loudnesses as a function of time (inner ear)

First, the physical shape of the head, ears, body, and surrounding environment creates what’s called a Head Related Transfer Function (HRTF) for each ear. You can think of the HRTF as a frequency response that varies with the distance and angle between the sound source and the head, providing an Interaural Level Difference (ILD) between the two ears. The HRTFs also create the Interaural Time Delay (ITD). Together, the ITD and ILD are the directional information gathered by physical acoustics, which the CNS processes as I will explain later.
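
To make the geometry a little more concrete, here is a minimal sketch of how an ITD can be approximated from source azimuth, assuming a rigid spherical head (Woodworth’s classic approximation). The head radius and speed of sound used here are illustrative values of my own choosing; real HRTFs and ILDs depend on individual anatomy and must be measured or simulated rather than computed from anything this simple.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time delay (seconds) for a rigid spherical head.

    Woodworth's formula: ITD ~ (a / c) * (theta + sin(theta)),
    where theta is the source azimuth in radians (0 = straight ahead,
    90 degrees = directly to one side). Real HRTFs also vary with
    elevation, distance, and the listener's own anatomy.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source 90 degrees to the side gives roughly 0.65 ms of delay;
# a source straight ahead gives no delay at all.
for az in (0, 30, 60, 90):
    print(f"azimuth {az:3d} deg -> ITD ~ {itd_woodworth(az) * 1e3:.2f} ms")
```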

All the effects of the outer and middle ear are simplified and lumped into the HRTF for this tutorial. Conventionally in the literature, the functions of the outer and middle ear are treated separately.

What do the ear canal and middle ear contribute to the HRTF? The ear canal influences the HRTF through its resonance, which is a function of its length and width. The eardrum and the three bones of the middle ear influence the HRTF by changing the impedance of the system and rejecting the “near DC” components of sound. In other words, the middle ear is similar to a transformer, modifying the relationship between force and distance as well as filtering out very low frequencies such as changes in barometric pressure. This rejection of near-DC signals is absolutely essential; otherwise a passing weather front would amount to a 150dB sound level, intolerably loud and damaging to our hearing.
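
As a quick sanity check on that 150dB figure, here is the arithmetic as a short sketch. The 6 hPa pressure swing is an illustrative number for a passing front, not a measurement.

```python
import math

def pa_to_db_spl(pressure_pa, p_ref=20e-6):
    """Express a pressure amplitude (in pascals) on the dB SPL scale,
    referenced to 20 micropascals."""
    return 20.0 * math.log10(pressure_pa / p_ref)

# A weather front can swing barometric pressure by several hectopascals
# (1 hPa = 100 Pa). Treated as a "signal", that swing is enormous on the
# SPL scale:
front_swing_pa = 600.0                         # ~6 hPa, illustrative
print(f"{pa_to_db_spl(front_swing_pa):.0f} dB SPL")   # ~150 dB
```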

Second, the cochlea filters the time signal into many overlapping bands. The cochlea is a complicated organ that performs a mechanical filtering of the signal entering the ear. It filters the incoming signal into heavily overlapping critical bands or Equivalent Rectangular Bandwidths (ERBs). This filtering gives us our frequency sensitivity above about 100Hz.
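
For a feel of how wide those overlapping bands are, here is a sketch using one commonly cited ERB formula (Glasberg & Moore, 1990). The exact bandwidths vary with level and across the literature, so treat these numbers as representative rather than definitive.

```python
def erb_bandwidth_hz(center_freq_hz):
    """Equivalent Rectangular Bandwidth at a given center frequency,
    using the Glasberg & Moore (1990) formula:
        ERB(f) = 24.7 * (4.37 * f/1000 + 1)   [Hz]
    """
    return 24.7 * (4.37 * center_freq_hz / 1000.0 + 1.0)

# Bandwidths grow with frequency: roughly constant-Q above ~500 Hz,
# closer to a fixed width of a few tens of Hz at the low end.
for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:6d} Hz -> ERB ~ {erb_bandwidth_hz(f):6.0f} Hz")
```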

Third, the signal in each ERB is compressed, giving us the reduction of sound into partial loudnesses as a function of time. This is what ties changes in atmospheric pressure to sensation. A Sound Pressure Level (SPL) is a physical quantity, a measured change in atmospheric pressure. A partial loudness is a measure of the sensation level detected by an inner hair cell, a perceptual quantity. An intensity increase of 10dB creates a change in sensation level of about a factor of 2. A signal in one ERB does not compress signals that fall into different filters, so we are left with an interesting effect: signals are compressed inside an ERB, but partial loudnesses from different ERBs add! The sum of all the partial loudnesses is in fact what we think of as loudness.
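
Here is a toy sketch of that two-part behavior, compression within a band and addition across bands, using a power-law exponent of about 0.3 so that a 10dB intensity increase roughly doubles the partial loudness, as stated above. Real partial-loudness models (for example Moore and Glasberg’s) also account for thresholds and masking, which this deliberately ignores.

```python
def partial_loudness(band_intensity, exponent=0.3):
    """Toy within-band compression: loudness grows roughly as
    intensity**0.3, so +10 dB of intensity is about a doubling of
    sensation level. Thresholds and masking are ignored here.
    """
    return band_intensity ** exponent

def total_loudness(band_intensities):
    """Partial loudnesses add across ERBs; the sum is what we call loudness."""
    return sum(partial_loudness(i) for i in band_intensities)

# Doubling the intensity inside one band (+3 dB) raises its partial
# loudness by only ~23%, but adding a second, equally intense band in a
# different ERB doubles the total loudness:
one_band    = [1.0]
louder_band = [2.0]          # +3 dB, same ERB
two_bands   = [1.0, 1.0]     # same total intensity, split across two ERBs
print(total_loudness(one_band), total_loudness(louder_band), total_loudness(two_bands))
```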

Please notice that I’ve used two words in a very specific way. To be clear,

  • loudness is the level you hear, the sensation level that gets shipped to your CNS
  • intensity is the measured signal level in the atmosphere, the SPL

The two do not track very well, even with knowledge of the compression mentioned above. The only time they track each other is when the frequency responses of the two signals being compared have the same shape and only their total energies differ. A measurement of SPL is not, by itself, enough to determine loudness.
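
As a toy illustration of why, consider two signals with identical total intensity (identical SPL meter reading), one confined to a single ERB and one spread across four. Under the same crude compress-then-sum model sketched above, the spread signal comes out much louder; the specific numbers are arbitrary and only the comparison matters.

```python
def toy_loudness(band_intensities, exponent=0.3):
    """Sum of compressed per-band intensities: a crude stand-in for
    summing partial loudnesses across ERBs."""
    return sum(i ** exponent for i in band_intensities)

# Two signals with identical total intensity:
narrow = [8.0]                     # all energy in one ERB
spread = [2.0, 2.0, 2.0, 2.0]      # same energy spread across four ERBs

print(toy_loudness(narrow))   # ~1.9
print(toy_loudness(spread))   # ~4.9 -- much louder for the same SPL
```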

For further information about loudness, see the loudness tutorial of April 2006 at the Pacific Northwest AES web site (the Loudness Tutorial PowerPoint presentations are available both without sound and with audio). There are a variety of slide decks there on the subjects of audio and hearing, which you may find interesting.

What is the Central Nervous System?

The central nervous system (CNS) consists of the end of the auditory nerve, where it connects to the brain, and the brain itself. It carries out, in some fashion that we do not yet understand well, the following operations:

  • Reduction from partial loudness to auditory features
  • Reduction of auditory features to auditory objects
  • Storage in short-term and long-term memory

There are some well-known issues with the CNS. In particular, it is very flexible in the way it interprets input; this is sometimes referred to as being “plastic”. What plastic means is that changing what you listen to, what you look at, what you smell or feel changes the interpretation, because the CNS, by design, combines information from all sensory modalities. It does this all of the time, under all circumstances. Everywhere. All the time.

Plasticity, by itself, creates a problem when isolating what’s going on in any one bit of audio, be it due to equipment, performance, or what-have-you, because the mere knowledge of which of a set of things you are listening to will change the way you think about it, and hence what you pay attention to. This is not a question of “Heisenberg” kinds of uncertainty; rather, it is the simple fact that you notice the things you focus on.

What information gets to the CNS, and when does it get there? Anything detected by the auditory or visual periphery, knowledge of which button you pushed on the amplifier, the color of the speaker grille cloth, and so on all reach the CNS. However, what is specifically important for auditory sensation is the information from the auditory periphery.

Here, we are not going to consider body sensation (although it is certainly germane at low frequencies at high levels, or at ultrasonic frequencies at extreme levels). We will also leave out extremely intense LF and VHF signals, which can be detected by other means; these are extreme conditions and should not generally be experienced by a listener.

To summarize, the CNS and its connection to the auditory periphery can be illustrated as a largely one-way chain: the periphery delivers partial loudnesses to the CNS, which reduces them to auditory features, then to auditory objects, and stores the results in short-term and long-term memory.
