Binaural sound source localization - Basics

binaural, adj.:
of, relating to, or involving two or both ears

With binaural sound source localization it is possible to identify the spatial position of sound sources using two acoustic sensors (ears for biological systems; microphones for technical systems). There are two major classes of cues that are used for sound localization by animals and humans:

1. Monaural cues
For monaural cues, only one ear (or microphone) is needed. The head and ear shape as well as the torso act as direction-dependent frequency filters for the sound wave arriving at the eardrum. This so-called head-related transfer function (HRTF) varies between individuals.

2. Binaural cues
For binaural cues, information from both ears (microphones) is needed. These cues fall into two categories:

• Interaural time differences (ITDs)
Interaural time differences are caused by the different propagation times of the sound wave from the source to the two ears. For a source to the left, for example, the sound wave will reach the left ear slightly before it reaches the right ear.

• Interaural level differences (ILDs)
Interaural level differences are caused by the acoustic "shadow" of the head. For a source to the left, for example, the sound wave will be slightly louder at the left ear than at the right ear.

Of all the acoustical cues available for sound source localization, interaural time differences are by far the easiest to use in a technical implementation. The only parameter that has to be known is the distance between the microphones (as long as there is no obstruction between them).
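
In a technical system, the time difference can be estimated, for example, by cross-correlating the two microphone signals and locating the peak of the correlation. The following is a minimal sketch of that idea (the function name, the sign convention, and the use of plain cross-correlation are my own choices, not taken from this article):

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference in seconds.

    left, right : equal-length 1-D arrays holding the two microphone signals
    fs          : sampling rate in Hz

    Returns a positive value when the sound reaches the left microphone
    first, i.e. when the source is to the left.
    """
    # The cross-correlation peaks at the lag (in samples) by which
    # 'right' lags behind 'left'.
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    return lags[np.argmax(corr)] / fs
```

Note that the resolution of this simple approach is one sample (about 23 µs at 44.1 kHz), so in practice the correlation peak is usually interpolated, or more robust variants such as the generalized cross-correlation are used.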

Monaural cues and interaural level differences, on the other hand, require some sort of artificial "head" between the microphones. This setup then has to be calibrated, i.e., the HRTFs and ILDs generated by a specific setup have to be measured, which is a rather tedious and elaborate process. Furthermore, any change in the artificial "head" or the microphones entails a recalibration.

As the robots featured on this site all use ITD-based sound source localization, I will not go into further detail on localization based on interaural level differences.

Far-field assumption

Although sound waves emanating from a (point) source propagate spherically, for practical purposes the sound wave can be assumed to be planar. It can be shown that for d/b > 2.7, where d is the distance to the sound source and b the distance between the microphones, the error of this approximation becomes smaller than 0.5° compared to spherical propagation.[1]
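
This bound can be checked numerically. Below is a small sketch under my own reading of the geometry (microphones at ±b/2 on the x-axis, source at distance d from the midpoint between them; all names are mine); it anticipates the arcsin relationship given just below:

```python
import math

def farfield_error_deg(d_over_b, alpha_deg):
    """Angular error (in degrees) of the far-field (planar wave)
    assumption for a source at azimuth alpha_deg and distance
    d = d_over_b * b. Microphones sit at (-b/2, 0) and (b/2, 0);
    b = 1 without loss of generality.
    """
    a = math.radians(alpha_deg)
    sx, sy = d_over_b * math.sin(a), d_over_b * math.cos(a)
    r_left = math.hypot(sx + 0.5, sy)    # exact path length to the left mic
    r_right = math.hypot(sx - 0.5, sy)   # exact path length to the right mic
    # Far-field estimate: interpret the path difference as b * sin(alpha).
    est = math.asin(max(-1.0, min(1.0, r_left - r_right)))
    return abs(math.degrees(est) - alpha_deg)

# The worst case over all azimuths at d/b = 2.7 stays just below 0.5 degrees:
print(max(farfield_error_deg(2.7, a) for a in range(91)))
```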

Under the far-field assumption the relationship between interaural time difference and the azimuthal angle to the sound source becomes simply:

\alpha=\arcsin{\frac{\Delta t\cdot c}{b}}

where \alpha is the azimuth to the sound source, \Delta t is the interaural time difference (in seconds), c is the speed of sound (in m/s) and b is the distance between the microphones.
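
In code, this relationship is a one-liner. A minimal sketch (function and variable names are my own):

```python
import math

def azimuth_from_itd(delta_t, b, c=343.0):
    """Azimuth (in radians) to the sound source under the far-field
    assumption.

    delta_t : interaural time difference in seconds
    b       : distance between the microphones in metres
    c       : speed of sound in m/s (about 343 m/s in air at 20 °C)
    """
    # Clamp the argument: measurement noise can push |delta_t| slightly
    # above b/c, which would put it outside the domain of asin().
    x = max(-1.0, min(1.0, delta_t * c / b))
    return math.asin(x)

# Example: a 300 µs ITD with 21.5 cm microphone spacing
print(math.degrees(azimuth_from_itd(300e-6, 0.215)))  # about 28.6°
```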

Neuronal model (Jeffress model)

So, how can the brain extract interaural time differences, which - for humans - lie in the range of ±625 µs (for an ear distance of 21.5 cm)?
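
This maximum follows directly from the far-field formula above: for a source directly to one side (\alpha = 90°), the path difference equals the full ear distance, so that (with c ≈ 344 m/s, the value consistent with the figure of 625 µs)

\Delta t_{max}=\frac{b}{c}=\frac{0.215\,\text{m}}{344\,\text{m/s}}\approx 625\,\mu\text{s}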

L. A. Jeffress[2] proposed a hypothetical model of how neurons in the brain could make use of these tiny time differences, illustrated in the figure below:

[Figure: The Jeffress model.]

It consists of two major features: axonal delay lines and coincidence detector neurons. The coincidence detector neurons will (simply put) only fire if both of their inputs are excited simultaneously. Due to the finite propagation speed of action potentials, the axons leading to the coincidence detectors act as delay lines.

If, for example, a sound source is positioned to the left of the head, the action potentials from the left auditory nerve will have time to travel further along the upper axon (in the picture) before the action potentials from the right ear arrive. Somewhere on the right side of the structure, they will simultaneously excite a coincidence detector neuron, which will then fire. In this way, the interaural time difference has been converted into a location in a neuronal structure.
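
As a rough illustration (my own toy discretization, not taken from Jeffress's paper), the principle can be simulated with two antiparallel delay lines of one time step per tap:

```python
import numpy as np

def jeffress_place(left_spikes, right_spikes, n_taps=21):
    """Toy discrete Jeffress model: two antiparallel delay lines feeding
    a row of coincidence detectors.

    left_spikes, right_spikes : binary spike trains, one entry per time step
    n_taps                    : number of coincidence detector neurons (odd)

    Returns the index of the detector with the most coincidences;
    the centre index corresponds to zero ITD.
    """
    left = np.asarray(left_spikes, dtype=int)
    right = np.asarray(right_spikes, dtype=int)
    n = len(left)
    counts = np.zeros(n_taps, dtype=int)
    for tap in range(n_taps):
        d_left = tap                # axonal delay from the left ear
        d_right = n_taps - 1 - tap  # axonal delay from the right ear
        l = np.concatenate([np.zeros(d_left, dtype=int), left])[:n]
        r = np.concatenate([np.zeros(d_right, dtype=int), right])[:n]
        # A coincidence detector counts time steps in which both of its
        # delayed inputs carry a spike.
        counts[tap] = np.sum(l & r)
    return int(np.argmax(counts))

# Demo: the right ear hears the same spike train 4 steps later
# (source to the left); the winning detector lies right of centre.
rng = np.random.default_rng(0)
left = (rng.random(500) < 0.05).astype(int)
right = np.concatenate([np.zeros(4, dtype=int), left])[:500]
print(jeffress_place(left, right))  # prints 12; the centre tap would be 10
```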

References

  1. Calmes, L. (2009). Biologically Inspired Binaural Sound Source Localization and Tracking for Mobile Robots. PhD thesis, RWTH Aachen University.
  2. Jeffress, L. A. (1948). A place theory of sound localization. Journal of Comparative & Physiological Psychology, 41(1):35-39.