With binaural sound source localization it is possible to identify the spatial position of sound sources using two acoustic sensors (ears for biological systems; microphones for technical systems). There are two major classes of cues that are used for sound localization by animals and humans: interaural time differences (ITDs) on the one hand, and interaural level differences (ILDs) together with monaural spectral cues on the other.
Of all the acoustical cues available for sound source localization, interaural time differences are by far the easiest to use in a technical implementation. The only parameter that has to be known is the distance between the microphones (as long as there is no obstruction between them).
Monaural cues and interaural level differences, on the other hand, require some sort of artificial "head" between the microphones. This setup then has to be calibrated, i.e., the head-related transfer functions (HRTFs) and ILDs generated by the specific setup have to be measured, which is a rather tedious and elaborate process. Furthermore, any change to the artificial "head" or the microphones entails a recalibration.
As the robots featured on this site all use ITD-based sound source localization, I will not go into further detail on localization based on interaural level differences.
Although sound waves emanating from a (point) source propagate spherically, for practical purposes it can be assumed that the sound wave is planar (the far-field assumption). It can be shown that for d/b > 2.7, where d is the distance to the source and b the distance between the microphones, the error introduced by this assumption becomes smaller than 0.5° (compared to spherical propagation).[1]
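The 2.7 threshold is easy to check numerically. The following sketch (my own illustration, not the derivation from [1]; microphone spacing and speed of sound are assumed values) places a point source exactly at d/b = 2.7 and compares the true azimuth with the azimuth recovered by the far-field formula given below:

```python
import numpy as np

# Quick numerical check of the d/b > 2.7 claim; b and c are assumed values.
b = 0.20                    # microphone spacing in m
d = 2.7 * b                 # source distance right at the threshold
c = 344.0                   # speed of sound in m/s

mic_l = np.array([-b / 2, 0.0])
mic_r = np.array([+b / 2, 0.0])

worst = 0.0
for alpha in np.radians(np.linspace(-85.0, 85.0, 1701)):
    src = d * np.array([np.sin(alpha), np.cos(alpha)])
    # exact ITD for a spherically propagating wave from a point source
    itd = (np.linalg.norm(src - mic_l) - np.linalg.norm(src - mic_r)) / c
    # azimuth recovered under the plane-wave (far-field) assumption
    alpha_est = np.arcsin(np.clip(c * itd / b, -1.0, 1.0))
    worst = max(worst, abs(np.degrees(alpha_est - alpha)))

print(f"worst-case azimuth error at d/b = 2.7: {worst:.2f} deg")  # ~0.5 deg
```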
Under the far-field assumption, the relationship between the interaural time difference and the azimuthal angle to the sound source becomes simply:

\alpha = \arcsin\left(\frac{\Delta t \cdot c}{b}\right)
where \alpha is the azimuth to the sound source, \Delta t is the interaural time difference (in seconds), c is the speed of sound (in m/s), and b is the distance between the microphones (in m).
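In code, inverting this relationship is essentially a one-liner. Here is a minimal sketch (the function name and the value c = 344 m/s are my own choices, not taken from any of the robots):

```python
import math

def azimuth_from_itd(itd_s: float, b_m: float, c: float = 344.0) -> float:
    """Far-field azimuth (in degrees) from an interaural time difference.

    itd_s -- measured time difference in seconds
    b_m   -- microphone spacing b in metres
    c     -- speed of sound in m/s (344 m/s assumed here)
    """
    # Clamp: measurement noise can push c*itd/b slightly outside [-1, 1].
    s = max(-1.0, min(1.0, c * itd_s / b_m))
    return math.degrees(math.asin(s))

# Example: a 300 µs delay measured across 20 cm of microphone spacing
print(azimuth_from_itd(300e-6, 0.20))   # ~31.1 deg
```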
So, how can the brain extract interaural time differences, which, for humans, are at most about 625 µs (for an ear distance of 21.5 cm)?
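This upper limit follows directly from the geometry: the ITD is largest when the source lies fully to one side (\alpha = 90°), so that the path difference equals the full ear distance. Assuming c ≈ 344 m/s:

\Delta t_{\mathrm{max}} = \frac{b}{c} = \frac{0.215\ \mathrm{m}}{344\ \mathrm{m/s}} \approx 625\ \mathrm{µs}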
L. A. Jeffress[2] proposed a hypothetical model of how neurons in the brain could make use of these tiny time differences, illustrated in the figure below:
It consists of two major features: axonal delay lines and coincidence detector neurons. The coincidence detector neurons will (simply put) only fire if both of their inputs are excited simultaneously. Because action potentials travel along axons at finite speed, the axons leading to the coincidence detectors act as delay lines.
If, for example, a sound source is positioned to the left of the head, the action potentials from the left auditory nerve have time to travel further along the upper axon (in the picture) before the action potentials from the right ear arrive. Somewhere on the right side of the structure, both will excite a coincidence detector neuron simultaneously, which will then fire. In this way, the interaural time difference is converted into a location in a neuronal structure.
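To make the mechanism concrete, here is a toy software analogue of the Jeffress scheme (my own sketch, not code running on any of the robots on this site): the axonal delay lines become sample shifts, and each coincidence detector becomes a multiply-and-sum whose position encodes the ITD. This is, in effect, a cross-correlation over the plausible ITD range:

```python
import numpy as np

def jeffress_itd(left: np.ndarray, right: np.ndarray,
                 fs: float, max_itd_s: float) -> float:
    """Toy software analogue of the Jeffress model: one 'coincidence
    detector' per lag, each fed through a pair of complementary delay
    lines (here: sample shifts).  The detector with the strongest
    response encodes the interaural time difference -- a place code.
    A positive result means the left input leads (source on the left)."""
    max_lag = int(round(max_itd_s * fs))
    best_lag, best_response = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Shift one channel against the other and sum the coincident
        # activity -- the software stand-in for a coincidence detector.
        if lag >= 0:
            response = float(np.dot(left[:len(left) - lag], right[lag:]))
        else:
            response = float(np.dot(left[-lag:], right[:len(right) + lag]))
        if response > best_response:
            best_lag, best_response = lag, response
    return best_lag / fs

# Quick test: delay a broadband signal by 12 samples (250 µs at 48 kHz)
fs = 48_000
sig = np.random.randn(2400)
left = sig
right = np.concatenate([np.zeros(12), sig[:-12]])  # right ear hears it later
print(jeffress_itd(left, right, fs, 700e-6))       # ~250 µs
```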