A Listening Machine
Our brains learn to associate a sound we hear with the direction of its source, how far away the source is, and what it is. Even ignoring what we can see or feel, we are able to localize sources using only two streams of data, one arriving at each ear. A similar situation arises when we use earphones to listen to a binaural recording. However, no single binaural recording can produce a convincing surround-sound effect for every listener (as detailed below). My project (ECE 3951/2) aims to resolve this problem through digital signal processing.
Any sound emanating from a source is modified (filtered) before it reaches our ears. Our ears, clothes, shoulders, hair, the shape of our heads, and the room we are in all contribute to a filter that modifies a sound before it can be heard. This filtering produces localization cues that our brains have learned to recognize. Each brain therefore searches the sound it hears for cues about the source. However, every brain searches for different cues, since each individual has a unique anatomy that affects an audio signal in its own way. Even so, we can determine what these cues are for a given person.
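One of the simplest such cues is the interaural time difference: the same sound reaches one ear slightly before the other. As a minimal numpy sketch (the 44.1 kHz sample rate and the synthetic click are illustrative assumptions, not measured data), the delay between the two ear signals can be estimated from the peak of their cross-correlation:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (in seconds) between two
    ear signals from the lag that maximizes their cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / fs

# Synthetic example: the right ear hears the same click 5 samples later,
# as if the source sat off to one side of the listener.
fs = 44100
click = np.zeros(256)
click[50] = 1.0
left = click
right = np.roll(click, 5)
itd = estimate_itd(left, right, fs)  # -5 samples / fs: left ear heard it first
```

A real system would use band-limited signals and sub-sample interpolation around the correlation peak, but the principle is the same.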
Two microphones can be placed inside our ears to record such filtered signals (creating a binaural recording), and these signals can be processed to determine the combined impulse response of the body and room (the Head-Related Impulse Response, or HRIR). Adaptive filtering techniques have been used successfully for this. Such a response can then be convolved with any recorded sound to add a surround-sound effect to music, video-game, and movie tracks. The effect, however, works best only for the person who made the binaural recordings, since each individual's HRIR is unique.
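To make the adaptive-filtering idea concrete, here is a minimal LMS (least-mean-squares) sketch in numpy. The 8-tap "true" response and the noise excitation are purely illustrative stand-ins for an HRIR and a test signal played through a speaker; the point is only the update rule, which nudges the filter estimate toward reducing the instantaneous prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-tap impulse response standing in for an HRIR.
h_true = np.array([0.0, 0.5, 0.9, 0.4, -0.2, 0.1, 0.05, 0.0])

# Excitation played through a speaker, and what the ear microphone records.
x = rng.standard_normal(20000)
d = np.convolve(x, h_true)[: len(x)]

# LMS adaptive filter: estimate the impulse response from (x, d) alone.
n_taps = len(h_true)
h_est = np.zeros(n_taps)
mu = 0.01                                     # step size (stability: small)
for n in range(n_taps, len(x)):
    x_win = x[n - n_taps + 1 : n + 1][::-1]   # most recent samples first
    e = d[n] - h_est @ x_win                  # prediction error
    h_est += mu * e * x_win                   # gradient step toward smaller error
```

After enough samples, `h_est` closely matches `h_true`; with real recordings, measurement noise and a longer filter would be involved.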
One way to make virtual surround sound accessible to everyone would be to record and process each person's HRIR individually. My goal, however, is to implement a neural network that finds patterns in a given set of HRIRs and uses them to generate new HRIRs, introducing some randomness into the process. A large number of HRIRs could be generated this way, some of which may work for many people. Different HRIRs could then be applied to different parts of one sound signal, in the hope that every listener's brain would recognize at least some of them, resulting in successful localization of a virtual source.
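The "find patterns, then generate more with randomness" idea can be illustrated without the neural network itself. The sketch below uses principal components (via an SVD) as a deliberately simpler stand-in for the proposed network, on a synthetic toy dataset; the dataset shapes and sizes are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a measured HRIR dataset: 100 responses, 32 taps each,
# built as random mixtures of two underlying decaying shapes plus noise.
taps = np.arange(32)
basis = np.stack([np.exp(-taps / 4.0),
                  np.sin(taps / 3.0) * np.exp(-taps / 8.0)])
weights = rng.standard_normal((100, 2))
hrirs = weights @ basis + 0.01 * rng.standard_normal((100, 32))

# "Find patterns": the dominant principal components of the dataset.
mean = hrirs.mean(axis=0)
_, s, vt = np.linalg.svd(hrirs - mean, full_matrices=False)
components = vt[:2]                     # dominant shared shapes
scale = s[:2] / np.sqrt(len(hrirs))     # typical coefficient spread

# "Generate more, with randomness": sample new coefficients, rebuild HRIRs.
new_coeffs = rng.standard_normal((5, 2)) * scale
new_hrirs = mean + new_coeffs @ components
```

A generative neural network (e.g., an autoencoder whose latent space is sampled) would play the same role as the component/coefficient step here, but could capture nonlinear structure that this linear sketch cannot.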
Artificial neural networks that recognize handwriting have been implemented and work well; in the context of sound, they are also used in voice-recognition applications. My goal is to learn from these applications and build a 'listening machine' network that generates HRIRs for each user to simulate sources at various locations around them. If this works out, sensors could be added to make the surround sound interactive. If successful, this project might become an interesting recreational tool!