3-D Audio Primer
by Aureal Corporation

This document presents an introduction to the general concepts and performance of three-dimensional audio technology. Several audio technology categories are defined with the purpose of creating a common understanding of "better-than-stereo" audio playback methods.

1. INTRODUCTION TO 3-D AUDIO

Since the late 1970's, several audio technologies have been developed to advance the state of the art in audio reproduction beyond stereo. Most of them are focused on increasing the dimensionality of sound playback beyond the one-dimensional stereo sound field created by conventional playback on a left/right speaker pair. Furthermore, the advent of digital audio signal processing has enabled interactive audio experiences: similar to live music, sounds are created on-the-fly based on user input (for example in video games), rather than being based on playback of a pre-recorded soundtrack (as in movies).

A3D from Aureal is a digital audio technology that has been developed to provide maximum performance in both areas of dimensionality and interactivity. A3D technology is based on the principles of binaural human hearing. Binaural means that we hear using two ears. From the two signals that our ears perceive, we can extract enough information to tell where a sound is located in the three dimensional space around us. The functioning of the human hearing system has been researched successfully over the last two decades by psycho-acoustic researchers around the world. They have provided us with the necessary findings and understanding that today’s A3D audio systems are based on.

To put it in simpler terms: since we can hear three-dimensionally in the real world using just two ears, it must be possible to achieve the same effect from just two speakers or a set of headphones. On this basic assumption, 3D audio products have been successfully built.

This document starts by explaining how different forms of audio processing compare against each other ("What is and What isn’t 3D Audio"). It then focuses on the concepts of acoustics and human hearing that A3D is based on, and details the digital audio building blocks that make up an A3D system.

2. WHAT IS AND WHAT ISN'T 3-D AUDIO

As mentioned in the introduction, there are two key pieces to a 3D audio system: 3D positioning and interactivity.

A full-featured 3D audio system provides the ability:

  • To define a three-dimensional space
  • To position multiple sound sources and a listener in that 3D space
  • To do all processing in real time, or interactively, for example based on the user's inputs in a video game (the opposite of interactive audio playback is a pre-recorded soundtrack).
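
The three abilities above can be sketched in a few lines. This is only an illustration, not any product's API: the `Vec3` type, the `relative_position` function, and the axis convention (y is up, heading 0 faces +z, positive azimuth is to the right) are all assumptions made for this example. An interactive application would repeat this calculation whenever the listener or a source moves.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec3:
    """A point in the 3D space shared by the listener and the sound sources."""
    x: float
    y: float
    z: float


def relative_position(listener: Vec3, heading_deg: float, source: Vec3):
    """Return (distance, azimuth_deg, elevation_deg) of a source as seen by the listener.

    Assumed convention: y is up, heading 0 faces +z, positive azimuth is to the right (+x).
    """
    dx = source.x - listener.x
    dy = source.y - listener.y
    dz = source.z - listener.z
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Angle in the horizontal plane, measured from +z toward +x.
    world_azimuth = math.degrees(math.atan2(dx, dz))
    # Subtract the listener's heading and wrap the result into [-180, 180).
    azimuth = (world_azimuth - heading_deg + 180.0) % 360.0 - 180.0
    elevation = math.degrees(math.asin(dy / distance)) if distance > 0 else 0.0
    return distance, azimuth, elevation


# A source one meter to the listener's right; the listener then turns toward it.
listener = Vec3(0.0, 0.0, 0.0)
source = Vec3(1.0, 0.0, 0.0)
for heading in (0.0, 45.0, 90.0):
    print(heading, relative_position(listener, heading, source))
```

As the listener's heading moves from 0 to 90 degrees, the source's azimuth moves from 90 degrees (hard right) to 0 (straight ahead), which is the kind of per-frame update an interactive renderer has to track.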

Certain technologies, namely stereo extension and surround sound, offer some aspects of 3D positioning or interactivity. They are discussed here to explain what applications they are geared towards, and why they are not considered to be part of a new category of technologies, called Positional 3D Audio. This new category combines full 3D positioning and interactivity to offer a new kind of audio listening experience. A3D is the industry leading positional 3D audio technology. A comparison chart of different audio playback methods is included to help differentiate the features of each technology.

    2.1 Extended Stereo

    Extended stereo technologies and products process an existing stereo (two channel) soundtrack to add spaciousness and to make it appear to originate from outside the left/right speaker locations.

    These products are particularly useful for restoring stereo performance on low-end PC multimedia sound systems, which typically have low-quality speakers placed very close together. Extended stereo effects can be achieved via various, fairly straightforward methods. Additionally, their performance is often evaluated based on subjective criteria such as listening tests. For those reasons it is somewhat difficult to compare products in this area. Some of the differentiators include:

    • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called sweet spot)
    • Amount of spreading of stereo images (more spreading, or user variable spreading, is better)
    • Amount of coloring (tonal changes) of audio content introduced by processing (no coloring is best)
    • Amount of stereo left/right panning information that is lost during processing (no panning loss is best)
    • Ability to achieve effect on headphones as well as speakers

    Although sometimes marketed under the name "3D Sound" or "3D stereo", extended stereo technologies are not considered to be 3D audio technologies, because they only offer passive spreading of an existing soundtrack, and not interactive 3D positioning of individual sounds.
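
    One straightforward way to spread an existing stereo soundtrack is mid/side processing: split the signal into what the two channels share (mid) and where they differ (side), then scale the side part. The sketch below is a generic illustration of that idea, not the method of any particular product; the function name `widen_stereo` and the `width` parameter are assumptions for this example.

```python
def widen_stereo(left, right, width=1.5):
    """Scale the side (left minus right) component of a stereo signal.

    width = 1.0 leaves the signal unchanged, width > 1.0 spreads it,
    and width = 0.0 collapses it to mono.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)           # content common to both channels
        side = 0.5 * (l - r) * width  # content that differs, scaled by the width factor
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# A sound panned hard left, processed with three different width settings.
for width in (0.0, 1.0, 2.0):
    print(width, widen_stereo([1.0], [0.0], width))
```

    Note that the processing only reshapes the two channels it is given. It has no notion of individual sound sources or of a listener position, which is why this category is passive rather than positional.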

    2.2 Surround Sound

    Surround sound technologies and products create a larger-than-stereo sound stage by playing back multi-channel Dolby® or MPEG surround soundtracks on multi-speaker setups. Surround sound is based on using audio compression technology (for example Dolby ProLogic® or Digital AC-3®) to encode and deliver a multi-channel soundtrack, and audio decompression technology to decode the soundtrack for delivery on a 5-speaker surround sound setup. Additionally, virtual surround sound systems use 3D audio technology to create the illusion of five speakers from a regular pair of stereo speakers, thereby enabling a surround sound listening experience without the need for a five-speaker setup. Aureal's A3D Surround is a virtual surround technology.

    Because they are pre-recorded, surround sound soundtracks are most suitable for movies. They are non-interactive, and therefore not particularly useful in interactive software such as video games and Web Sites. Because of their limitations when it comes to interactivity, surround sound systems are not considered for the interactive 3D audio category.

    Ways to evaluate the performance of a surround sound system:

      Physical Speakers
    • Presentation accuracy of individual channels, clarity of spatial imaging (size of sound stage)

      Virtual Speakers
    • Listening comparison to a physical 5-speaker setup (accuracy of virtual to physical speaker mapping, as well as accuracy of reproduction of original soundtrack mix-down)
    • Amount of audio coloring (tonal changes) introduced by processing (no coloring is best)

      Both Physical and Virtual Setups
    • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called sweet spot)

    2.3 Positional 3D Audio (A3D Interactive)

    Positional 3D audio (a.k.a. interactive 3D audio) allows for interactive, on-the-fly positioning of sounds anywhere in the three-dimensional space surrounding a listener. Support for such technologies can be incorporated into software titles such as video games to create a natural, immersive, and interactive audio environment that closely approximates a real-life listening experience. This category can be described as the audio equivalent of 3D graphics. Aureal’s A3D Interactive is a positional 3D audio technology.

    3D audio technologies create a more life-like listening experience by replicating the 3D audio cues that the ears hear in the real world. The following two sections, "The Basics of Acoustics" and "The Basics of Human Hearing", explain what those listening cues are and how they can be reproduced. For maximum flexibility and usability, a 3D audio algorithm should support all possible audio playback environments: headphones, stereo speakers and multi-speaker (surround or quad) arrays. In the case of stereo speakers or headphones, more demands are placed on the algorithm and fewer on the end-user, because stereo setups are the most common and are easy to set up. Multi-speaker arrays require less complex 3D audio rendering algorithms, but put more demands on the end-user’s playback setup (cost and setup complexity of extra amplifiers and speakers). In both cases, the desired 3D effects are controlled by software applications which position 3D sound sources and listeners via an API (Application Programming Interface) such as Microsoft’s DirectSound3D API for the Windows® platform, or the VRML 2.0 standard.

    Ways to evaluate the performance of a 3D interactive sound system:

    • Listening tests to evaluate how well sounds are projected in all three dimensions (left/right, up/down, front/back), and how much realism they provide
    • Number and quality of software titles that take advantage of 3D technology
    • Number of concurrent 3D sound sources the system provides at a given quality or sample rate
    • Ability to achieve effect on headphones as well as speakers
    • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called sweet spot)
    • Amount of coloring (tonal changes) of audio content introduced by processing (no coloring is best)

    Table: A Comparison of Audio Playback Methods

    2.4 Headphone Versus Stereo Speaker Playback Devices

    In terms of 3D sound processing, these two playback media offer different challenges and advantages. Headphones have the advantage of always being in a known position with respect to the listener’s ears. This means that two separate audio signals (left and right) are guaranteed to go directly into the two ears of a listener. With speakers, this is only the case if the listener is sitting in the ideal listening position, the sweet spot, and processing methods are employed to ensure that the left ear does not receive any audio content from the right speaker, and vice versa (cross-talk cancellation).
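
    The idea behind cross-talk cancellation can be shown with a deliberately oversimplified model. Assume each ear hears its own speaker with gain `direct` and the opposite speaker with gain `cross`, with no delay and no frequency dependence (a real system has both, so this is a toy, not a usable canceller). The speaker feeds are then found by inverting that 2x2 mixing. The function names and the gain values are assumptions for this example.

```python
def crosstalk_cancel(desired_left, desired_right, direct=1.0, cross=0.5):
    """Return speaker feeds that produce the desired signals at the two ears.

    Toy model: ear_left  = direct * spk_left + cross  * spk_right
               ear_right = cross  * spk_left + direct * spk_right
    """
    det = direct * direct - cross * cross
    if det == 0:
        raise ValueError("direct and cross gains are equal; the mixing cannot be inverted")
    spk_left = (direct * desired_left - cross * desired_right) / det
    spk_right = (direct * desired_right - cross * desired_left) / det
    return spk_left, spk_right


def at_ears(spk_left, spk_right, direct=1.0, cross=0.5):
    """Apply the same toy mixing model to see what actually arrives at each ear."""
    ear_left = direct * spk_left + cross * spk_right
    ear_right = cross * spk_left + direct * spk_right
    return ear_left, ear_right


# Ask for a signal at the left ear only, then check what the ears receive.
feeds = crosstalk_cancel(1.0, 0.0)
print(feeds, at_ears(*feeds))
```

    The right speaker ends up playing an inverted, scaled copy of the left signal so that the leakage cancels at the right ear. The cancellation only holds while the assumed gains match reality, which is one way to see why the effect depends on the listener staying in the sweet spot.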

3. THE BASICS OF ACOUSTICS

Human beings extract a lot of information about their environment using their ears. In order to understand what information can be retrieved from sound, and how exactly it is done, we need to look at how sounds are perceived in the real world. To do so, it is useful to break the acoustics of a real world environment into three components: the sound source, the acoustic environment, and the listener:

Figure 1 - Typical soundfield with a source, environment and listener.

  • The sound source: this is an object in the world that emits sound waves. Examples are anything that makes sound - cars, humans, birds, closing doors, and so on. Sound waves get created through a variety of mechanical processes. Once created, the waves usually get radiated in a certain direction. For example, a mouth radiates more sound energy in the direction that the face is pointing than to the side of the face.
  • The acoustic environment: once a sound wave has been emitted, it travels through an environment where several things can happen to it. It gets absorbed by the air, high frequencies more so than low ones, with the amount of absorption depending on factors like wind and air humidity. It can travel directly to a listener (direct path), bounce off an object once before it reaches the listener (first order reflected path), bounce twice (second order reflected path), and so on. Each time a sound reflects off an object, the material that the object is made of determines how much of each frequency component of the sound wave gets absorbed and how much gets reflected back into the environment. Sounds can also pass through objects such as water or walls. Finally, environment geometry such as corners, edges, and small openings has complex effects on the physics of sound waves (diffraction, scattering).
  • The listener: this is a sound receiving object, typically a "pair of ears". The listener uses acoustic cues to interpret the sound waves that arrive at the ears, and to extract information about the sound sources and the environment.
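
The direct path and a first order reflected path can be compared with a small calculation. The sketch below assumes a single flat, perfectly reflecting floor and uses the mirror-image trick (reflect the source through the floor and measure a straight line to the listener). The function name and the 2D layout are assumptions for this example; the speed of sound is the round 330 m/s figure used elsewhere in this document.

```python
import math

SPEED_OF_SOUND = 330.0  # meters per second, the round figure used in this document


def path_delays(source, listener):
    """Return (direct_delay_s, reflected_delay_s) for a floor reflection.

    source and listener are (horizontal_position, height) pairs in meters,
    with the floor at height 0.
    """
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(sx - lx, sy - ly)
    # Mirror the source below the floor; the straight line from that image
    # to the listener has the same length as the bounced path.
    reflected = math.hypot(sx - lx, -sy - ly)
    return direct / SPEED_OF_SOUND, reflected / SPEED_OF_SOUND


# Source and listener both 1.5 m above the floor and 4 m apart.
direct_s, reflected_s = path_delays((0.0, 1.5), (4.0, 1.5))
print(direct_s, reflected_s, reflected_s - direct_s)
```

In this layout the direct path is 4 m and the reflected path is 5 m, so the reflection arrives about 3 milliseconds after the direct sound. A fuller model would also apply the frequency-dependent absorption of the floor material described above.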

4. THE BASICS OF HUMAN HEARING

As explained above, people can be considered sound receiving objects in an environment. We have an auditory sensing system consisting of two ears and a brain. Additionally, very low frequency sounds can be sensed through the human body. The brain uses a number of cues that are embedded in the two sound signals it receives from the two ears to learn about the sounds and their environment. Most people are unaware that the effects described in the following sections greatly impact our continuous perception of reality, every day of our lives. On the other hand, there are certain people, for example non-sighted people, who are very much aware of these effects, because they rely heavily on their ears for querying and navigating their surroundings.

    4.1 Primary Localization Cues - IID and ITD

    The two primary localization cues are called interaural intensity difference (IID) and interaural time difference (ITD). IID refers to the fact that a sound is louder at the ear that it is closer to, because the sound’s intensity at that ear will be higher than the intensity at the other ear, which is not only further away, but usually receives a signal that has been shadowed by the listener’s head (see fig. 2). ITD means that a sound will arrive earlier at one ear than the other (unless it is located at exactly the same distance from each ear - for example directly in front). If it arrives at the left ear first, the brain knows that the sound is somewhere to the left (see fig. 3).

    Figure 2 - Illustration of IID.

    Figure 3 - Illustration of ITD.
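
    A rough size for the ITD can be estimated with a simple path-difference model: treat the ears as two points 15 centimeters apart and the sound as arriving from far away, so the extra distance to the far ear is the ear spacing times the sine of the azimuth. This ignores the way sound bends around the head, so it is an approximation only. The function name and sign convention are assumptions for this example; the 15 cm spacing and 330 m/s speed of sound are the figures this document uses.

```python
import math

EAR_SPACING = 0.15      # meters, the approximate figure used in this document
SPEED_OF_SOUND = 330.0  # meters per second


def itd_seconds(azimuth_deg):
    """Approximate interaural time difference for a distant source.

    Assumed convention: azimuth 0 is straight ahead, positive is to the right.
    A positive result means the sound reaches the right ear first.
    """
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


for azimuth in (-90, -30, 0, 30, 90):
    print(azimuth, itd_seconds(azimuth))
```

    Under these assumptions the largest possible ITD, for a source directly to one side, is 0.15 / 330 seconds, or roughly 0.45 milliseconds.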

    The combination of these two cues allows the brain to narrow the position of an individual sound source to somewhere on a cone centered on the line drawn between the listener's ears (see fig. 4).

    Figure 4 - ITD Cone.
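
    The cone can be illustrated numerically. In a simple path-difference model (ears as two points 15 cm apart, distant source, no head shadowing), the ITD depends only on the component of the source direction that lies along the line between the ears. Every direction at the same angle from that line therefore yields the same ITD. The function names below are assumptions for this example.

```python
import math

EAR_SPACING = 0.15      # meters, the approximate figure used in this document
SPEED_OF_SOUND = 330.0  # meters per second


def cone_point(cone_angle_deg, roll_deg):
    """Unit direction at cone_angle from the ear-to-ear (x) axis.

    roll rotates the direction around that axis: through front, above,
    behind, and below the listener.
    """
    a = math.radians(cone_angle_deg)
    r = math.radians(roll_deg)
    return (math.cos(a), math.sin(a) * math.sin(r), math.sin(a) * math.cos(r))


def itd_from_direction(direction):
    """ITD in the simple model: only the ear-to-ear component matters."""
    x, _, _ = direction
    return EAR_SPACING * x / SPEED_OF_SOUND


# Eight directions spread around one cone, all 60 degrees off the ear axis.
itds = [itd_from_direction(cone_point(60.0, roll)) for roll in range(0, 360, 45)]
print(itds)
```

    All eight directions give the same value, so timing alone cannot tell them apart. In this simplified picture that leftover ambiguity is what the pinna cues described in the next section help to resolve.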

    4.2 The Outer Ear Structure - Pinna

    Before a sound wave gets to the ear drum, it passes through the outer ear structure, called the pinna. The pinna accentuates or suppresses mid- and high-frequency energy (see fig. 5) of a sound wave to various degrees, depending on the angle at which the sound wave hits the pinna (see fig. 6). This means that the two pinnae act as variable filters that affect every sound that passes through them. The brain knows how to figure out the exact location of a sound in space by receiving a signal that has been filtered in a way that is unique to the sound source’s position relative to the listener.

    Figure 5 - Spectrum differences between original and pinna.

    Figure 6 - Pinnae frequency modulation sound source and pinna reception at varying elevations.

    The pinnae are the key to accurately localizing sounds in space. However, since the outer ear and its folds are on the scale of a few centimeters, only sound waves with wavelengths in the centimeter range or smaller can be affected by the pinna. In addition, the two ears are about 15 centimeters apart, so even IID and ITD cues are greatly reduced for wavelengths bigger than that. For example, a 3.3 kHz sound signal oscillates 3300 times per second, while sound travels at about 330 meters per second. The wavelength is therefore about 330/3300 = 0.1 meters, or 10 centimeters. This means that a sound at 3300 Hz lies in the area where primary cues are still noticeable, but pinna cues start to be diminished. In general, the higher the frequency of a sound, the shorter its wavelength, and the better it can be localized. This phenomenon can be verified by placing two speakers, a sub-woofer and a high-frequency tweeter, in a room and playing music through them. With closed eyes you will be able to immediately tell where the tweeter is located, while the sub-woofer will sound like it is "coming from everywhere".
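
    The wavelength arithmetic above generalizes to any frequency. The short sketch below repeats it for a few values and compares each wavelength with the 15 centimeter ear spacing; the function name and the choice of frequencies are assumptions for this example.

```python
SPEED_OF_SOUND = 330.0  # meters per second, as used in the text
EAR_SPACING_CM = 15.0   # approximate distance between the ears


def wavelength_cm(frequency_hz):
    """Wavelength in centimeters: speed of sound divided by frequency."""
    return SPEED_OF_SOUND / frequency_hz * 100.0


for frequency in (100, 1000, 3300, 10000):
    wavelength = wavelength_cm(frequency)
    longer_than_head = wavelength > EAR_SPACING_CM
    print(frequency, round(wavelength, 1), longer_than_head)
```

    A 100 Hz tone has a wavelength of about 330 centimeters, far larger than the head, which fits the sub-woofer observation. A 10 kHz tone comes out at about 3.3 centimeters, on the scale of the folds of the outer ear.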

    4.3 Propagation Effects, Range Cues, and Reflections

    Many things happen to a sound as it travels through an environment before it is received by a listener. All of these effects allow us to learn more about what we are hearing and what kind of environment we are in:

    • A somewhat muffled, quiet sound is likely off in the distance (see fig. 7).
    • If it is heavily muffled, we might be in an enclosed space, listening through glass, or other wall materials.
    • The effect of sound reflections in an environment is very important, because we are able to hear the difference in time of arrival and location between the direct path signal, first-order, and n-th order reflections (see fig. 8). The reflections give us a way to further pin-point a sound source’s location, as well as the size, shape and type of room or environment that we are in (people with very "good ears" are able to exactly locate a wall, or tell the difference between an open or closed door, simply by listening to reflections). While humans are capable of individually perceiving first order reflections, second and higher order reflections usually combine to form what are called late field reflections, or reverb.

    Figure 7 - Source attenuation and absorption due to range (listener-source distance).

    Figure 8 - Direct path, first and second order reflections in a typical room.
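
    The "quiet sound is likely far away" cue can be approximated for a point source in open space, where sound pressure falls off in proportion to distance. The sketch below computes that gain, the same gain in decibels, and the travel time. The function name and the 1 meter reference distance are assumptions for this example, and the frequency-dependent muffling caused by air absorption is not modeled.

```python
import math

SPEED_OF_SOUND = 330.0  # meters per second, the round figure used in this document


def range_cues(distance_m, reference_m=1.0):
    """Return (gain, gain_db, delay_s) relative to a source at reference_m.

    Assumes a point source in free field, so amplitude falls as 1 / distance.
    """
    gain = reference_m / distance_m
    gain_db = 20.0 * math.log10(gain)
    delay_s = distance_m / SPEED_OF_SOUND
    return gain, gain_db, delay_s


for distance in (1.0, 2.0, 4.0, 8.0):
    print(distance, range_cues(distance))
```

    Each doubling of distance halves the amplitude, a drop of about 6 dB. Indoors, reflections add energy back, so the measured fall-off is usually gentler than this free-field estimate.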

5. HOW A3D WORKS

A 3D audio system aims to digitally reproduce a realistic sound field. To achieve the desired effect a system needs to be able to re-create portions or all of the listening cues discussed in the previous chapter: IID, ITD, outer ear effects, and so on. A typical first step to building such a system is to capture the listening cues by analyzing what happens to a single sound as it arrives at a listener from different angles. Once captured, the cues are synthesized in a computer simulation for verification.
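
The synthesis step can be pictured as filtering one mono sound through a pair of captured responses, one per ear. The sketch below is a minimal illustration under stated assumptions, not Aureal's implementation: the two impulse responses are made-up numbers that stand in for measured data, and `convolve` and `render_binaural` are hypothetical names. The right-ear response is delayed and attenuated to mimic ITD and IID for a source on the listener's left.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, ir_left, ir_right):
    """Filter one mono signal into a left-ear and a right-ear signal."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Made-up responses (NOT measured data) for a source on the listener's left:
# the left ear hears it immediately at full level, the right ear hears it
# two samples later and quieter.
IR_LEFT = [1.0, 0.0, 0.0]
IR_RIGHT = [0.0, 0.0, 0.6]

left_ear, right_ear = render_binaural([1.0, 0.5], IR_LEFT, IR_RIGHT)
print(left_ear)
print(right_ear)
```

A practical system would hold one such pair of responses for each direction and switch or interpolate between them as the source or listener moves. Real responses are also much longer, since they carry the pinna filtering as well as the delay and level differences.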

6. ADVANTAGES OF A3D AS ILLUSTRATED BY RESEARCH FINDINGS

Results from decades of psycho-acoustic research on binaural audio offer scientific explanations of why real-time binaural audio technologies such as A3D are highly effective in a range of applications.

7. SUMMARY

For well over ten years, real-time binaural, or "3D", audio technology has been the subject of intense research and development in the psycho-acoustic research community. The findings of a large number of research studies indicate that interactive 3D audio is an important technology that enables an entirely new level of audio experience: a three-dimensional sound field is created in real-time to continuously envelop a listener. The listener is no longer aware of the audio system that is rendering the sounds - the application communicates directly with the user, creating levels of awareness, realism and immersion, as well as improvements in reaction time and in the communication of audio information, that were previously only possible in real-life situations.

Besides understanding real world sounds and the hearing process, the biggest challenges associated with building an effective positional 3D audio solution are:

A3D has mastered all of the above challenges. A3D is based on the world’s most advanced algorithms and HRTF measurement and compression techniques, which were developed in high-performance, mission-critical application areas such as NASA simulators, jet fighter cockpits, and Virtual Reality systems. Aureal has created free software tools, SDKs, and APIs and evangelized them successfully to over 100 top tier PC software development houses. Aureal’s breakthrough Vortex PCI audio chips render A3D on dozens of new sound cards and PCs. Finally, Aureal has created the A3D brand that is actively promoted with a simple message: if you take a software application with the A3D logo on it, and a sound card or PC with the same logo on it, they will combine to deliver the most amazing, immersive and realistic interactive audio experience.



c. 1998, Aureal Corporation.
From Aureal Corporation Site. (Republished with permission.)