Digital Sound & Music: Concepts, Applications, & Science, Chapter 5, last updated 6/25/2013

understand this better if you imagine that the audio signal is a piece of music that you decompose

into 32 frequency bands. After the decomposition, you could play each band separately and hear

the musical piece, but only those frequencies in the band. The segments would need to be longer

than 1152 samples for you to hear any music, however, since 1152 samples at a sampling rate of

44.1 kHz is only 0.026 seconds of sound.)
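The thought experiment above can be sketched in code. The function below is purely illustrative and is not MP3's actual polyphase filterbank; it crudely splits a signal into 32 equal-width frequency bands with an FFT mask so that each band can be resynthesized and "played" on its own:

```python
import numpy as np

def band_signal(x, band, n_bands=32, fs=44100):
    """Keep only the frequencies in one of n_bands equal-width bands
    and resynthesize that band as a time-domain signal (illustrative
    only; MP3 uses a polyphase filterbank, not an FFT mask)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    width = (fs / 2) / n_bands          # ~689 Hz per band at 44.1 kHz
    lo, hi = band * width, (band + 1) * width
    if band == n_bands - 1:             # last band keeps the Nyquist bin
        mask = freqs >= lo
    else:
        mask = (freqs >= lo) & (freqs < hi)
    return np.fft.irfft(X * mask, n=len(x))

# A 1152-sample granule at 44.1 kHz lasts 1152 / 44100 ≈ 0.026 s
t = np.arange(1152) / 44100
x = np.sin(2 * np.pi * 1000 * t)        # a 1000 Hz tone falls in band 1
```

Because the 32 masks partition the spectrum, summing all 32 band signals reconstructs the original granule, which is the sense in which the decomposition loses nothing by itself.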

4. Use the MDCT to divide each of the 32 frequency bands into 18 subbands for a

total of 576 frequency subbands.

The MDCT, like the Fourier transform, can be used to change audio data from the time

domain to the frequency domain. Its distinction is that it is applied on overlapping windows in

order to minimize the appearance of spurious frequencies that occur because of discontinuities at

window boundaries. (“Spurious frequencies” are frequencies that aren’t really in the audio, but

that are yielded from the transform.) The overlap between successive MDCT windows varies

depending on the information that the psychoacoustical analyzer provides about the nature of the

audio in the frame and band. If there are transients involved, then the window size is shorter for

greater temporal resolution. Otherwise, a larger window is used for greater frequency resolution.

5. Sort the subbands into 22 groups, called scale factor bands, and based on the

SMR, determine a scaling factor for each scale factor band. Use nonuniform quantization

combined with scaling factors to quantize.

Values are raised to the ¾ power before quantization. This yields nonuniform

quantization, aimed at reducing quantization noise for lower amplitude signals, where it has a

more harmful impact.
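The effect of the ¾-power companding can be sketched as follows. This is a simplified model working on magnitudes only (sign is handled separately), and the real MP3 quantizer also folds in gain terms and a small rounding offset:

```python
def quantize(x, step):
    """Compand with the 3/4 power, then quantize uniformly."""
    return round((abs(x) ** 0.75) / step)

def dequantize(q, step):
    """Undo the quantization step, then the companding (4/3 power)."""
    return (q * step) ** (4 / 3)
```

Because the companding curve is steep near zero and flat for large values, the absolute quantization error grows with amplitude: low-amplitude values, where noise is more audible, are represented more precisely at the expense of large values, where the noise is more easily masked.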

The psychoacoustical analyzer provides information that is the basis for sorting the

subbands into scale factor bands. The scale factor bands cover several MDCT coefficients and

more closely match the critical bands of the human ear. This is one of the ways in which MP3 is

an improvement over MPEG-1 Layers 1 and 2.

All bands are quantized by dividing by the same value. However, the values in the scale
factor bands can be scaled up or down based on their SMR. Bands that have a lower SMR are
scaled by smaller factors before quantization because the quantization error for these bands has
less impact, falling below the masking threshold.

Consider this example. Say that an uncompressed band value is 20,000 and values from all
bands are quantized by dividing by 128 and rounding down. Thus the quantized value would be
⌊20000/128⌋ = 156. When the value is restored by multiplying by 128, it is 19,968, for an error of
20000 − 19968 = 32, or 0.16%. Now suppose the psychoacoustical analyzer reveals that this band requires less
precision because of a strong masking tone. Thus, it determines that the band should be scaled
by a factor of 0.1. Now we have ⌊(20000 × 0.1)/128⌋ = ⌊15.625⌋ = 15. Restoring the original value we get
(15 × 128)/0.1 = 19,200, for an error of 20000 − 19200 = 800, or 4%.
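The arithmetic of this example can be checked directly. This is a minimal sketch in which the `scale` parameter plays the role of the scale factor:

```python
import math

def quantize(value, step, scale=1.0):
    # Scale, then quantize by dividing by the step and rounding down
    return math.floor(value * scale / step)

def restore(q, step, scale=1.0):
    # Undo both the quantization step and the scale factor
    return q * step / scale

q1 = quantize(20000, 128)        # floor(20000 / 128) = 156
r1 = restore(q1, 128)            # 156 * 128 = 19968, error 32 (0.16%)

q2 = quantize(20000, 128, 0.1)   # floor(2000 / 128) = 15
r2 = restore(q2, 128, 0.1)       # 15 * 128 / 0.1 ≈ 19200, error 800 (4%)
```

The scaled band survives quantization with a larger error, which is acceptable only because the masking tone hides it; in exchange, the quantized value 15 needs fewer bits to store than 156.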

An appropriate psychoacoustical analysis provides scaling factors that increase the

quantization error where it doesn’t matter, in the presence of masking tones. Scale factor bands

effectively allow less precision (i.e., fewer bits) to store values if the resulting quantization error

falls below the audible level. This is one way to reduce the amount of data in the compressed

signal.

6. Encode side information.

Side information is the information needed to decode the rest of the data, including where

the main data begins, whether granule pairs can share scale factors, where scale factors and

Huffman encodings begin, the Huffman table to use, the quantization step, and so forth.
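The fields just listed might be collected as in the sketch below. The field names follow conventions commonly seen in MP3 decoders, but this struct layout is a hypothetical illustration; the real side information is a packed bitfield whose exact widths and order are defined by the MPEG-1 standard:

```python
from dataclasses import dataclass, field

@dataclass
class GranuleSideInfo:
    """Hypothetical sketch of per-granule side information."""
    part2_3_length: int = 0   # bits used by the scale factors + Huffman data
    global_gain: int = 0      # determines the quantization step size
    table_select: list = field(default_factory=list)  # Huffman tables to use

@dataclass
class SideInfo:
    """Hypothetical sketch of per-frame side information."""
    main_data_begin: int = 0  # offset pointing back to where main data begins
    scfsi: list = field(default_factory=list)  # can granule pairs share scale factors?
    granules: list = field(default_factory=list)
```

The decoder must read these fields before anything else in the frame, since without them it cannot locate the scale factors and Huffman data, choose the right Huffman table, or undo the quantization.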