Digital Sound & Music: Concepts, Applications, & Science, Chapter 5, last updated 6/25/2013
understand this better if you imagine that the audio signal is a piece of music that you decompose
into 32 frequency bands. After the decomposition, you could play each band separately and hear
the musical piece, but only those frequencies in the band. The segments would need to be longer
than 1152 samples for you to hear any music, however, since 1152 samples at a sampling rate of
44.1 kHz is only 0.026 seconds of sound.)
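The duration quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are chosen here for illustration.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality sampling rate used in the text
FRAME_SAMPLES = 1152      # samples per MP3 frame


def frame_duration_seconds(num_samples=FRAME_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return how long num_samples last at the given sampling rate."""
    return num_samples / sample_rate_hz


print(round(frame_duration_seconds(), 3))  # 0.026
```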
4. Use the MDCT to divide each of the 32 frequency bands into 18 subbands for a
total of 576 frequency subbands.
The MDCT, like the Fourier transform, can be used to change audio data from the time
domain to the frequency domain. What distinguishes it is that it is applied to overlapping windows in
order to minimize the appearance of spurious frequencies that occur because of discontinuities at
window boundaries. (“Spurious frequencies” are frequencies that aren’t really in the audio but
are artifacts produced by the transform.) The overlap between successive MDCT windows varies
depending on the information that the psychoacoustical analyzer provides about the nature of the
audio in the frame and band. If there are transients involved, then the window size is shorter for
greater temporal resolution. Otherwise, a larger window is used for greater frequency resolution.
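To make the step concrete, here is a minimal sketch of a direct (unoptimized) MDCT in Python. It assumes a sine window and the textbook MDCT formula, in which a window of 2N samples yields N coefficients. The window lengths of 36 and 12 samples are the long and short block sizes usually cited for MP3; the function names are chosen here for illustration, and a real encoder would use a fast algorithm along with the window-switching logic described above.

```python
import math


def sine_window(length):
    """Sine window that tapers each block smoothly to zero at both ends."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct MDCT: a windowed block of 2N samples gives N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    window = sine_window(two_n)
    coeffs = []
    for k in range(n_half):
        total = 0.0
        for n in range(two_n):
            angle = math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            total += window[n] * block[n] * math.cos(angle)
        coeffs.append(total)
    return coeffs


# Hypothetical subband signal: 36 samples taken from one of the 32 bands.
long_block = [math.cos(2 * math.pi * 3 * n / 36) for n in range(36)]
print(len(mdct(long_block)))       # 18 coefficients from one long window
print(len(mdct(long_block[:12])))  # 6 coefficients from one short window
print(32 * len(mdct(long_block)))  # 576 frequency subbands in total
```

Successive windows are advanced by half their length, so each sample is covered by two overlapping windows. That overlap is what allows the decoder to cancel the boundary artifacts when it adds the inverse-transformed blocks back together.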
5. Sort the subbands into 22 groups, called scale factor bands, and based on the
SMR, determine a scaling factor for each scale factor band. Use nonuniform quantization
combined with scaling factors to quantize.
Values are raised to the ¾ power before quantization. This yields nonuniform
quantization, aimed at reducing quantization noise for lower amplitude signals, where it has a
more harmful impact.
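The effect of the ¾ power can be sketched as follows. This is a simplified illustration, not the exact rule of any particular encoder: it rounds to the nearest integer, and the function names and the `step` parameter are assumptions made for the example.

```python
import math


def quantize_power_law(value, step):
    """Quantize by raising the scaled magnitude to the 3/4 power."""
    magnitude = (abs(value) / step) ** 0.75
    return int(math.copysign(round(magnitude), value))


def dequantize_power_law(q, step):
    """Invert the power law: raise the magnitude to the 4/3 power."""
    return math.copysign(abs(q) ** (4 / 3) * step, q)


def reconstruction_gap(q, step=1.0):
    """Distance between the reconstructed values of levels q and q + 1."""
    return dequantize_power_law(q + 1, step) - dequantize_power_law(q, step)


# The gap between adjacent levels grows with amplitude, so small values
# are represented more finely than large ones.
print(reconstruction_gap(1) < reconstruction_gap(10))  # True
```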
The psychoacoustical analyzer provides information that is the basis for sorting the
subbands into scale factor bands. The scale factor bands cover several MDCT coefficients and
more closely match the critical bands of the human ear. This is one of the ways in which MP3 is
an improvement over MPEG-1 Layers 1 and 2.
All bands are quantized by dividing by the same value. However, the values in the scale
factor bands can be scaled up or down based on their SMR. Bands that have a lower SMR can be
scaled down before quantization, which amounts to a coarser effective quantization step, because
the quantization error for these bands has less impact, falling below the masking threshold.
Consider this example. Say that an uncompressed band value is 20,000 and values from all
bands are quantized by dividing by 128 and rounding down. Thus the quantized value would be
156. When the value is restored by multiplying by 128, it is 19,968, for an error of
$20{,}000 - 19{,}968 = 32$. Now suppose the psychoacoustical analyzer reveals that this band requires less
precision because of a strong masking tone. Thus, it determines that the band should be scaled
by a factor of 0.1. Now we have $\lfloor (20{,}000 \times 0.1)/128 \rfloor = 15$. Restoring the original value we get
$15 \times 128 / 0.1 = 19{,}200$, for an error of $20{,}000 - 19{,}200 = 800$.
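The arithmetic in this example can be reproduced with a short Python sketch. The function names are hypothetical, and exact fractions are used only so that the scale factor 0.1 introduces no floating-point rounding of its own.

```python
import math
from fractions import Fraction


def quantize_with_scale(value, scale, divisor=128):
    """Scale the value, divide by the common divisor, and round down."""
    return math.floor(Fraction(value) * scale / divisor)


def restore(quantized, scale, divisor=128):
    """Undo the division and the scaling (the rounding loss remains)."""
    return Fraction(quantized) * divisor / scale


def roundtrip_error(value, scale, divisor=128):
    """Difference between the original value and its restored version."""
    return value - restore(quantize_with_scale(value, scale, divisor), scale, divisor)


print(roundtrip_error(20_000, Fraction(1)))      # 32
print(roundtrip_error(20_000, Fraction(1, 10)))  # 800
```

Scaling by 0.1 shrinks the quantized value from 156 to 15, which takes fewer bits to store, at the cost of a larger error that the masking tone is expected to hide.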
An appropriate psychoacoustical analysis provides scaling factors that increase the
quantization error where it doesn’t matter, in the presence of masking tones. Scale factor bands
effectively allow less precision (i.e., fewer bits) to store values if the resulting quantization error
falls below the audible level. This is one way to reduce the amount of data in the compressed
signal.
6. Encode side information.
Side information is the information needed to decode the rest of the data, including where
the main data begins, whether granule pairs can share scale factors, where scale factors and
Huffman encodings begin, the Huffman table to use, the quantization step, and so forth.
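As a rough illustration, the items listed above could be gathered into a structure like the following. This is a simplified sketch for a single channel, not the actual bit layout. The field names follow those commonly used for the MPEG-1 Layer III bitstream, but the mapping to the description above is an assumption, and many fields are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GranuleInfo:
    """Per-granule side information (simplified, one channel)."""
    part2_3_length: int = 0   # bits used by scale factors plus Huffman data
    global_gain: int = 0      # controls the quantization step size
    table_select: List[int] = field(default_factory=lambda: [0, 0, 0])  # Huffman tables


@dataclass
class SideInfo:
    """Side information for one frame (simplified, one channel)."""
    main_data_begin: int = 0  # offset telling the decoder where the main data begins
    # Flags saying whether the second granule reuses the first granule's scale factors.
    scfsi: List[bool] = field(default_factory=lambda: [False] * 4)
    granules: List[GranuleInfo] = field(
        default_factory=lambda: [GranuleInfo(), GranuleInfo()]
    )


info = SideInfo(main_data_begin=0)
print(len(info.granules))  # 2
```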