Digital Sound & Music: Concepts, Applications, & Science, Chapter 5, last updated 6/25/2013
frequency resolution because of the complexity of the sound in that part of the audio. This can
be implemented by allowing the user to specify the maximum bit rate. An alternative is to allow
the user to specify an average bit rate. Then the bit rate can vary frame-by-frame but is
controlled so that it averages to the chosen rate.
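As a rough illustration of average-bit-rate control, the sketch below picks a bit rate for each frame from a per-frame complexity score and carries any shortfall or excess forward so that the average tracks the chosen rate. The function name `allocate_abr`, the complexity scores, the rate limits, and the proportional allocation rule are all assumptions made for illustration; an actual encoder bases its decisions on psychoacoustic analysis and selects from the fixed set of bit rates the format allows.

```python
def allocate_abr(complexities, target_kbps, min_kbps=32, max_kbps=320):
    """Choose a per-frame bit rate whose average tracks target_kbps.

    complexities: hypothetical per-frame complexity scores (all > 0).
    """
    mean_c = sum(complexities) / len(complexities)
    rates = []
    carry = 0.0  # kb/s that could not be granted and is passed to later frames
    for c in complexities:
        # Scale the target by relative complexity, then add the carried amount.
        wanted = target_kbps * c / mean_c + carry
        rate = max(min_kbps, min(max_kbps, wanted))
        carry = wanted - rate
        rates.append(rate)
    return rates


if __name__ == "__main__":
    rates = allocate_abr([1, 1, 10, 1, 1], target_kbps=128)
    print([round(r, 1) for r in rates], "average:", round(sum(rates) / len(rates), 1))
```

In this example the third frame is clamped to the maximum rate, and the bits it could not receive are granted to the frame that follows, so the five-frame average still comes out at the target.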
A certain amount of variability is possible even within CBR. This is done by means of a
bit reservoir in frames. A bit reservoir is created from bits that do not have to be used in a
frame because the audio being encoded is relatively simple. The reservoir provides extra space
in a subsequent frame that may be more complicated and requires more bits for encoding.
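The bookkeeping behind a bit reservoir can be sketched as a small simulation. The function name `encode_with_reservoir`, the per-frame bit demands, and the reservoir limit are hypothetical values chosen for illustration; this is a simplified model of the idea, not the layout used in an actual MP3 bit stream.

```python
def encode_with_reservoir(bits_needed, frame_capacity, reservoir_limit):
    """Simulate a bit reservoir across fixed-size (CBR) frames.

    bits_needed: hypothetical bit demand of the audio in each frame.
    Returns (bits granted per frame, reservoir level after each frame).
    """
    reservoir = 0
    granted, levels = [], []
    for need in bits_needed:
        available = frame_capacity + reservoir
        grant = min(need, available)
        # Unused bits are saved for later frames, up to the reservoir limit.
        reservoir = min(reservoir_limit, available - grant)
        granted.append(grant)
        levels.append(reservoir)
    return granted, levels


if __name__ == "__main__":
    granted, levels = encode_with_reservoir([600, 700, 1400, 1000], 1000, 500)
    print(granted)  # [600, 700, 1400, 1000]
    print(levels)   # [400, 500, 100, 100]
```

Here the first two frames are simple and leave bits unused, which lets the third frame spend 1400 bits even though each frame nominally holds only 1000.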
Field I in Table 5.4 makes reference to the channel mode. MP3 allows for one mono
channel, two independent mono channels, two stereo channels, and joint stereo mode. Joint
stereo mode takes advantage of the fact that human hearing is less sensitive to the location of
sounds at the lowest and highest ends of the audible frequency range. In a stereo audio
signal, low or high frequencies can be combined into a single mono channel without much
perceptible difference to the listener. The joint stereo option can be specified by the user.
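The idea of sharing the extreme frequency bands between channels can be sketched as follows. The function name `joint_stereo_bands`, the per-band values, the cutoff indices, and the simple averaging rule are assumptions for illustration only; real encoders use more elaborate schemes to decide which bands to combine and how.

```python
def joint_stereo_bands(left, right, low_cut, high_cut):
    """Merge the lowest and highest bands of two channels into shared values.

    left, right: hypothetical per-band values, ordered low to high frequency.
    Bands with index < low_cut or >= high_cut are replaced by one averaged
    (mono) value; the bands in between keep separate left/right values.
    """
    shared, stereo = {}, {}
    for band, (l, r) in enumerate(zip(left, right)):
        if band < low_cut or band >= high_cut:
            shared[band] = (l + r) / 2
        else:
            stereo[band] = (l, r)
    return shared, stereo


if __name__ == "__main__":
    shared, stereo = joint_stereo_bands(
        [1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], low_cut=1, high_cut=3
    )
    print(shared)  # {0: 2.0, 3: 2.0}
    print(stereo)  # {1: (2.0, 2.0), 2: (3.0, 1.0)}
```

In this toy case the two channels together need six stored values instead of eight, which is where the saving in bits comes from.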
A sketch of the steps in MP3 compression is given in Algorithm 5.3 and the algorithm is
diagrammed in Figure 5.49. This algorithm glosses over details that can vary by implementation
but gives the basic concepts of the compression method.
algorithm MP3 {
/*Input: An audio signal in the time domain
  Output: The same audio signal, compressed
*/
  Process the audio signal in frames
  For each frame {
    Use the Fourier transform to transform the time domain data to the frequency domain, sending
    the results to the psychoacoustical analyzer {
      Based on masking tones and masked frequencies, determine the signal-to-masking noise
      ratios (SMR) in areas across the frequency spectrum
      Analyze the presence and interactions of transients
    }
    Divide the frame into 32 frequency bands
    For each frequency band {
      Use the modified discrete cosine transform (MDCT) to divide each of the 32 frequency bands
      into 18 subbands, for a total of 576 frequency subbands
    }
    Sort the subbands into 22 groups, called scale factor bands, and based on the SMR, determine
    a scaling factor for each scale factor band
    Use nonuniform quantization combined with scaling factors to quantize
    Encode side information
    Use Huffman encoding on the resulting 576 quantized MDCT coefficients
    Put the encoded data into a properly formatted frame in the bit stream
  }
}
Algorithm 5.3 MP3 compression
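Two of the steps in Algorithm 5.3 lend themselves to a short sketch: the arithmetic behind the 576 subbands and the nonuniform quantization step. The code below uses a power-law quantizer with exponent 3/4, which is the kind of nonuniform quantization used in MP3. The function names and the single `step` parameter, which stands in for the combined effect of the scaling factors, are simplifying assumptions and not the exact procedure of any particular encoder.

```python
BANDS = 32
SUBBANDS_PER_BAND = 18
COEFFS_PER_FRAME = BANDS * SUBBANDS_PER_BAND  # 576 MDCT coefficients


def quantize(coeff, step):
    """Power-law (nonuniform) quantization: larger values get coarser steps."""
    magnitude = (abs(coeff) / step) ** 0.75
    q = int(magnitude + 0.5)  # round to the nearest integer
    return -q if coeff < 0 else q


def dequantize(q, step):
    """Approximate inverse of quantize()."""
    magnitude = abs(q) ** (4.0 / 3.0) * step
    return -magnitude if q < 0 else magnitude


if __name__ == "__main__":
    print(COEFFS_PER_FRAME)
    for x in (0.5, 16.0, -16.0, 100.0):
        q = quantize(x, step=1.0)
        print(x, "->", q, "->", round(dequantize(q, step=1.0), 2))
```

Because of the 3/4 exponent, the gap between reconstructed values widens as the magnitude grows, so loud components are represented more coarsely than quiet ones. A larger `step` makes the quantization coarser overall and therefore cheaper to encode.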