Digital Sound & Music: Concepts, Applications, & Science, Chapter 2, last updated 6/25/2013
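% FFT of the first second of audio (ybegin and freqs are defined earlier)
fftdata2 = fft(ybegin);
% Keep the first half of the FFT output; for real input the second half mirrors it
fftdata2 = fftdata2(1:22050);
plot(freqs, abs(fftdata2));
axis([0 5000 0 4500]);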
Figure 2.49 Frequency components of first second of HornsE04Mono.wav
What we've done is focus on one short window of time in applying the FFT. An FFT
window is a contiguous segment of audio samples on which the transform is applied. If you
consider the nature of sound and music, you'll understand why applying the transform to
relatively small windows makes sense. In many of our examples in this book, we generate
segments of sound that consist of one or more frequency components that do not change over
time, like a single pitch note or a single chord being played without change. These sounds are
good for experimenting with the mathematics of digital audio, but they aren't representative of
the music or sounds in our environment, in which the frequencies change constantly. The WAV
file HornsE04Mono.wav serves as a good example. The clip is only three seconds long, but the
first second is very different in frequencies (the pitches of tubas) from the last two seconds (the
pitches of trumpets). When we do the FFT on the entire three seconds, we get a kind of
"blurred" view of the frequency components, because the music actually changes over the
three-second period. It makes more sense to look at small segments of time. This is the purpose of the
FFT window.
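The blurring effect can be sketched with a synthetic signal. The example below is only an illustration under stated assumptions: it does not read HornsE04Mono.wav, and the 110 Hz and 440 Hz tones are hypothetical stand-ins for the low and high horn pitches. It compares the FFT of the whole three-second signal with the FFT of a window covering only the first second.

```matlab
sr = 44100;                          % sampling rate (assumed)
t1 = (0:sr-1)/sr;                    % time axis for 1 second
t2 = (0:2*sr-1)/sr;                  % time axis for 2 seconds
% Hypothetical stand-in for the clip: 1 s of a low tone, then 2 s of a higher tone
y = [sin(2*pi*110*t1), sin(2*pi*440*t2)];

% FFT over the entire three seconds, keeping the first half of the output
magAll = abs(fft(y));
magAll = magAll(1:floor(length(y)/2));

% FFT over a window that contains only the first second
ybegin = y(1:sr);
magBegin = abs(fft(ybegin));
magBegin = magBegin(1:sr/2);

% Bin k holds frequency (k-1)*sr/N, so look up the bins for 110 Hz and 440 Hz
binAll = round([110 440] * length(y) / sr) + 1;
binBegin = round([110 440] * length(ybegin) / sr) + 1;

disp(magAll(binAll));      % whole clip: both tones show up together
disp(magBegin(binBegin));  % first-second window: only the 110 Hz tone is present
```

In the whole-clip view, both tones appear in the same spectrum, with nothing to indicate that they sounded at different times. In the one-second window, only the tone that was actually sounding appears.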
Figure 2.50 shows an example of how FFT window sizes are used in audio processing
programs. Notice the drop-down menu, which gives you a choice of FFT sizes ranging from 32
to 65536 samples. The FFT window size is typically a power of 2. If your sampling rate is
44,100 samples per second, then a window size of 32 samples is about 0.0007 s, and a window
size of 65536 is about 1.486 s.
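The duration of a window is simply its size in samples divided by the sampling rate. As a minimal sketch, assuming the 44,100 Hz sampling rate used above and the range of sizes from 32 to 65536 (the variable names are our own), the durations can be tabulated as follows:

```matlab
sr = 44100;                 % sampling rate in samples per second
N = 2.^(5:16);              % window sizes 32, 64, ..., 65536 (powers of 2)
durations = N / sr;         % window length in seconds

for k = 1:length(N)
    fprintf('%6d samples = %.4f s\n', N(k), durations(k));
end
```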
There's a tradeoff in the choice of window size. A small window focuses on the
frequencies present in the sound over a short period of time. However, as mentioned earlier, the
number of frequency components yielded by an FFT of size N is N/2. Thus, for a window size
of, say, 128, only 64 frequency bands are output, with these bands spread over the frequencies from 0
Hz to sr/2 Hz, where sr is the sampling rate. (See Chapter 5.) For a window size of 65536,
32768 frequency bands are output, which seems like a good thing, except that with the large
window size, the FFT is not isolating a short moment of time. A window size of around 2048