Method for estimating pitch independently from power spectrum envelope
for speech and musical signal
Yoshifumi Hara, Mitsuo Matsumoto and Kazunori Miyoshi
Pitch and changes in pitch are primary characteristics of a speech signal. Since a speech signal is only quasi-periodic, stability and accuracy are
required of a pitch estimation method. Various methods for extracting periods of a speech signal in the time domain and for analyzing the
microstructure of the spectrum in the frequency domain have been proposed. The autocorrelation function (ACF) and methods based on it are well-known
techniques for detecting the periodicity of a speech signal in the time domain and are known to be robust to noise. However, the ACF in the time domain is
the Fourier-transform counterpart of the power spectrum in the frequency domain. Therefore, pitch estimated by the ACF depends on the power spectrum of the speech signal.
This paper proposes a method that applies the ACF in the frequency domain to detect the periodicity of the microstructure of the spectrum, independently
of the power spectrum envelope. First, the speech signal is divided into a set of frames. Second, in each frame, the major local peaks of the
amplitude-frequency characteristic of the signal are picked in the frequency domain. Third, the amplitude-frequency characteristic is represented
as a sequence of unit impulses, that is, a line spectrum, in which the impulses are located on the frequency axis at the positions of the local peaks. Finally, the
ACF is applied to this sequence to extract the period of the impulses on the frequency axis, and the pitch is estimated from that period. Since pitch estimated by this
method is free from the power spectrum envelope of the speech signal, the method is stable and accurate. Furthermore, because
a simplified ACF can be applied to a line spectrum, the method is advantageous in terms of computational complexity.
Key words: Pitch estimation, Autocorrelation, Power spectrum, Line spectrum, Peak-picking
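
The four steps described in the abstract (framing, peak picking, conversion to a unit-impulse line spectrum, and ACF along the frequency axis) can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `harmonic_tone` and `estimate_pitch` and the parameters `n_fft`, `n_peaks`, `fmin` and `fmax` are hypothetical. Keeping the `n_peaks` largest local maxima is only one possible way to choose the "major" peaks. Counting pairs of peaks by their spacing is one possible reading of the "simplified ACF": for unit impulses, the ACF at a given lag equals the number of peak pairs separated by that lag.

```python
import numpy as np


def harmonic_tone(f0, amps, fs, n):
    """Test signal (assumption): harmonics k*f0 with amplitudes amps[k-1]."""
    t = np.arange(n) / fs
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t) for k, a in enumerate(amps))


def estimate_pitch(frame, fs, n_fft=4096, n_peaks=8, fmin=60.0, fmax=500.0):
    """Estimate pitch in Hz for one frame; returns 0.0 if no period is found.

    Assumes n_fft >= len(frame) and fmin < fmax.
    """
    frame = np.asarray(frame, dtype=float)
    # Amplitude spectrum of the Hann-windowed, zero-padded frame.
    amp = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))

    # Local peaks: bins strictly larger than both neighbours.
    mid = amp[1:-1]
    peak_bins = np.flatnonzero((mid > amp[:-2]) & (mid > amp[2:])) + 1
    if len(peak_bins) < 2:
        return 0.0

    # "Major" peaks (assumption): the n_peaks largest local maxima.
    order = np.argsort(amp[peak_bins])
    major = np.sort(peak_bins[order[-n_peaks:]])

    # Line spectrum of unit impulses at `major`. Its ACF at lag k is the
    # number of peak pairs k bins apart, so amplitudes never enter.
    i, j = np.triu_indices(len(major), k=1)
    acf = np.bincount(major[j] - major[i], minlength=len(amp))

    # Search the lag range that corresponds to [fmin, fmax].
    bin_hz = fs / n_fft
    lo = max(1, int(np.floor(fmin / bin_hz)))
    hi = min(len(acf) - 1, int(np.ceil(fmax / bin_hz)))
    if acf[lo:hi + 1].max() == 0:
        return 0.0
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return lag * bin_hz


# Two frames with the same 200 Hz pitch but different spectral envelopes.
fs = 8192
sloped = harmonic_tone(200.0, [1.0 / k for k in range(1, 9)], fs, 1024)
peaked = harmonic_tone(200.0, [0.2, 0.3, 1.0, 0.9, 0.3, 0.2, 0.5, 0.4], fs, 1024)
print(estimate_pitch(sloped, fs), estimate_pitch(peaked, fs))
```

Because every retained peak becomes an impulse of height one, the two test frames yield the same line spectrum even though their envelopes differ, which is the property the abstract emphasizes. The sketch does not address noise, missing harmonics or octave errors, and its frequency resolution is limited to one FFT bin.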