Audio samples produced by TubeTalker
The audio files below accompany this paper:
The speech samples were produced by a speech simulation system called TubeTalker. TubeTalker operates at the level of the vocal tract area function, on the theoretical view that speech is produced by multiple levels of airway structure and modulation. A "neutral" vocal tract shape is the base structure on which all other modulation is superimposed. The first level of modulation consists of time-dependent shaping of the neutral tract shape over most of its length; this produces transitions from one vowel to another. The second level of modulation imposes spatially localized perturbations that momentarily alter the underlying vowel substrate. The examples below demonstrate the use of TubeTalker to generate speech at the word and phrase levels.
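The layered view described above can be illustrated with a toy sketch. This is not TubeTalker's actual formulation: the number of sections, the uniform 3 cm^2 neutral shape, the cosine shaping function, and the Gaussian constriction are all hypothetical choices made only to show a neutral shape, a vowel-level modulation, and a localized consonantal perturbation being combined.

```python
import math

N_SECTIONS = 44  # hypothetical number of tube sections along the tract


def neutral_area(i):
    """Hypothetical neutral shape: a uniform 3 cm^2 tube (illustrative only)."""
    return 3.0


def vowel_mode(i):
    """Hypothetical smooth shaping function spanning the whole tract length."""
    return math.cos(math.pi * i / (N_SECTIONS - 1))


def consonant_scale(i, location, width, magnitude):
    """Localized multiplicative constriction.

    Close to 1 far from `location`; equals (1 - magnitude) at its center,
    so magnitude = 1 fully occludes the tract at that section.
    """
    return 1.0 - magnitude * math.exp(-((i - location) / width) ** 2)


def area_function(vowel_coeff, cons_magnitude, cons_location=35, cons_width=2.0):
    """Area (cm^2) of every section at one instant in time."""
    areas = []
    for i in range(N_SECTIONS):
        # Level 1: shape the neutral tract over its length (vowel substrate).
        vowel_area = neutral_area(i) + vowel_coeff * vowel_mode(i)
        # Level 2: superimpose a spatially localized perturbation (consonant).
        scale = consonant_scale(i, cons_location, cons_width, cons_magnitude)
        areas.append(vowel_area * scale)
    return areas
```

Calling `area_function(0.0, 0.0)` returns the neutral shape unchanged, while a nonzero `vowel_coeff` reshapes the whole tract and a nonzero `cons_magnitude` narrows it only near `cons_location`. In a time-varying simulation both coefficients would be functions of time.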
This sample is the neutral vocal tract only. The voice source does produce a fundamental frequency (F0) contour to give samples a more natural quality. The F0 contour is identical for all samples.
With regard to the vocal tract, "Ohio" is an all-vowel utterance. It was generated by modulating the neutral vocal tract shape such that it produced the acoustic characteristics of the vowels. The glottal aspiration for the "h" sound was created by an abductory maneuver of the vocal folds.
This word requires modulation at the level of vowel transitions and consonantal perturbations. The sample below, however, is of only the vowel transitions that underlie production of the word.
Now the consonantal perturbations are imposed on the vowel substrate to produce "Abracadabra."
This sample demonstrates increased complexity because it is a phrase rather than a word. This audio file, however, is only the vowel substrate on which the phrase is built.
The consonantal perturbations are now imposed. Note that an "r" is present in this phrase, which requires that the consonantal perturbation not occlude the vocal tract.
This audio file demonstrates the vowel substrate for the phrase.
The unique component of this example is that it includes a nasal consonant. This requires that the area of the nasal port, which couples the main vocal tract to the nasal passages/sinuses, be precisely timed: it must open to allow nasalization, but also close quickly enough for adequate production of the "k" sound in the following word ("cow").
Modifications to the neutral vocal tract shape
By changing only the neutral vocal tract shape while keeping all other modulations the same, a new sound quality is produced. Here are two examples using the "He had a rabbit" phrase.
Modifications to the timing of the control parameters
In the following two phrases, the timing of all control parameters was altered such that the first half of each phrase was increased in duration by 25 percent and the latter half decreased by 25 percent. The total duration of each phrase is the same as the original.
In the following two phrases, the timing of all control parameters was altered such that the first half of each phrase was decreased in duration by 25 percent and the latter half increased by 25 percent. The total duration of each phrase is again the same as the original.
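One way to picture these timing changes is as a piecewise-linear time warp. The sketch below is an assumption about how such a warp could be written, not a description of how TubeTalker applies it: it stretches or compresses each half of the phrase uniformly, and the function name and arguments are hypothetical.

```python
def warp_time(t, total, first_half_scale=1.25):
    """Map an original time `t` (0..total) onto the warped timeline.

    The first half of the phrase is scaled by `first_half_scale` and the
    second half by (2 - first_half_scale), so the total duration is unchanged.
    Use 1.25 for "first half longer" and 0.75 for "first half shorter".
    """
    half = total / 2.0
    second_half_scale = 2.0 - first_half_scale
    if t <= half:
        return t * first_half_scale
    return half * first_half_scale + (t - half) * second_half_scale
```

For a phrase lasting 1.0 s, the midpoint of the original moves to 0.625 s with a scale of 1.25 and to 0.375 s with a scale of 0.75, while the end point stays at 1.0 s in both cases.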
Modifications to the voice source
In the following two phrases, the baseline separation of the vocal processes was increased from 0.1 cm to 0.15 cm. This change has the effect of allowing a greater non-oscillatory component of the glottal flow during voicing, and results in increased glottal turbulence. The perceptual effect is a breathier voice quality.
Modifications to the nasal coupling parameters (hypernasal)
In the following two phrases, the nasal coupling area was maintained at a minimum value of 0.2 cm² throughout the duration of each phrase. The effect is to nasalize all portions of the phrases, resulting in a hypernasal quality.
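Read as a floor on the nasal port area, this manipulation can be sketched in a few lines. The interpretation is an assumption (the text could also mean the area was held constant), and the function name and the list-of-samples representation are hypothetical.

```python
MIN_NASAL_AREA_CM2 = 0.2  # minimum coupling area quoted in the text


def hypernasal(port_areas, floor=MIN_NASAL_AREA_CM2):
    """Return the nasal port area trajectory with a lower bound applied.

    `port_areas` is a hypothetical list of port areas (cm^2) sampled over
    time; any value below `floor` is raised to it, so the port never closes.
    """
    return [max(area, floor) for area in port_areas]
```

With this floor in place, a trajectory such as `[0.0, 0.1, 0.5, 0.0]` becomes `[0.2, 0.2, 0.5, 0.2]`: the port stays open even where the original would have closed it.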
Modifications to the epilaryngeal tube
In the following two phrases, the entry area to the vocal tract was increased to effectively widen the epilaryngeal tube. This modification alters the voice quality in two ways: the first three formants are shifted slightly downward in frequency, and the glottal flow waveform is altered. The perceptual effect is a darker voice quality.
Modifications to the epilaryngeal tube and increase in vocal tract length
In the following two phrases, the entry area to the vocal tract was increased as in the previous example. In addition, the vocal tract length was increased to 18.5 cm.
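The acoustic effect of lengthening the tract can be estimated with the textbook idealization of a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L). This is only a rough guide, not the computation TubeTalker performs on its non-uniform area function. The baseline length is not stated here, so the 17.5 cm used for comparison is a hypothetical value.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate value for warm, moist air


def uniform_tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


baseline = uniform_tube_formants(17.5)    # hypothetical baseline length
lengthened = uniform_tube_formants(18.5)  # length quoted in the text
```

Under this idealization the 17.5 cm tube resonates at 500, 1500, and 2500 Hz, and the 18.5 cm tube at roughly 473, 1419, and 2365 Hz, so every formant moves downward by the same proportion.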
Extra modifications not in the published paper
This version of "abracadabra" has increased duration, decreased fundamental frequency, a widened epilarynx, and a vocal tract length increased to 18.5 cm.
This is the same as the sample above, but has an added vocal tremor.
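How the tremor was implemented is not described here. One simple possibility, sketched below as an assumption, is a slow sinusoidal modulation of the fundamental frequency; the rate and depth values are hypothetical placeholders, and an actual tremor may also involve other parameters.

```python
import math


def f0_with_tremor(f0_hz, t, rate_hz=5.0, depth=0.05):
    """F0 (Hz) at time `t` (s) with a sinusoidal tremor superimposed.

    `rate_hz` is the tremor rate and `depth` the fractional F0 excursion;
    both defaults are illustrative, not values taken from the samples.
    """
    return f0_hz * (1.0 + depth * math.sin(2.0 * math.pi * rate_hz * t))
```

With these placeholder values, a 100 Hz contour would swing between 95 and 105 Hz five times per second.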