Madhusudana Shashanka


Reverberation and Speech

I have been conducting experiments to understand the factors and cues that affect how we process speech from multiple talkers in natural reverberant environments. Below is the abstract we have submitted to the 29th Midwinter Meeting of the ARO.

"The role of fundamental frequency in segregating and understanding a talker competing with another talker in a reverberant setting"

Everyday reverberation usually does not degrade speech intelligibility. Similarly, it is generally easy to understand a talker competing with one other talker. However, understanding a target talker competing with another talker in modest reverberation can be extraordinarily difficult. This study investigated whether providing a robust fundamental frequency (F0) segregation cue could improve target intelligibility.

Because F0 varies over time in natural speech, reverberation can reduce the efficacy of F0 for separating talkers (e.g., Culling et al., JASA 2003; Darwin and Hukin, JASA 2000). We manipulated the pitch contours of target (five-digit TIDIGIT strings) and masker (sentences from the TIMIT database) to produce robust monotone F0 differences (ΔF0) that could improve perceptual segregation of target and masker in reverberation. We compared target intelligibility for natural-F0 and monotone speech at a target-to-masker ratio of 0 dB.
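As a rough illustration of the two quantities named in this paragraph, the sketch below converts a semitone offset into a frequency and scales a masker to a requested target-to-masker ratio. It is not the procedure used in the study: the function names, the RMS-based definition of level, and the example values are all assumptions made for illustration.

```python
import math


def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by the given number of semitones.

    One semitone is a frequency ratio of 2 ** (1 / 12).
    """
    return f0_hz * 2.0 ** (semitones / 12.0)


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def scale_masker_to_tmr(target, masker, tmr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(masker)) == tmr_db.

    Assumption: level is measured as broadband RMS over the whole signal.
    """
    masker_rms = rms(masker)
    if masker_rms == 0.0:
        raise ValueError("masker is silent and cannot be scaled")
    gain = rms(target) / (masker_rms * 10.0 ** (tmr_db / 20.0))
    return [gain * v for v in masker]


# Hypothetical example: a monotone masker at 100 Hz and a target one semitone above it.
masker_f0 = 100.0
target_f0 = shift_semitones(masker_f0, 1)
```

With `tmr_db = 0.0`, the scaled masker has the same RMS level as the target, which is one common reading of a 0 dB target-to-masker ratio.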

In all anechoic conditions, performance was near ceiling. Reverberation degraded performance for natural-F0 speech. Compared to natural-F0 performance, intelligibility in the reverberant monotone conditions was 1) significantly better when the target was one semitone above the masker, 2) essentially equal for ΔF0 = 0 or 2 semitones, and 3) significantly worse when ΔF0 was negative. Spectro-temporal overlap between target and masker partially explains this initially puzzling result. Compared to the natural-F0 reverberant case, the percentage of time-frequency bins in which the reverberant monotone target had more energy than the masker was higher when ΔF0 > 0, equal when ΔF0 = 0, and lower when ΔF0 < 0. No such interaction between energetic overlap and pitch contour occurred for anechoic conditions.
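The overlap measure described above can be sketched as a simple count over two energy grids. This is only a minimal illustration, assuming the target and masker energies have already been computed on the same time-frequency grid; the function name and the nested-list layout are hypothetical, and the abstract does not specify how the bins were actually derived.

```python
def target_dominant_percentage(target_energy, masker_energy):
    """Percentage of time-frequency bins where the target has more energy.

    Both arguments are nested lists of equal shape (assumed layout:
    one inner list per time frame, one value per frequency bin).
    """
    if len(target_energy) != len(masker_energy):
        raise ValueError("grids must have the same number of frames")
    total = 0
    dominant = 0
    for target_frame, masker_frame in zip(target_energy, masker_energy):
        if len(target_frame) != len(masker_frame):
            raise ValueError("frames must have the same number of bins")
        for t, m in zip(target_frame, masker_frame):
            total += 1
            if t > m:  # strictly more target energy than masker energy
                dominant += 1
    if total == 0:
        raise ValueError("grids must contain at least one bin")
    return 100.0 * dominant / total
```

Computing this percentage once for the natural-F0 mixture and once for each monotone mixture would allow the kind of comparison reported above.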

These results suggest that both energetic masking and perceptual effects influence how F0 affects segregation and intelligibility in reverberant settings.