Andrés Buxó-Lugo
University of Maryland
Encoding and decoding of meaning through structured variability in intonational speech prosody
Online
Speech prosody plays an important role in the communication of meaning. However, how listeners use the prosodic signal to arrive at the intended meaning remains poorly understood. Prosodic cues vary across talkers and speaking conditions, creating ambiguity in the mapping from sound to meaning. We hypothesize that listeners reduce this ambiguity in part by learning talker-specific statistics of prosodic cues. To test this hypothesis, we investigate the production and recognition of question vs. statement prosody in American English. Experiment 1 elicits questions and statements from 65 talkers to characterize the distributional statistics of within- and cross-talker variability in these productions. We use Bayesian ideal observer models to assess the predicted consequences of cross-talker variability for listeners' recognition of prosody. These models predict that learning talker-specific distributional statistics facilitates recognition above and beyond what can be achieved through commonly assumed normalizations of prosodic cues. Experiment 2 tests this prediction in a comprehension experiment. We expose different groups of listeners to different prosodic input statistics and assess their recognition of questions and statements before and after exposure. Prior to exposure, ideal observer predictions derived from Experiment 1 provide a good qualitative fit to listeners' recognition of prosodic contours in Experiment 2. Following exposure, listeners shift the categorization boundary between questions and statements in ways consistent with learning of talker-specific statistics. These results suggest that listeners build robust prosodic categories based on their language experience, yet remain flexible enough to quickly adapt these categories when communicating with talkers who have novel prosody-to-meaning mappings.
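The following minimal sketch illustrates the ideal observer logic described in the abstract. It assumes a single hypothetical cue (for example, the size of the utterance-final F0 rise in semitones) and Gaussian category distributions with made-up parameters. It is not the model, the cues, or the distributions used in the study. It shows how an observer that knows a talker's category statistics computes P(question | cue) via Bayes' rule. It also shows how different talker-specific statistics move the question/statement boundary.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_question(x, question, statement, prior_q=0.5):
    """P(question | cue x) for an ideal observer with Gaussian categories.

    `question` and `statement` are (mean, sd) tuples describing hypothetical
    cue distributions for each category; `prior_q` is the prior P(question).
    """
    like_q = gaussian_pdf(x, *question) * prior_q
    like_s = gaussian_pdf(x, *statement) * (1 - prior_q)
    return like_q / (like_q + like_s)


def boundary(question, statement, lo=-20.0, hi=20.0, tol=1e-6):
    """Cue value where P(question | x) = 0.5, found by bisection.

    Assumes the posterior rises monotonically on [lo, hi], which holds when
    both categories share the same sd and the question mean is higher.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_question(mid, question, statement) < 0.5:
            lo = mid  # still statement-leaning: boundary lies above mid
        else:
            hi = mid  # question-leaning: boundary lies at or below mid
    return (lo + hi) / 2


# Two hypothetical talkers: A produces large question rises, B smaller ones.
talker_a = {"question": (6.0, 2.0), "statement": (-2.0, 2.0)}
talker_b = {"question": (3.0, 2.0), "statement": (-3.0, 2.0)}

if __name__ == "__main__":
    for name, t in [("A", talker_a), ("B", talker_b)]:
        b = boundary(t["question"], t["statement"])
        p = posterior_question(1.0, t["question"], t["statement"])
        print(f"talker {name}: boundary ~ {b:.2f}, P(question | 1.0) = {p:.2f}")
```

With these illustrative numbers, a 1-semitone rise is categorized as a statement for talker A but as a question for talker B. This is the kind of talker-dependent boundary shift that learning talker-specific statistics would produce. With equal variances, the boundary is simply the midpoint of the two category means. Bisection is used here only so that the same function still works if the variances are changed.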