Dec
Mock defense: Auditory and Neural Dynamics of Predictive Speech Perception
Ph.D. Candidate: Tugba Lulaci, Mock Opponent: Johan Frid
Speech perception rarely occurs under ideal conditions. Listeners must navigate an ambiguous, masked or rapidly unfolding speech signal in order to comprehend spoken language. While prediction has been widely discussed and acknowledged in speech perception, less is known about how listeners predict upcoming information when cues are limited or ambiguous, particularly at the earliest stages of perception. This thesis investigated how listeners anticipate upcoming sounds and update these predictions during speech perception, focusing on rapid acoustic cues in the unfolding speech signal. It also explored how these processes are influenced by individual extended high-frequency hearing sensitivity, background noise and the spectrotemporal dynamics of the signal. By combining behavioral tasks, EEG and audiological assessments, the thesis traced speech processing from acoustic detail to auditory perception and neural activity in the cortex.

The first study used a gating paradigm to test whether the earliest available acoustic information can be used predictively to anticipate upcoming sounds in spoken-word comprehension. Results showed that listeners could predict upcoming sounds, and ultimately identify words, from short gates of onset phonemes as early as 15 ms after word onset.

The second study tested whether this ability was affected by hearing sensitivity, focusing on extended high-frequency hearing thresholds. Individuals with better extended high-frequency hearing thresholds made more efficient use of predictive cues, suggesting that extended high-frequency hearing influences how efficiently listeners can perceive and use fine-grained acoustic information predictively.

The third study investigated whether prosodic cues, specifically Swedish word accents, remain reliable when the speech signal is masked by speech-shaped noise. In mismatch trials, N400 and P600 responses persisted under adverse listening conditions. However, their topographies shifted and their amplitudes weakened, suggesting that noise attenuated, but did not eliminate, the predictive use of prosodic cues. Furthermore, a pre-activation negativity (PrAN) was observed only in the most heavily noise-masked listening condition (SNR −5 dB), suggesting that pre-activation indexed a functionally adaptive mechanism supporting comprehension under increased uncertainty.

The fourth study examined how coarticulatory cues influence the real-time neural dynamics of prediction. Cross-spliced stimuli were used to create match and mismatch conditions between word-onset fricatives and the vowel+coda. The results showed that when fine-grained cues signaling the upcoming sound were accessible in the word onset, the brain registered them rapidly, as early as ~45 ms, and used coarticulatory cues predictively: a mismatch between the onset and the rest of the word elicited different neural responses across fricatives. Words with /s/ onsets showed a phonological mapping negativity (PMN), an N400 and a late positivity, whereas words with /f/ onsets evoked a P300. This pattern suggested that listeners were sensitive to fine-grained coarticulatory cues and that processing was shaped by the spectral properties and accessible cues of the unfolding speech signal.

Taken together, the findings across studies suggested that listeners use rapid acoustic cues to anticipate upcoming speech sounds, and that differences between these predictions depend on what listeners can access from the signal. By examining cue-based prediction across paradigms and across behavioral, auditory and neural levels of analysis, the thesis offers insight into how speech perception unfolds over time, shaped by both the signal itself and the listener who hears it.
