[This work is based on collaborations with Zach Burchill, Dave Kleinschmidt, Crystal Lee, Linda Liu, Lauren Oey, Emily Simon, Kodi Weatherholtz, and Xin Xie, and was funded by NIH R01 HD075797-01]

Talkers differ in how they pronounce the same words. This causes the statistics of the speech signal to be non-stationary, creating a formidable computational challenge for speech perception (where this problem is known as part of the infamous lack of invariance). The same property of speech continues to cause tremendous problems for automatic speech recognition systems.

Work over the last decade suggests that listeners overcome this problem partly by learning generalizations across types of talkers. These studies have found that listeners seem to be able to distinguish between individual variability (within-talker variance) and systematic variability across individuals who are members of the same group (e.g., foreign-accented talkers of a specific language background). This research also suggests that exposure to multiple talkers is required for successful generalization to novel talkers of the same accent.

However, research on cross-talker generalization so far rests on very few studies (fewer than a handful), often underpowered, some of which have returned null results. In this talk, I’ll focus on efforts from my lab to contribute to this line of research via large-scale web-based studies on the perception of foreign accents. I present evidence for rapid adaptation to talker-specific foreign-accented speech. I then show both successes and failures in finding generalization of this adaptation to a novel talker of the same accent. We find that, on the one hand, listeners sometimes fail to detect generalizations across groups of talkers, even after exposure to multiple talkers of the same accent (contrary to previous work); and, on the other hand, listeners sometimes successfully generalize to novel talkers of the same accent, even after exposure to only one talker (also contrary to previous work). In fact, listeners seem to generalize based on the perceived similarity between talkers.

I close by discussing open questions about the learning mechanisms that underlie our (impressive but limited) ability to accommodate inter-talker differences.

Key references

Liu, L., Xie, X., Weatherholtz, K., Burchill, Z., and Jaeger, T. F. 2017. Cross-talker generalization during accent adaptation.

Xie, X., Weatherholtz, K., Bainton, L., Rowe, E., Burchill, Z., Liu, L., and Jaeger, T. F. 2017. Rapid adaptation to foreign-accented speech and its limits.