
Slot 1 Tutorials (8:30-11:30)

  • Option 1: "Speech Technology Meets Early Language Acquisition: How Interdisciplinary Efforts Benefit Both Fields", by Maureen de Seyssel (Apple), Emmanuel Dupoux (École Normale Supérieure, Paris & Meta), Okko Räsänen (Tampere University)

    • Short description: Recent advances in speech technologies have revolutionised the tools available for psycholinguistic research, especially in the study of language development. At the same time, tackling challenges related to language acquisition and human-like speech perception provides unique opportunities to advance machine learning and speech technologies. This tutorial will explore how the latest innovations in speech processing, and specifically in self-supervised learning, can transform our approach to studying language processing and acquisition. We will present a framework for using these models to create realistic learning simulations, alongside practical examples of how they have been successfully applied to devise and test hypotheses about early language acquisition, underlining the impact speech processing models can have on this domain of psycholinguistics. We will also examine how addressing these psycholinguistic challenges can, in turn, drive advancements in speech processing and inspire the development of better models. Future pathways and open research questions will be discussed to encourage participants to envision new opportunities at the intersection of these fields. This session is designed to equip researchers in speech processing with the tools and conceptual knowledge to contribute to cutting-edge research in psycholinguistics, with the expectation that this will also drive state-of-the-art advancements in speech processing.

    • About tutorial organizers:

      • Maureen de Seyssel is a research scientist in Apple’s Machine Learning Research team, and is based in Copenhagen, Denmark. She obtained her PhD in 2023 from the École Normale Supérieure in Paris, France. Her broad research interests lie at the intersection of cognitive science and machine learning, with a primary focus on the language acquisition process from both machine and human perspectives.

      • Emmanuel Dupoux is a Professor of Cognitive Psychology at the École des Hautes Études en Sciences Sociales (EHESS) in Paris. He earned his Ph.D. in Cognitive Psychology from EHESS in 1989, focusing on the mechanisms and representations that enable infants to acquire language and become cognitively functional within their culture. His research expertise lies in cognitive development, psycholinguistics, language acquisition, cognitive modeling, and machine learning. He specializes in studying early language acquisition, phonological ‘deafnesses’ in speech perception, and the development of social cognition. He also investigates how machine learning and artificial intelligence can provide quantitative models of processing and learning in infants. Throughout his career, he has received several awards and honors, including an ERC Advanced Grant, and he has organized the Zero Resource Speech Challenge (2015, 2017, 2019) and the Intuitive Physics Benchmark (2019).

      • Okko Räsänen is a Professor at the Signal Processing Research Centre of Tampere University, Finland. He defended his PhD on computational modeling of language acquisition in 2013 at Aalto University, Finland. Since 2018, he has been with Tampere University. He was a visiting researcher at the Language and Cognition Lab at Stanford University in 2015, and he holds the title of Docent in Spoken Language Processing at the School of Electrical Engineering of Aalto University. His research focuses on investigating early language development using computational modeling and on supporting early language research with speech processing and machine learning technologies. In addition to modeling human language learning and speech perception, his interests cover a broad range of other topics related to cognitive science, machine learning and speech technology. He has published more than 110 peer-reviewed papers in both technology and psychology journals and conferences.

  • Option 2: "Confidence Estimation for Trustworthy and Efficient Speech Systems", by Nagarathna Ravi (Big Data Research and Supercomputing Division, India), Thishyan Raj T (Department of Electrical Engineering, IIT Kanpur), Aditya Raj (Department of Electrical Engineering, IIT Kanpur), Vipul Arora (Department of Electrical Engineering, IIT Kanpur)

    • Short description: Estimating uncertainty in outputs can enhance the trustworthiness of AI systems. A calibrated classification model, apart from giving an output, gives a confidence value in its output that approximates the expected accuracy of the model for that output. A calibrated regression model, on the other hand, gives precise confidence intervals of the output. State-of-the-art (SOTA) approaches focus on calibrating model outputs or developing auxiliary models to estimate the confidence in model predictions. Two main types of uncertainty are aleatoric, i.e., arising from the stochastic mapping between the input and the output, and epistemic, stemming from the limitations in the model’s knowledge. Estimating them separately opens up further opportunities. Trustworthy speech systems based on confidence estimation can enhance various tasks, such as automatic speech recognition (ASR), speech enhancement and speaker diarization. The confidence estimates can help in decision making, enhancing performance and active learning, thereby making learning efficient in low-resource and atypical settings. This tutorial will cover the theoretical foundations and SOTA methods for uncertainty estimation and confidence calibration. This will include a deeper dive into confidence calibration for end-to-end ASR. Finally, we will present some applications of confidence calibration for low-resource ASR and music analysis.

    • About tutorial organizers:

      • Vipul Arora received the B.Tech. and Ph.D. degrees in electrical engineering from the Indian Institute of Technology (IIT) Kanpur, India, in 2009 and 2015, respectively. He was previously a Postdoctoral Researcher at the University of Oxford and a Research Scientist at Amazon Alexa in Boston, USA. He is currently an Associate Professor in the Department of Electrical Engineering, IIT Kanpur. His research interests include machine learning, audio processing, machine learning for physics, and time series analysis.

      • Nagarathna Ravi received the B.Tech. degree in information technology from the Madras Institute of Technology, Anna University, India, and the M.E. degree in computer science engineering from the Thiagarajar College of Engineering, India. She received the Ph.D. degree from Anna University, India. She was a Postdoctoral Researcher at IIT Kanpur, India. She is currently a senior scientist at CSIR-4PI, Bangalore. Her research interests are ASR, software-defined networking (SDN), and the Internet of Things (IoT).

      • Thishyan Raj T received his B.E. degree in Electronics and Communications Engineering from PESIT Bangalore South Campus. He is currently pursuing an M.S. degree in the Department of Electrical Engineering, IIT Kanpur. His research interests are in machine learning, signal processing, and ASR.

      • Aditya Raj received his B.S. degree in Data Science and Applications from IIT Madras. He is currently pursuing an M.Tech. degree in the Department of Electrical Engineering, IIT Kanpur. His research interests are in machine learning, ASR, and signal processing.
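The calibration idea described in Option 2 (a model's confidence should approximate its expected accuracy) is often quantified with the expected calibration error (ECE). The sketch below is purely illustrative and not drawn from the tutorial materials; the function name and binning scheme are assumptions for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE): the gap between mean confidence
    and empirical accuracy within each confidence bin, weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Samples whose confidence falls in this bin (lower edge inclusive for bin 0).
        idx = [i for i, c in enumerate(confidences)
               if (c > lo or (b == 0 and c == lo)) and c <= hi]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            conf = sum(confidences[i] for i in idx) / len(idx)
            ece += (len(idx) / n) * abs(acc - conf)
    return ece

# A model that says "0.9" and is right 9 times out of 10 is well calibrated:
well = expected_calibration_error([0.9] * 10, [1] * 9 + [0])   # ≈ 0.0
# The same confidence with only 50% accuracy is overconfident:
over = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)  # ≈ 0.4
```

A perfectly calibrated model scores zero; temperature scaling and the auxiliary-model approaches mentioned in the description aim to shrink this gap.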

  • Option 3: "Extracting insights from your complex data: interpretable statistical methods in speech science", by Tyson Barrett (Utah State University & Highmark Health), Tristan J. Mahr (University of Wisconsin-Madison), Visar Berisha (Arizona State University), Camille Wynn (University of Houston)

    • Short description: Although speech science is a diverse field, many of its research questions and data types overlap, allowing the same statistical analysis methods to address a wide range of topics. This tutorial will prepare researchers in speech science to understand and apply advanced statistical approaches that produce interpretable output, even with complex data. It is designed for researchers and clinicians with basic statistical knowledge (descriptive statistics, hypothesis testing, linear regression), some programming experience (e.g., R, Python), experience with common speech science tools and formats, and familiarity with speech feature extraction. Topics covered will include an introduction to causal inference and statistical modeling, an introduction to Bayesian inference, linear regression, mixed effects models, and non-linear methods like generalized additive models. The focus of the presentation will be on applied methodologies (rather than underlying mathematical theory) and will rely on visuals, diagrams, examples, and case studies to provide a more intuitive understanding of the concepts.

    • About tutorial organizers:

      • Tyson Barrett, PhD is a researcher at Utah State University and Highmark Health. He is an applied statistician with publications across several disciplines that apply advanced statistical methods to complex data. He is the statistician on several grants, including two NIH R01 grants (PI: Stephanie Borrie) and an R21 grant. Dr. Barrett has previously taught courses, tutorials, and workshops on statistical methods across speech, disability, and health disciplines. He maintains several R packages, including data.table, furniture, and tidyfast.

      • Tristan Mahr is a scientist at the University of Wisconsin-Madison studying speech and intelligibility in typically developing children and children with cerebral palsy. He develops sophisticated statistical models to characterize growth and variability in these populations. Dr. Mahr has lectured and written several tutorials on Bayesian regression, GAMs, and mixed effects models. He has also contributed to bayesplot, an R package for visualizing (Bayesian) posterior distributions.

      • Visar Berisha, PhD is a Professor at Arizona State University, with a joint appointment in the College of Engineering and the College of Health Solutions. His work has led to the development and translation of clinical tools for analysis of speech that are used by large healthcare providers for clinical research and clinical care. He was the 2023-2024 ISCA Distinguished Lecturer. Dr. Berisha has previously presented tutorials at IEEE conferences (in signal processing), at Interspeech, to industry, and at the American Speech-Language-Hearing Association (on clinical speech analytics).

      • Camille Wynn is an Assistant Professor at the University of Houston, where she studies the role of speech coordination in the conversational outcomes of neurotypical and autistic adolescents. Dr. Wynn’s research uses advanced statistical approaches to analyze complex acoustic and conversational data, and she teaches statistics courses to PhD students in the social sciences.
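Several of the topics listed for Option 3 lend themselves to small worked examples. As a purely illustrative sketch (not the organizers' material), the conjugate normal-normal update below shows the kind of interpretable output Bayesian inference produces: a posterior mean together with an explicit measure of remaining uncertainty. The function name and the toy numbers are assumptions for the example.

```python
import math

def posterior_normal_mean(x, sigma, mu0, tau0):
    """Conjugate Bayesian update for the mean of a normal likelihood with
    known standard deviation `sigma`, under a N(mu0, tau0^2) prior.
    Returns the posterior mean and posterior standard deviation."""
    n = len(x)
    prior_prec = 1.0 / tau0 ** 2   # prior precision
    data_prec = n / sigma ** 2     # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mean = (mu0 * prior_prec + sum(x) / sigma ** 2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Four observations at 1.0 pull a sceptical N(0, 1) prior most of the way there:
mean, sd = posterior_normal_mean([1.0, 1.0, 1.0, 1.0], sigma=1.0, mu0=0.0, tau0=1.0)
# mean = 0.8, sd = sqrt(0.2) ≈ 0.447
```

The posterior standard deviation shrinks as data accumulate, which is the intuition behind the credible intervals the tutorial's Bayesian segment would build on.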

Interspeech 2025

PCO: TU Delft Events

Delft University of Technology

Communication Department

Prometheusplein 1

2628 ZC Delft

The Netherlands

Email: pco@interspeech2025.org

X (formerly Twitter): @ISCAInterspeech

Bluesky: @interspeech.bsky.social

Interspeech 2025 operates under the privacy policy of TU Delft