Final conference MEZZANINE

Round Table: Speech Resources and Speech Technologies in Research

Ljubljana, Faculty of Computer and Information Science, 18 September 2025

Moderator: Nikola Ljubešić

Participants: Andrej Žgank, Marko Bajec, Darinka Verdonik, Gordana Hržica, Simon Dobrišek, Karmen Kenda Jež, and other participants of the conference Spoken Language between Research and Technology

Discussion highlights

  • Scope and type of data: both large general collections and specialized corpora are needed (e.g., child speech, dialects, spontaneous conversations).
  • Data accessibility: in addition to speech corpora published in the CLARIN.SI repository, a national archive of spoken language from the media should be established.
  • Technological support: essential tools include facilities for data collection, automatic segmentation, classification and transcription, anonymization, and diverse methods of speech data analysis.
  • Legal and ethical issues: the use of recordings must respect speakers’ privacy and comply with legislation on copyright and personal data protection.
  • Interdisciplinary collaboration: linguists, computer scientists, speech therapists, psychologists, political scientists, and other researchers should cooperate to ensure broad usability of the data.

Conclusion: All participants emphasized that speech data are crucial for the future of linguistic research, speech therapy, and speech technologies, and that closer interdisciplinary and international collaboration is essential.