MEZZANINE Final Conference

Spoken Language between Research and Technology

September 18, 2025

Faculty of Computer and Information Science, University of Ljubljana
Večna pot 113, 1000 Ljubljana

We cordially invite you to the final conference of the MEZZANINE project. The conference will bring together linguists, technical experts, and researchers from various fields to explore key challenges related to spoken language. You are welcome to present your own contribution or simply to attend. Attendance is free of charge, and visitors do not need to register in advance.

Thematic Areas

Spoken Language Resources in Linguistics and Technical Sciences

  • Types of spoken data and their automated collection
  • Needs of different scientific disciplines for spoken data
  • Methods for involving citizens in spoken data collection

Dialectal Variability

  • Spatial distribution of sounds in Slovenian dialects
  • Adaptation of automatic speech recognition for Slovenian dialects

Speech Segmentation and Annotation

  • Development of annotation schemes for speech
  • Self-correction, hesitations, and prosodic features of speech
  • Automatic annotation of morphological and syntactic properties of speech

Spoken Lexicon

  • Automatic processing of the phonetic form of words
  • Extraction of spoken vocabulary for Slovenian dictionaries
  • Differences between spoken and written vocabulary

Program Committee

  • Darinka Verdonik, UM FERI
  • Nikola Ljubešić, IJS

Organizing Committee

  • Špela Antloga, UM FERI
  • Sara Kos, UL FRI
  • Nejc Robida, UL FF
  • Jaka Čibej, UL FF

Book of Abstracts

The book of abstracts from the conference Spoken Language between Research and Technology brings together timely contributions at the intersection of spoken language resources, linguistics, and speech technologies. It features publicly available Croatian child-language corpora in CHILDES/TalkBank and the ParlaSpeech V3 collection. Several papers address the creation and processing of Slovenian speech resources, ranging from citizen-science strategies and open-source tools (alignment, anonymization, validation, normalization) to phonetic transcription in the Digital Dictionary Database of Slovene and the expansion of lexical resources with typically spoken vocabulary. The research covers (dis)fluency and filled-pause detection, the relationship between prosodic and syntactic units, and the challenges of dialect transcription; a new early communication corpus, EPIC-SI, is also announced. The volume is open access under the CC BY-SA license and is intended for researchers in linguistics, corpus studies, and speech technologies, as well as for the broader professional community.