MEZZANINE Final Conference

Spoken Language between Research and Technology

September 18, 2025

Faculty of Computer and Information Science, University of Ljubljana
Večna pot 113, 1000 Ljubljana

We cordially invite you to the final conference of the MEZZANINE project. The conference will bring together linguists, technical experts, and researchers from various fields to explore key challenges related to spoken language. You are welcome to participate either by presenting your own contribution or as an attendee. Attendance is free of charge and open to visitors without prior registration.

Thematic Areas

Spoken Language Resources in Linguistics and Technical Sciences

Types of spoken data and their automated collection
Needs of different scientific disciplines for spoken data
Methods for involving citizens in spoken data collection

Dialectal Variability

Spatial distribution of sounds in Slovenian dialects
Adaptation of automatic speech recognition for Slovenian dialects

Speech Segmentation and Annotation

Development of annotation schemes for speech
Self-correction, hesitations, and prosodic features of speech
Automatic annotation of morphological and syntactic properties of speech

Spoken Lexicon

Automatic processing of the phonetic form of words
Extraction of spoken vocabulary for Slovenian dictionaries
Differences between spoken and written vocabulary

Program Committee

Darinka Verdonik, UM FERI
Nikola Ljubešić, IJS

Organizing Committee

Špela Antloga, UM FERI
Sara Kos, UL FRI
Nejc Robida, UL FF
Jaka Čibej, UL FF

Book of Abstracts

The book of abstracts from the conference Spoken Language between Research and Technology presents timely contributions at the intersection of spoken language resources, linguistics, and speech technologies. It features the publicly available Croatian child language corpora in CHILDES/TalkBank and the ParlaSpeech V3 collection. Several papers address the creation and processing of Slovenian speech resources, ranging from citizen-science strategies and open-source tools (alignment, anonymization, validation, normalization) to phonetic transcription in the Digital Dictionary Database of Slovene and the expansion of lexical resources with typically spoken vocabulary. The research also covers (dis)fluency and filled-pause detection, the relationship between prosodic and syntactic units, and the challenges of dialect transcription; a new early communication corpus, EPIC-SI, is also announced. The volume is open access under the CC BY-SA license and is intended for researchers in linguistics, corpus studies, and speech technologies, as well as the broader professional community.
