
The planned results of the project MEZZANINE include original scientific publications, technical reports/guidelines, applications, datasets and corpora.

Special Issue

Special issue of the journal Slovenščina 2.0: Empirical, Applied, and Interdisciplinary Research

Investigating Spoken Language: Methodologies, Data, and Technological Solutions

The development of spoken language resources and technologies for Slovene requires a well-considered, interdisciplinary approach that integrates linguistic and technological expertise. Although significant breakthroughs have been achieved in speech technology over the past decade, the specific linguistic characteristics of Slovene are often not reflected in solutions developed for languages with extensive spoken resources. This special issue of the scientific journal Slovenščina 2.0 is dedicated to research that advances the understanding and resolution of key challenges in the development of spoken language resources and technologies for Slovene and other technologically under-resourced languages.

We invite contributions related to four key thematic areas:

  • Spoken language resources in linguistics and technical sciences: types of spoken data, methodologies for data collection and transcription, citizen engagement in data collection, and automatic transcription of spoken texts.
  • Dialectal variability: spatial distribution of dialectal sounds, automatic recognition of dialectal spoken Slovene, and challenges in defining an optimal phoneme inventory.
  • Segmentation and annotation of speech: delineation of fundamental speech units, annotation of self-repairs, morphological and syntactic properties, and identification of dialogue acts and speaker intentions.
  • Spoken lexicon: automatic extraction of spoken vocabulary from speech corpora, phonetic processing of spoken words, and integration of spoken vocabulary into lexical resources.

Submissions may focus on methodological approaches, empirical research findings, technological solutions, or interdisciplinary connections between linguistics, computational language processing, and spoken language technologies. Manuscripts may be submitted in Slovene or English.


  • April 3, 2025 – Deadline for submitting abstracts (150–300 words)
  • April 7, 2025 – Notification of acceptance or rejection of abstracts
  • June 2, 2025 – Deadline for submitting full papers
  • July 28, 2025 – Notification of acceptance or rejection of full papers
  • December 2025 – Publication of the special issue

Please send abstracts, which should contain the following information, to the e-mail address: or/and

Abstract information

  • text (up to 300 words)
  • author
  • email
  • institution
  • title of paper

Editors of the Special Issue
Helena Dobrovoljc 
Simon Krek


The 1st Internal MEZZANINE Workshop

Venue & date: ZRC SAZU, Ljubljana, April 14th, 2023, 9:00 AM till 1:00 PM

Participants: Project MEZZANINE researchers

Content: the Slovenian Linguistic Atlas as a data source on the spatial distribution of non-standard Slovenian phones; novelties in the chapter Glasoslovni oris of Pravopis 8.0 (Phonetic Outline in Orthography 8.0); prosody analyses in Praat; Sloleks and spoken Slovenian; automatic speech recognition; annotation of spoken language resources

The 6th International Scientific Conference Slavic Scientific Considerations: Infrastructure for Spoken Language Research in Humanities and Language Technologies

Venue & date: Faculty of Arts, University of Maribor, May 18th-19th, 2023

Participants: Invited lecturers Mojca Smolej, Vesna Mikolič, Radovan Garabik, and Peter Jurgec, project MEZZANINE researchers, and other researchers of Slovenian and South Slavic languages

Content: See the book of abstracts

SemDial – MariLogue, The 27th Workshop on the Semantics and Pragmatics of Dialogue

Venue & date: Faculty of Electrical Engineering and Computer Science, University of Maribor, August 16th-17th, 2023

Participants: Researchers in the semantics and pragmatics of dialogue from around the world

Content: See the book of abstracts

The 2nd Internal MEZZANINE Workshop

Venue & date: Faculty of Computer and Information Science, University of Ljubljana, September 9th, 9:30 AM till 2:00 PM

Participants: Project MEZZANINE researchers

Content: Spoken language resources Artur, Gos 2.0, and the spoken Slovene training corpus; lemmatization and annotation of MSD in spoken language resources; segmentation of spoken language into basic units; syntactic annotations of spoken language; phoneme and word segmentation processes for prosodic analysis

The 3rd Internal MEZZANINE Workshop

Venue & date: Faculty of Computer and Information Science, University of Ljubljana, February 13th, 9:00 AM till 2:00 PM

Participants: Project MEZZANINE researchers

Content: disfluency annotation in the corpus Iriss with the tool Exmaralda; using automatic segmentation tools, speaker recognition, and speech recognition tools for semi-automatic speech transcription; ParlaSpeech; corpus tags for the description of speech event context; the phonetic module of the geolinguistic application DIAtlas; the automatic segmentation of the corpus Gos 2.1 and the use of machine acoustic measurements

Expert panel ‘Frontiers in Speech Communication Research’

The event took place at the Language Technologies and Digital Humanities 2024 conference on September 19 at the Faculty of Electrical Engineering in Ljubljana. Recording and more information

The 4th Internal MEZZANINE Workshop

Venue & date: Jožef Stefan Institute, October 22nd, 2024, 9:30 AM till 1:30 PM

Participants: Project MEZZANINE researchers

Content: training corpus ROG (presentation of different levels of manual annotation); automatic detection of filled pauses; digital lexical database

1.20 Preface, editorial, afterword

KRAJNC IVIČ, Mira. Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov = Infrastructure for speech research in the humanities and language technologies : book of abstracts. In: KRAJNC IVIČ, Mira (ed.). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1st ed. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. Pp. 137-138. ISBN 978-961-286-735-5. [COBISS.SI-ID 165772035]

3.15 Unpublished conference contribution

MAJHENIČ, Simona. Cognitive discourse markers in simultaneous interpreting : lecture at the conference “Discourse Markers – Theories and Methods” at Université Paris Cité, Paris, France, May 25, 2023. [COBISS.SI-ID 174052355]

KRAJNC IVIČ, Mira (editor). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1st ed. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. 1 online resource (1 PDF file (IV, 136 pp.)). ISBN 978-961-286-735-5. DOI: 10.18690/um.ff.5.2023. [COBISS.SI-ID 150988291]

LÜCKING, Andy (editor), MAZZOCCONI, Chiara (editor), VERDONIK, Darinka (editor). SemDial 2023 : MariLogue : proceedings of the 27th Workshop on the Semantics and Pragmatics of Dialogue : held at University of Maribor, Faculty of Electrical Engineering and Computer Science, the Internet, August 16–17 2023. Maribor: University of Maribor, Faculty of Electrical Engineering and Computer Science, 2023. 1 online resource (1 PDF file (VIII, 180 pp.)), illustrated. Proceedings (SemDial). ISSN 2308-2275. [COBISS.SI-ID 167897859]


VERDONIK, Darinka. Označevanje netekočnosti v govoru: primer označevanja z uporabo orodja Exmaralda. Maribor: Univerza, Fakulteta za elektrotehniko, računalništvo in informatiko, 2024. 24 pp., appendices. [COBISS.SI-ID 191164931]

VERDONIK, Darinka, GOSTENČNIK, Januška. Smernice za zbiranje podatkov za govorne vire. Maribor: Univerza, Fakulteta za elektrotehniko, računalništvo in informatiko, 2024. 31 pp. Digitalna knjižnica Univerze v Mariboru – DKUM. [COBISS.SI-ID 191313155]

Corpora and research data

In the MEZZANINE project, we have helped upgrade the following language resources:

VERDONIK, Darinka, ZWITTER VITEZ, Ana, ZEMLJARIČ MIKLAVČIČ, Jana, KREK, Simon, STABEJ, Marko, ERJAVEC, Tomaž, POTOČNIK, Tomaž, SEPESY MAUČEC, Mirjam, MAJHENIČ, Simona, ŽGANK, Andrej, BIZJAK, Andreja, GRIL, Lucija, DOBRIŠEK, Simon, KRIŽAJ, Janez, BAJEC, Marko, LEBAR BAJEC, Iztok, ŠOLTES, Tjaša, TROJAR, Mitja, BERNJAK, Mitja, DRETNIK, Naum, STRLE, Gregor, DOBROVOLJC, Kaja, LJUBEŠIĆ, Nikola, RUPNIK, Peter, et al. Spoken corpus Gos 2.1 (transcriptions). Ljubljana: Centre for Language Resources and Technologies, University of Ljubljana … [etc.]: IICT-BAS, 2023. CLARIN.SI data & tools. ISSN 2820-4042. [COBISS.SI-ID 177487107]

KUZMAN, Taja, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž, FIŠER, Darja, MEDEN, Katja, PANČUR, Andrej, OJSTERŠEK, Mihael, RUPNIK, Peter, KRYVENKO, Anna, SKUBIC, Jure, et al. Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0. Ljubljana: Institut Jožef Stefan, 2023. CLARIN.SI data & tools. ISSN 2820-4042. [COBISS.SI-ID 173570307]

TERČON, Luka, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž. Word embeddings 2.0. Ljubljana: Institut Jožef Stefan, 2023. CLARIN.SI data & tools. ISSN 2820-4042. [COBISS.SI-ID 161108739]