Results

The planned results of the MEZZANINE project include original scientific publications, technical reports and guidelines, applications, datasets, and corpora.

Events

The 1st Internal MEZZANINE Workshop

Venue & date: ZRC SAZU, Ljubljana, April 14th, 2023, 9:00 AM till 1:00 PM

Participants: Project MEZZANINE researchers

Content: The Slovenian Linguistic Atlas as a data source on the geographical distribution of non-standard Slovenian phones; novelties in the chapter Glasoslovni oris v Pravopisu 8.0 (An Outline of Phonetics in Orthography 8.0); prosody analyses in Praat; Sloleks and spoken Slovenian; automatic speech recognition; annotation of spoken language resources

The 6th International Scientific Conference Slavic Scientific Considerations: Infrastructure for Spoken Language Research in Humanities and Language Technologies

Venue & date: Faculty of Arts, University of Maribor, May 18th-19th, 2023

Participants: Invited lecturers Mojca Smolej, Vesna Mikolič, Radovan Garabik, and Peter Jurgec; MEZZANINE project researchers; and other researchers of Slovenian and South Slavic languages

Content: See the book of abstracts (https://mezzanine.um.si/konference/6-mednarodna-znanstvena-konferenca-slavisticni-znanstveni-premisleki/#zbornik)

SemDial – MariLogue, The 27th Workshop on the Semantics and Pragmatics of Dialogue

Venue & date: Faculty of Electrical Engineering and Computer Science, University of Maribor, August 16th-17th, 2023

Participants: Researchers in the semantics and pragmatics of dialogue from around the world

Content: See the book of abstracts (https://mezzanine.um.si/en/conference/semdial-2023-marilogue/#proceedings)

The 2nd Internal MEZZANINE Workshop

Venue & date: Faculty of Computer and Information Science, University of Ljubljana, September 9th, 9:30 AM till 2:00 PM

Participants: Project MEZZANINE researchers

Content: Spoken language resources Artur, Gos 2.0, and the spoken Slovene training corpus; lemmatization and annotation of MSD in spoken language resources; segmentation of spoken language into basic units; syntactic annotations of spoken language; phoneme and word segmentation processes for prosodic analysis

The 3rd Internal MEZZANINE Workshop

Venue & date: Faculty of Computer and Information Science, University of Ljubljana, February 13th, 9:00 AM till 2:00 PM

Participants: Project MEZZANINE researchers

Content: disfluency annotation in the corpus Iriss with the tool Exmaralda; using automatic segmentation tools, speaker recognition, and speech recognition tools for semi-automatic speech transcription; ParlaSpeech; corpus tags for the description of speech event context; the phonetic module of the geolinguistic application DIAtlas; the automatic segmentation of the corpus Gos 2.1 and the use of machine acoustic measurements

Original scientific publications

1.01 Original scientific article

VERDONIK, Darinka. Primarne kategorije dialoških dejanj. Slavistična revija : časopis za jezikoslovje in literarne vede. [Tiskana izd.]. 2023, letn. 71, št. 1, str. 43-60. ISSN 0350-6894. https://srl.si/ojs/srl/article/view/4062, DOI: 10.57589/srl.v71i1.4062. [COBISS.SI-ID 151811075], [SNIP, Scopus]

1.04 Professional article

TERČON, Luka, LJUBEŠIĆ, Nikola. CLASSLA-Stanza : the next step for linguistic processing of South Slavic languages. ArXiv.org. [in press] 2023, eprint 2308.04255. ISSN 2331-8422. https://arxiv.org/abs/2308.04255, DOI: 10.48550/arXiv.2308.04255. [COBISS.SI-ID 187571459]

1.08 Published scientific conference contribution

VERDONIK, Darinka, BIZJAK, Andreja, ŽGANK, Andrej, DOBRIŠEK, Simon. Metapodatki o posnetkih in govorcih v govornih virih: primer baze Artur. V: FIŠER, Darja (ur.), ERJAVEC, Tomaž (ur.). Jezikovne tehnologije in digitalna humanistika : zbornik konference : 15.-16. september 2022, Ljubljana, Slovenija = Proceedings of the Conference on Language Technologies and Digital Humanities : September 15th-16th 2022, Ljubljana, Slovenia. 1st ed. Ljubljana: Inštitut za novejšo zgodovino = Institute of Contemporary History, 2022. Str. 205-212. ISBN 978-961-7104-20-2. https://nl.ijs.si/jtdh22/pdf/JTDH2022_Proceedings.pdf. [COBISS.SI-ID 124488451]

AEPLI, Noëmi, ÇÖLTEKIN, Çagrı, LJUBEŠIĆ, Nikola, ZAMPIERI, Marcos, et al. Findings of the VarDial Evaluation Campaign 2023. V: SCHERRER, Yves (ur.), et al. The Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) : proceedings of the workshop : [Dubrovnik], May 5, 2023. Stroudsburg: Association for Computational Linguistics, cop. 2023. Str. 251-261, tabele. ISBN 978-1-959429-50-0. https://aclanthology.org/2023.vardial-1.25.pdf, DOI: 10.18653/v1/2023.vardial-1.25. [COBISS.SI-ID 173399299]

KUZMAN, Taja, RUPNIK, Peter, LJUBEŠIĆ, Nikola. Get to know your parallel data : performing English variety and genre classification over MaCoCu Corpora. V: SCHERRER, Yves (ur.), et al. The Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) : proceedings of the workshop : [Dubrovnik], May 5, 2023. Stroudsburg: Association for Computational Linguistics, cop. 2023. Str. 91-103, ilustr. ISBN 978-1-959429-50-0. https://aclanthology.org/2023.vardial-1.9.pdf, DOI: 10.18653/v1/2023.vardial-1.9. [COBISS.SI-ID 173393411]

OGRODNICZUK, Maciej, OSENOVA, Petja, ERJAVEC, Tomaž, FIŠER, Darja, LJUBEŠIĆ, Nikola, ÇÖLTEKIN, Çagrı, KOPP, Matyáš, MEDEN, Katja, KUZMAN, Taja. The ParlaMint Project : ever-growing family of comparable and interoperable parliamentary corpora. V: LINDÉN, Krister (ur.), NIEMI, Jyrki (ur.), KONTINO, Thalassia (ur.). CLARIN annual conference proceedings 2023 : 16 – 18 October 2023 Leuven, Belgium. [S. l.: s. n.], 2023. Str. 62-66, ilustr. CLARIN Annual Conference Proceedings. ISSN 2773-2177. https://office.clarin.eu/v/CE-2023-2328_CLARIN2023_ConferenceProceedings.pdf. [COBISS.SI-ID 169470211]

BAJEC, Marko, LEBAR BAJEC, Iztok, ŠOLTES, Tjaša, CVEK, Jernej, ČIBEJ, Jaka, GANTAR, Kaja, SEVER, Sara, KREK, Simon. Online Notes – a real-time speech recognition and machine translation system for Slovene university lectures. V: DEBEVC, Matjaž (ur.), KOŽUH, Ines (ur.). Digitalna vključenost v informacijski družbi = Digital Inclusion in Information Society : Informacijska družba – IS 2023 = Information Society – IS 2023 : zbornik 26. mednarodne multikonference = proceedings of the 26th International Multiconference : zvezek H = volume H : 11. oktober 2023, 11 October 2023, Ljubljana, Slovenia. Ljubljana: Institut “Jožef Stefan”, 2023. Str. 7-10, ilustr. Informacijska družba. ISBN 978-961-264-280-8. ISSN 2630-371X. https://is.ijs.si/wp-content/uploads/2023/11/IS2023_Volume-H.pdf. [COBISS.SI-ID 172807683]

ŠOLTES, Tjaša, BAJEC, Marko, LEBAR BAJEC, Iztok, GANTAR, Kaja, ŽITNIK, Slavko. Online-notes system : real-time speech recognition and translation of lectures. V: NURCAN, Selmin (ur.). Research challenges in information science : information science and the connected world : 17th International Conference, RCIS 2023, Corfu, Greece, May 23–26, 2023 : proceedings. Cham: Springer, cop. 2023. Str. 485-492, ilustr. Lecture notes in business information processing (Internet), 476. ISBN 978-3-031-33080-3. ISSN 1865-1356. https://link.springer.com/chapter/10.1007/978-3-031-33080-3_29, DOI: 10.1007/978-3-031-33080-3_29. [COBISS.SI-ID 157601539], [SNIP, Scopus]

1.12 Published scientific conference contribution abstract

KRAJNC IVIČ, Mira, ANTLOGA, Špela. Predlog izdelave korpusa humorja v govoru za slovenščino = Spoken Slovene corpus of humor : draft proposal. V: KRAJNC IVIČ, Mira (ur.). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1. izd. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. Str. 69-72. ISBN 978-961-286-735-5. https://press.um.si/index.php/ump/catalog/book/774. [COBISS.SI-ID 165263619]

ŠUMENJAK, Klara. Standardi transkribiranja narečnega korpusa GOKO = GOKO dialect corpus transcription standards. V: KRAJNC IVIČ, Mira (ur.). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1. izd. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. Str. 105-109. ISBN 978-961-286-735-5. https://press.um.si/index.php/ump/catalog/book/774. [COBISS.SI-ID 176452867]

VERDONIK, Darinka, TROJAR, Mitja, BIZJAK, Andreja. Prednosti in slabosti dvotirnega zapisovanja govora v slovenskih govornih virih = Advantages and Disadvantages of Two-level Speech Transcription in the Slovenian Speech Resources. V: KRAJNC IVIČ, Mira (ur.). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1. izd. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. Str. 111-114. ISBN 978-961-286-735-5. https://press.um.si/index.php/ump/catalog/book/774. [COBISS.SI-ID 158884355]

VERDONIK, Darinka, MAJHENIČ, Simona, BIZJAK, Andreja. Are metadiscourse dialogue acts a category on their own?. V: LÜCKING, Andy (ur.), MAZZOCCONI, Chiara (ur.), VERDONIK, Darinka (ur.). SemDial 2023 : MariLogue : proceedings of the 27th Workshop on the Semantics and Pragmatics of Dialogue : held at University of Maribor, Faculty of Electrical Engineering and Computer Science, the Internet, August 16–17 2023. Maribor: University of Maribor, Faculty of Electrical Engineering and Computer Science, 2023. Str. 178-180. Proceedings (SemDial). ISSN 2308-2275. https://mezzanine.um.si/wp-content/uploads/Marilogue_Proceedings1.pdf. [COBISS.SI-ID 173321219]

1.16 Independent scientific component part or a chapter in a monograph

VERDONIK, Darinka. Zbiranje gradiv za govorne korpuse med Scilo in Karibdo. V: ARHAR HOLDT, Špela (ur.), KREK, Simon (ur.). Razvoj slovenščine v digitalnem okolju. 1. izd. Ljubljana: Založba Univerze, 2023. Str. 15-37, ilustr. Sporazumevanje. ISBN 978-961-297-256-1. ISSN 2738-4527. https://ebooks.uni-lj.si/ZalozbaUL/catalog/view/522/852/9447. [COBISS.SI-ID 185550083]

Professional publications and secondary authorship

1.09 Published professional conference contribution

MAJHENIČ, Simona. No, tudi z diskurznimi označevalci lahko tolmači veliko povemo : Pomen tolmačenja diskurznih označevalcev. V: ZIDAR FORTE, Jana (ur.). Odvrženi plašč nevidnosti : jubilejni zbornik ob 50-letnici ZKTS. Ljubljana: Združenje konferenčnih tolmačev Slovenije, 2023. Str. 69-75. ISBN 978-961-96113-0-2. http://zkts.si/images/Zbornik_ZKTS50.pdf. [COBISS.SI-ID 138132483]
Funder: ARRS, project J7-4642, SI, MEZZANINE – teMeljnE raZiskave Za rAzvoj govorNih vIrov in tehNologij za slovEnščino

1.20 Preface, editorial, afterword

KRAJNC IVIČ, Mira. Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov = Infrastructure for speech research in the humanities and language technologies : book of abstracts. V: KRAJNC IVIČ, Mira (ur.). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1. izd. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. Str. 137-138. ISBN 978-961-286-735-5. https://press.um.si/index.php/ump/catalog/book/774. [COBISS.SI-ID 165772035]

3.15 Unpublished conference contribution

MAJHENIČ, Simona. Cognitive discourse markers in simultaneous interpreting : lecture at the conference “Discourse Markers – Theories and Methods”, Université Paris Cité, Paris, France, 25 May 2023. [COBISS.SI-ID 174052355]
Funder: ARRS, project J7-4642, SI, MEZZANINE – teMeljnE raZiskave Za rAzvoj govorNih vIrov in tehNologij za slovEnščino

Editor

KRAJNC IVIČ, Mira (urednik). Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah : zbornik povzetkov : [6. mednarodna znanstvena konferenca Slavistični znanstveni premisleki : 18. 5.-19. 5. 2023, Maribor, Slovenija]. 1. izd. Maribor: Univerza v Mariboru, Univerzitetna založba, 2023. 1 spletni vir (1 datoteka PDF (IV, 136 str.)). ISBN 978-961-286-735-5. https://press.um.si/index.php/ump/catalog/book/774, https://dk.um.si/IzpisGradiva.php?id=84288, http://www.dlib.si/details/URN:NBN:SI:DOC-SG85YGU5, DOI: 10.18690/um.ff.5.2023. [COBISS.SI-ID 150988291]

LÜCKING, Andy (urednik), MAZZOCCONI, Chiara (urednik), VERDONIK, Darinka (urednik). SemDial 2023 : MariLogue : proceedings of the 27th Workshop on the Semantics and Pragmatics of Dialogue : held at University of Maribor, Faculty of Electrical Engineering and Computer Science, the Internet, August 16–17 2023. Maribor: University of Maribor, Faculty of Electrical Engineering and Computer Science, 2023. 1 spletni vir (1 datoteka PDF (VIII, 180 str.)), ilustr. Proceedings (SemDial). ISSN 2308-2275. https://mezzanine.um.si/wp-content/uploads/Marilogue_Proceedings1.pdf. [COBISS.SI-ID 167897859]

Studies

VERDONIK, Darinka. Označevanje netekočnosti v govoru: primer označevanja z uporabo orodja Exmaralda. Maribor: Univerza, Fakulteta za elektrotehniko, računalništvo in informatiko, 2024. 24 str., pril. https://mezzanine.um.si/rezultati/, https://dk.um.si/IzpisGradiva.php?id=87952. [COBISS.SI-ID 191164931]

VERDONIK, Darinka, GOSTENČNIK, Januška. Smernice za zbiranje podatkov za govorne vire. Maribor: Univerza v Mariboru, Fakulteta za elektrotehniko, računalništvo in informatiko, 2024.

Corpora and research data

Within the MEZZANINE project, we have helped upgrade the following language resources:

VERDONIK, Darinka, ZWITTER VITEZ, Ana, ZEMLJARIČ MIKLAVČIČ, Jana, KREK, Simon, STABEJ, Marko, ERJAVEC, Tomaž, POTOČNIK, Tomaž, SEPESY MAUČEC, Mirjam, MAJHENIČ, Simona, ŽGANK, Andrej, BIZJAK, Andreja, GRIL, Lucija, DOBRIŠEK, Simon, KRIŽAJ, Janez, BAJEC, Marko, LEBAR BAJEC, Iztok, ŠOLTES, Tjaša, TROJAR, Mitja, BERNJAK, Mitja, DRETNIK, Naum, STRLE, Gregor, DOBROVOLJC, Kaja, LJUBEŠIĆ, Nikola, RUPNIK, Peter, et al. Spoken corpus Gos 2.1 (transcriptions). Ljubljana: Centre for Language Resources and Technologies, University of Ljubljana … [etc.]: IICT-BAS, 2023. CLARIN.SI data & tools. ISSN 2820-4042. http://hdl.handle.net/11356/1863. [COBISS.SI-ID 177487107]

KUZMAN, Taja, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž, FIŠER, Darja, MEDEN, Katja, PANČUR, Andrej, OJSTERŠEK, Mihael, RUPNIK, Peter, KRYVENKO, Anna, SKUBIC, Jure, et al. Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0. Ljubljana: Institut Jožef Stefan, 2023. CLARIN.SI data & tools. ISSN 2820-4042. http://hdl.handle.net/11356/1864. [COBISS.SI-ID 173570307]

TERČON, Luka, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž. Word embeddings CLARIN.SI-embed.sl 2.0. Ljubljana: Institut Jožef Stefan, 2023. CLARIN.SI data & tools. ISSN 2820-4042. http://hdl.handle.net/11356/1791. [COBISS.SI-ID 161108739]