Work Packages

The MEZZANINE project is divided into four work packages, each containing two to four activities. Research in each activity is guided by its defined research questions. Experts from linguistics and technical sciences will cooperate in each work package.

WP1: Acquiring recordings of speech


Research questions

A1.1-I Spoken language resources in linguistics and technical sciences

What are the needs of different linguistic disciplines and technical sciences regarding spoken language resources?

How well are the existing reference speech corpora balanced with regard to the covered spoken genres?

A1.2-I Advantages and disadvantages of different recording techniques

What recording techniques are used to collect speech data, and what are the characteristics of data collected with particular techniques?

What is the potential of crowdsourcing speech data in small communities, and how can crowdsourcing satisfy the needs of a diverse set of disciplines?

What legal considerations arise when recording speech data or reusing existing speech data from different sources, and how can they be addressed?

A1.3-T Low-cost limited domain speech data for training a speech recogniser

How should unsupervised or semi-supervised training of a speech recogniser be designed if speech data are available only for a specific domain?

What is the optimal approach to constructing new speech data, given the low-cost speech data already available?

A1.4-T The effectiveness of knowledge transfer for different speech/speaker recognition tasks

Which speech recognition tasks have the lowest potential for knowledge transfer from high-resource languages to Slovenian?

WP2: Dialect variation


Research questions

A2.1-L Geolinguistic analysis of non-standard phonemes

How reliable is the current version of Slovenian dialect phonetic transcription?

A2.2-L A spatial model of basic dialect areas of non-standard phonemes

How can the spatial distribution of non-standard phonemes be determined?

A2.3-L Creation of diasystemic contrastive tables

How can a spatial model be created for designing diasystemic contrastive tables of phonemes (dialect vs. standard)?

A2.4-I Definition of an optimal Slovenian phoneme set for ASR

How can an optimal Slovenian phoneme set be defined that strikes a balance between the standard phoneme inventory and the dialectal one?

WP3: Speech segmentation and annotation


Research questions

A3.1-I The basic units of speech

How well do manually annotated speech segments (i.e., utterances) in the Slovenian spoken language resources correlate with prosodic units?

How well do manually annotated speech segments in the Slovenian spoken language resources correlate with syntactic units?

A3.2-I Annotating and modelling disfluencies

What is an appropriate scheme for annotating disfluencies in speech corpora?

What is the optimal approach to automatic disfluency detection in speech corpora?

A3.3-I Morphosyntactic annotation, lemmatisation and dependency parsing

How can disfluency annotations inform and improve linguistic annotation?

How can training data from other domains and modalities be used efficiently for spoken language processing?

What is the impact of linguistic input representation on the results of linguistic annotation?

A3.4-I Dialogue act annotation

How unambiguous, adequate and informative is the GORDAN scheme compared to the ISO 24617-2 standard?

How can the ISO 24617-2 tagset be expanded to make it more adequate and informative?

WP4: Spoken lexis


Research questions

A4.1-I Canonical forms of (non-standard) spoken lexis

Which types (distinct words in a corpus) representing the same or similar phenomena have been standardised differently in existing spoken language resources?

What is an appropriate categorisation of the analysed, heterogeneously interpreted corpus types, and how should canonical forms be classified according to these type categories?

How should canonical forms and types be included in the lexicon or linked to lexicon data?

A4.2-I Lexicographic description of (non-standard) spoken language

What are the characteristics of spoken lexis as opposed to written language, and how can these characteristics be analysed automatically for lexicographic purposes?

How can the semantic description of spoken lexis be included in semantic (lexicographic) resources for Slovenian?