[moca]

Multimodal Oral Corpora Administration

Version 3 is coming in summer 2013 as a downloadable, installable and fully documented open source solution.

Please contact Daniel Alcón for more information.

[moca] is an online system for the administration of spoken language corpora. It stores audio and/or video recordings together with their accompanying transcription files. The transcription files are aligned: alongside the transcribed text, they record speaker information and the timing of each utterance. This makes it possible to access the media file at individual points in a transcript directly from a web browser. Beyond transcript administration, [moca] supports the structured administration of sociolinguistic metadata, including information about the setting of a recording and about the individual speakers in it. Manual tagging with so-called labels additionally allows linguistic phenomena to be collected and analyzed in detail.
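The idea of an aligned transcript can be illustrated with a minimal sketch. It does not show [moca]'s actual data model, which is not described here; the names Segment and media_link, the example URL and the use of a "#t=start,end" time fragment on the media address are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned stretch of a transcript (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def media_link(media_url: str, segment: Segment) -> str:
    """Build a link to the segment's time span using a '#t=start,end' fragment."""
    return f"{media_url}#t={segment.start:g},{segment.end:g}"


# A tiny invented transcript with two speaker contributions.
transcript = [
    Segment("A", 0.0, 2.4, "good morning"),
    Segment("B", 2.4, 5.1, "morning how are you"),
]

# Link that points at the second contribution in a placeholder media file.
link = media_link("https://example.org/rec01.mp4", transcript[1])
print(link)
```

Because each segment carries its own start and end time, any passage of the transcript can be mapped to a position in the recording without consulting anything but the segment itself.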

Detailed search routines enable fine-grained searches for individual recordings, speakers, transcript excerpts and labels. Searches can be limited, for example, to recordings and transcripts from certain regions or certain speaker age-groups. In addition, transcripts can be searched for intonation phrases that contain certain (combinations or parts of) word forms.
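A search of this kind can be sketched as follows. This is an illustration under stated assumptions, not [moca]'s search implementation: the Phrase record, the search function, its parameters and the sample data are all hypothetical, and "part of a word form" is simplified to a case-insensitive substring match on whitespace-separated words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """An intonation phrase with the metadata needed for filtering (hypothetical)."""
    recording_region: str
    speaker_age: int
    text: str


def search(phrases, word_part, region=None, age_range=None):
    """Return phrases containing word_part, optionally limited by region and age."""
    needle = word_part.lower()
    hits = []
    for p in phrases:
        # Skip phrases from recordings outside the requested region.
        if region is not None and p.recording_region != region:
            continue
        # Skip speakers outside the requested (inclusive) age range.
        if age_range is not None and not (age_range[0] <= p.speaker_age <= age_range[1]):
            continue
        # Keep the phrase if any word contains the requested part.
        if any(needle in word.lower() for word in p.text.split()):
            hits.append(p)
    return hits


# Invented sample data.
corpus = [
    Phrase("north", 34, "we were walking home"),
    Phrase("south", 71, "she walked to the market"),
    Phrase("south", 25, "nobody came today"),
]

hits = search(corpus, "walk", region="south", age_range=(60, 80))
print([p.text for p in hits])
```

Applying the metadata filters before the text match mirrors the description above: the region and age-group limits narrow the set of phrases, and the word-form condition is then checked only within that subset.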

[moca] aims to provide intuitive, secure and personalized access to spoken language corpora. The system supports a theoretically unlimited number of users, whose access to the corpus can be restricted and/or adapted to their individual needs. [moca] can be used from any web-enabled computer and requires neither additional software nor programming skills.

[moca] is an online system for managing spoken language corpora. Audio and/or video recordings and their associated transcripts are stored in [moca]. The transcripts are aligned, meaning that speaker and time information is captured together with the text of each speaker contribution. This makes it possible to play the recording that corresponds to a passage in a transcript as a media stream directly in a web browser. In addition to the transcripts, sociolinguistic metadata on the recording situation and the participating speakers can be managed in a structured way. By assigning so-called labels to utterances (manual tagging), extensive collections of a linguistic phenomenon can be compiled and analyzed.

Detailed search options make it possible to find particular recordings, speakers, transcript excerpts and labels. For example, recordings from a particular region can be selected from the available data in order to restrict analyses to them, or speakers belonging to a particular age group can be searched for. It is also possible to search transcripts for intonation phrases that contain particular (combinations or parts of) word forms.

The goal of [moca] is to ensure intuitive, secure and personalized access to the corpora. The system supports an unlimited number of users, each of whom can be individually granted or denied access to particular data. [moca] can be used from practically any internet-capable computer, without any special technical requirements or knowledge.