moca3

multimodal oral corpora administration (v.3)

[moca] is an online system for the administration of spoken language corpora. It stores audio and/or video recordings together with their accompanying transcription files. Transcription files are time-aligned: in addition to the transcription itself, they provide speaker information and the temporal structure of the transcription. This makes it possible to access the media file at individual points in a transcription directly through a web browser. Beyond transcript administration, [moca] supports the structured administration of sociolinguistic metadata, including information about the setting of a recording and about the individual speakers within it. Manual tagging with so-called labels additionally allows linguistic phenomena to be collected and analysed in detail.

Detailed search routines enable fine-grained searches for individual recordings, speakers, transcript excerpts and labels. Searches can be limited, for example, to recordings and transcripts from certain regions or certain speaker age-groups. In addition, transcripts can be searched for intonation phrases that contain certain (combinations or parts of) word forms.

[moca] aims to provide intuitive, safe and personalized access to spoken language corpora. The system supports a theoretically unlimited number of users, whose access to the corpus can be restricted and/or adapted to their individual needs. [moca] can be used from any web-enabled computer and requires neither additional software nor programming skills.

Virtual Installations

The virtual installations available in your system are listed below. Click one to log in.

My Virtual Installation

Predefined VI for a new moca3 system


Corpus Andes

Authors: P. Dankel (U. Basel), J. Godenzzi (U. Montreal), M. Haboud (U. Quito), A. Martinez and G. Bravo (U. La Plata), A. Palacios (U. Madrid), I. Satti, M. Soto and S. Pfänder (U. Freiburg, responsible)


Ciel-F

Corpus International Écologique de la Langue Française

CIEL-F is a corpus of spoken French in interaction, collected across the French-speaking world. It consists of excerpts from about 200 recordings of 10 minutes, collected from 2006 to 2012 in 15 areas around the world. The project is run by five university teams, led by five professors: Lorenza Mondada (Lyon-2), Françoise Gadet (Paris-Ouest), Stefan Pfänder (Freiburg), Ralph Ludwig (Halle) and Anne-Catherine Simon (Louvain-la-Neuve). Each is responsible for a group of areas. The CIEL-F recordings fall into four categories: mealtime interactions (code REP), radio broadcasts (RAD), interactions in a professional setting (PRO) and others (AUT: conversations among friends, commercial exchanges, a university thesis defence, etc.). Every recording is anonymized. For each recording, this page provides the audio (and video) data together with the corresponding transcriptions.


Demo

Moca3 - Demo installation

[moca] is an online system for the administration of spoken language corpora. It stores audio and/or video recordings together with their accompanying transcription files. Transcription files are time-aligned: in addition to the transcription itself, they provide speaker information and the temporal structure of the transcription. This makes it possible to access the media file at individual points in a transcription directly through a web browser.


Freiburg Sofa Talks

The Freiburg Sofa Talks Corpus (main author: Stefan Pfänder, coordination: I. Satti & E. Schumann) comprises 168 video recordings ranging in duration from 10 to 40 minutes. In each recording, two people sitting together on a sofa collaboratively reconstruct their shared experiences. The corpus contains material from four European languages (mainly Italian, Spanish, French and German). The protagonists in each recording have known each other for quite some time: they may be close friends, siblings or married couples. They were instructed to jointly recall things they have experienced together in the past. Shared experience of the narrated events was critical for inclusion in the corpus. Before each recording starts, the two participants are explicitly asked to tell their stories together, which is intended to guarantee equal epistemic authority as well as an equal right to speak. The two protagonists are free to choose in advance the episodes they will talk about. The recording takes place in the presence of a third person (a close friend, another sibling, or a neighbour) who does not actively intervene in the reminiscing. This has two implications. First, the two participants on the sofa manage the choice of topics and the assignment of speaker roles themselves. Second, the recipient is not exclusively the partner on the sofa, since the third person, who sits next to a fixed camera, may constitute a further possible addressee.

Authors: D. Alcon (European Spanish data, database administration), D’Antoni (U. Freiburg, Italian data), D. Dressel (Canadian & African French & English data), M. Garachana (U. Barcelona, Barcelona Spanish & Catalan data), S. Pfänder (U. Freiburg & DFG GRK Frequency Effects, responsible), I. Satti (U. Freiburg, American Spanish data & coordination), E. Schumann (U. Freiburg, German, Russian & European French data, student accounts)


Corpus Salcedo

Author: Pieter Muysken

The corpus presented here was recorded on audio with fairly simple recording equipment during my doctoral and postdoctoral studies at the University of Amsterdam (1974-1979). The recordings were made in the lovely canton of San Miguel de Salcedo and its surroundings. It was a great pleasure for me to get to know people from all social strata of the area, from the migrant porters from Tigua to the local landowning class and the traders of the plaza. Their voices are present in the corpus. At the time I had neither much research funding nor much experience, which is why the quality of the recordings is not very good. Nevertheless, the material represents a wide range of speech varieties, from the Quichua of the highest rural areas in the páramo to the Media Lengua of the semi-urban areas around the cantonal centre and the different varieties of Spanish of the plaza, the shops, and the homes of the more prestigious centre and the humbler neighbourhoods. I thank the people of the canton of Salcedo, runas and blancos, for welcoming me and taking part in this project.

Transcription – Editing – Database:
Ana Franco and Agustín Jerez (Cotopaxi and Tungurahua)
Patricia Menges, Susana Menéndez and Hella Olbertz (Amsterdam)
Philipp Dankel (lead), Daniel Alcón (database), Mario Soto (Andean Spanish) and Stefan Pfänder (coordination) (Basel & Freiburg team)
Marleen Haboud (lead), Elizabeth Rosero and Ernesto Farinango (Quito team)

Supported by:
Universities of Amsterdam, Leiden and Nijmegen (Netherlands), Netherlands Organisation for Scientific Research (NWO), German Science Foundation Research Training Group 1624, Ulderup Foundation, Romanisches Seminar Freiburg



Global Administration