multimodal oral corpora administration (v.3)

[moca] is an online system for the administration of spoken language corpora. Audio and/or video recordings are stored in [moca] together with their transcription files. The transcriptions are time-aligned. In addition to the transcribed text itself, they contain speaker information and the temporal structure of the transcript. This lets users jump directly from any point in a transcript to the corresponding position in the media file, using only a web browser. Beyond transcript administration, [moca] supports the structured administration of sociolinguistic metadata, including information about the setting of a recording and about the individual speakers in it. Finally, manual tagging with so-called labels allows linguistic phenomena to be collected and analyzed in detail.

Detailed search routines enable fine-grained searches for individual recordings, speakers, transcript excerpts and labels. Searches can be restricted, for example, to recordings and transcripts from particular regions or speaker age groups. Transcripts can also be searched for intonation phrases that contain particular word forms, combinations of word forms, or parts of word forms.

[moca] aims to provide intuitive, secure and personalized access to spoken language corpora. The system supports a theoretically unlimited number of users, and each user's access to the corpus can be restricted and/or tailored to their individual needs. [moca] can be used from any computer with internet access and requires neither additional software nor programming skills.

Virtual Installations

Corpus Andes

Authors: P. Dankel (U. Basel), J. Godenzzi (U. Montreal), M. Haboud (U. Quito), A. Martinez and G. Bravo (U. La Plata), A. Palacios (U. Madrid), I. Satti, M. Soto and S. Pfänder (U. Freiburg, project leads)


Corpus International Écologique de la Langue Française

CIEL-F is a corpus of spoken French in interaction, collected across the French-speaking world. It consists of excerpts from approximately 200 ten-minute recordings, made between 2006 and 2012 in 15 areas around the world.

The project is run by five university teams led by five professors: Lorenza Mondada (Lyon-2), Françoise Gadet (Paris-Ouest), Stefan Pfänder (Freiburg), Ralph Ludwig (Halle) and Anne-Catherine Simon (Louvain-la-Neuve). Each of them is responsible for a group of areas. The CIEL-F recordings fall into four categories: interactions during meals (code REP), radio broadcasts (RAD), interactions in a professional setting (PRO) and other (AUT: conversations among friends, commercial exchanges, university thesis defenses, etc.). Every recording is anonymized. For each recording, this page provides the audio (and video) data together with the corresponding transcriptions.


How to cite:
Gadet, F., Ludwig, R., Mondada, L., Pfänder, S. & Simon, A.C. (2019). CIEL-F: Corpus International Écologique de la Langue Française. Supported by ANR & DFG. Available online at

Resources available under a CC BY-NC-SA 4.0 license

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Moca3 - Demo installation

Freiburg Sofa Talks

The Freiburg Sofa Talks Corpus (main author: S. Pfänder; coordination: I. Satti & E. Schumann) comprises 262 video recordings ranging from 10 to 50 minutes in duration. The corpus contains Italian, Spanish, French, Portuguese, German and Russian data from both European and non-European countries and varieties.

In each video, a couple, two family members, or two close friends jointly recount events they have experienced together. The participants are mostly recorded in their own home, where they feel comfortable sharing memories and stories with a third person (i.e., the addressed recipient). This third person is the researcher, who brings the recording equipment to the participants' home and informs them about the project. Importantly, the researcher is never a stranger to the participants but is always either a friend or a relative. The main camera is positioned next to the researcher, facing the other two participants. In some recordings, a second camera is set up to capture the third person.

This conversational setting was designed to provide authentic opportunity spaces for a range of narrative practices used to make shared experiences accessible both to a co-present recipient and to an imagined audience. The participants freely choose what they talk about and how. They manage turn-taking, topic choice and the assignment of participation roles themselves. Moreover, they freely decide whether and to what extent they address the third person, which leads to dynamic changes in the participation framework. As a result, the recordings encompass both storytelling activities and other forms of talk.

This corpus thus provides a large set of comparable video data with complete visual access to the gestures and gaze of the participants on the sofa. The similar setup of the recordings across languages provides the basis for cross-linguistic studies, and the size of the corpus allows for both qualitative and quantitative investigations of multimodal storytelling practices.

All participants signed informed consent forms before the recordings were made.

Authors: D. Alcón (Freiburg; European Spanish data, corpus administration), F. D'Antoni (Leuven; Italian data), D. Dressel (Freiburg; French data), M. Garachana (Barcelona; Barcelona Spanish & Catalan data), M. Klatt (Freiburg; French data), S. Pfänder (Freiburg; project supervisor), I. Satti (Freiburg; American Spanish data, coordination), E. Schumann (Freiburg; German & French data, student accounts).

Corpus Salcedo

Corpus Salcedo by Muysken

Author: Pieter Muysken

The corpus presented here was recorded on audio with fairly simple recording equipment during my doctoral and postdoctoral studies at the University of Amsterdam (1974-1979). The recordings were made in the lovely canton of San Miguel de Salcedo and its surroundings. It was a great pleasure for me to get to know people from all social strata of the area, from the migrant porters of Tigua to the local landowning class and the merchants of the town square. Their voices are present in the corpus. At the time I had neither much research funding nor much experience, which is why the quality of the recordings is not very good. Nevertheless, the material represents a great variety of speech, from the Quichua of the highest rural areas in the páramo to the Media Lengua of the semi-urban areas around the cantonal center, and the different varieties of Spanish spoken in the square, in the shops, and in the homes of both the most prestigious center and the humblest neighborhoods. I thank the people of the canton of Salcedo, runas and whites alike, for welcoming me and taking part in this project.

Transcription – Editing – Database:
Ana Franco and Agustín Jerez (Cotopaxi and Tungurahua)
Patricia Menges, Susana Menéndez and Hella Olbertz (Amsterdam)
Philipp Dankel (lead), Daniel Alcón (database), Mario Soto (Andean Spanish) and Stefan Pfänder (coordination) (Basel & Freiburg team)
Marleen Haboud (lead), Elizabeth Rosero and Ernesto Farinango (Quito team)

Supported by:
Universities of Amsterdam, Leiden and Nijmegen (Netherlands), Netherlands Organisation for Scientific Research (NWO), German Science Foundation Research Training Group 1624, Ulderup Foundation, Romanisches Seminar Freiburg


How to cite this corpus:
Muysken, Pieter (2020): Corpus de Salcedo; edited by Philipp Dankel, Marleen Haboud, Hella Olbertz & Stefan Pfänder (coord.), open access via MOCA (by Daniel Alcón)

