Gathering language materials and linguistic data through work with speech communities
Corpora are collections of written text and transcribed speech compiled during language documentation projects and research. Proper organisation and archiving of corpora improve data collection, academic integrity, preservation efforts and, importantly, accessibility, so that materials produced by CoEDL members are available to speakers and communities, including those who assisted this research.
Corpus collection and management were critical research priorities for CoEDL and central to the Centre’s Archiving program. Coordinated by CI Nick Thieberger, Data Manager Julia Miller and Corpus Manager Wolfgang Barth, the program provided Centre members with training and guidance to ensure that material created in the course of Centre work was managed and archived effectively and responsibly [1].
CoEDL saw this work as part of its responsibility to communities, an acknowledgement of the colonial extraction of information that characterised much fieldwork-based research in the past, and a way to influence and promote research methods that are both more collaborative and more rigorous. There is also an important human element in the formation and circulation of corpora, as many community members may be able to locate recordings of grandparents and other family members.
The three corpora introduced below demonstrate the time and effort required to collect and properly document a corpus, as well as the impact a collection can have on both academic work and members of the communities CoEDL collaborated with.
Hero image: Language documentation notebooks. Image: CoEDL.
Image 1: Cassandra Algy Nimarra and Felicity Meakins record director-matcher tasks with Jamieisha Barry Nangala, Regina Crowson Nangari and Quitayah Frith Namija (Image: Jennifer Green, 2017).
Image 2: Alan Rumsey with the Ku Waru community. Image: Alan Rumsey.
Image 3: Nick Thieberger in a meeting about the Bislama corpus. Image: Nick Thieberger/Robert Early.
[1] CoEDL Data Manager Julia Miller produced several guides on the principles and good practices of recording, managing and archiving data. These are available here.
Alan Rumsey (collector), 1983. Western Highlands of PNG recordings. Collection AR1 at catalog.paradisec.org.au [Closed Access]. https://dx.doi.org/10.4225/72/56E823B109D31
Hua, Xia, Meakins, Felicity, Algy, Cassandra, & Bromham, Lindell. (2022). Language change in multidimensional space: New methods for modelling linguistic coherence. Language Dynamics and Change, 12, 78-123.
McConvell, Patrick, & Meakins, Felicity. (2005). Gurindji Kriol: A mixed language emerges from code-switching. Australian Journal of Linguistics, 25(1), 9-30.
Meakins, Felicity. (2016). No fixed address: The grammaticalisation of the Gurindji locative as a progressive suffix. In F. Meakins & C. O'Shannessy (Eds.), Loss and Renewal: Australian Languages Since Colonisation (pp. 367-396). Berlin: Mouton de Gruyter.
Meakins, Felicity, & Algy, Cassandra. (2016). Deadly reckoning: Changes in Gurindji children's knowledge of cardinals. Australian Journal of Linguistics, 36(4), 479-501.
Meakins, Felicity, Hua, Xia, Algy, Cassandra, & Bromham, Lindell. (2019). The birth of a new language does not favour simplification. Language, 95(2), 294-332.
Meakins, Felicity, Jones, Caroline, & Algy, Cassandra. (2016). Bilingualism, language shift and the corresponding expansion of spatial cognitive systems. Language Sciences, 54, 1-13.
Meakins, Felicity, & Wilmoth, Sasha. (2020). Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol. In P. Arkadiev & F. Gardani (Eds.), The complexities of morphology (pp. 81-104). Oxford: Oxford University Press.
Thieberger, Nick. Bislama Corpus v03. In: Barth, Wolfgang (ed.). CoEDL Corpus Collection, https://go.coedl.net/bislama_corpus, accessed 10.08.2023.