CORPORA CREATION IN CONTRASTIVE LINGUISTICS
Keywords:
monolingual corpus, parallel corpus, contrastive linguistics, comparability measure
Abstract
Universal and language-specific features of language usage become more evident when tested against non-elicited language data on a large scale. Corpora meet this requirement: they provide ample data for testing research hypotheses in contrastive language studies in an objective and falsifiable manner. However, the criteria for corpus creation and the comparability measures used to evaluate available corpora present a separate problem in contrastive linguistics. The article presents an overview of the types of corpora used in contrastive linguistics research and describes their characteristic features. It then looks into the sources of data used in corpus creation, both in (commercially) available corpora and in data collections compiled to answer a particular research question. The article describes the techniques used in creating comparable corpora for contrastive studies and presents the comparability measures used to evaluate such corpora. The study examines the case of building a topic-specific comparable corpus in English and Ukrainian that focuses on education-related vocabulary in the two languages. The comparability of the corpus is measured using translation equivalence and word frequency similarity. Following the procedures outlined above, a quasi-comparable (non-aligned) corpus on the topic of education was collected, with English and Ukrainian in contrast. Using the frequency comparability measure, it was established that both components of the corpus (English and Ukrainian) contain keywords related to the topic of education.
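As an illustration of how translation equivalence and word frequency similarity can be combined into a single comparability score, the following Python sketch may help. It is not the procedure used in the study: the function names, the three-word lexicon, and the choice of cosine similarity are all assumptions made for the example. It also matches exact word forms only, whereas a real English-Ukrainian comparison would most likely need lemmatization because Ukrainian is highly inflected.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Return each token's share of all tokens in the text."""
    # \w+ matches Unicode word characters, so Cyrillic tokens are kept.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def frequency_comparability(source_text, target_text, lexicon):
    """Cosine similarity of relative frequencies over translation pairs.

    `lexicon` maps a source-language word to its assumed translation
    equivalent (hypothetical helper data, not taken from the study).
    """
    src = relative_frequencies(source_text)
    tgt = relative_frequencies(target_text)
    src_vec = [src.get(word, 0.0) for word in lexicon]
    tgt_vec = [tgt.get(word, 0.0) for word in lexicon.values()]
    dot = sum(a * b for a, b in zip(src_vec, tgt_vec))
    norm = math.sqrt(sum(a * a for a in src_vec)) * math.sqrt(
        sum(b * b for b in tgt_vec)
    )
    # A zero norm means no lexicon word occurs in one of the texts.
    return dot / norm if norm else 0.0


# Toy data, invented purely for illustration.
lexicon = {"school": "школа", "teacher": "вчитель", "student": "студент"}
english = "The school is new. The teacher and the student arrived."
ukrainian = "Школа нова. Вчитель і студент прийшли."

score = frequency_comparability(english, ukrainian, lexicon)
print(f"comparability: {score:.3f}")
```

A score near 1 indicates that the translation pairs have similar frequency profiles in both components, and a score of 0 indicates that the listed keywords are absent from at least one component. Cosine similarity is only one possible choice here; rank-based or chi-square measures could be substituted without changing the overall structure.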