Afrikaans stoplist for corpus research 1.0
Dataset posted on 2021-06-23, 10:15, authored by Gerhard B. Van Huyssteen
The Afrikaans stoplist for corpus research (version 1.0) comprises a master list of 1,296 items, based on frequency counts in the Taalkommissie corpus 1.1. The list has been curated on the basis of relative word frequency classes and Zipf values. In addition, each item has been categorised in terms of length, typecase, selection category, lexical type (i.e. content or function word), and part-of-speech category. For ease of use, three subsets of the main list are also provided.
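As a rough illustration of how a stoplist of this kind might be applied, the minimal Python sketch below filters a tokenised sentence and computes a Zipf value. It rests on several assumptions: the five words in `EXAMPLE_STOPLIST` are placeholders and are not taken from the published list, the function names are hypothetical, matching is done case-insensitively (the dataset itself records typecase, so a case-sensitive match may be more appropriate), and the Zipf value uses the commonly cited definition of log10(frequency per million words) + 3, which may differ from the exact calculation used for this dataset.

```python
import math

# Placeholder entries for illustration only; NOT taken from the published list.
EXAMPLE_STOPLIST = {"die", "en", "van", "is", "het"}


def zipf_value(count, corpus_size):
    """Zipf value, assumed here to be log10(frequency per million words) + 3."""
    per_million = count * 1_000_000 / corpus_size
    return math.log10(per_million) + 3


def remove_stopwords(tokens, stoplist):
    """Return the tokens whose lowercased form is not in the stoplist."""
    lowered = {word.lower() for word in stoplist}
    return [token for token in tokens if token.lower() not in lowered]


# Whitespace tokenisation is enough for this toy sentence.
tokens = "Die kat is in die tuin".split()
content_tokens = remove_stopwords(tokens, EXAMPLE_STOPLIST)
print(content_tokens)

# A word seen 1,000 times in a 1,000,000-token corpus.
print(zipf_value(1000, 1_000_000))
```

In practice, the placeholder set would be replaced by the items read from the master list or from one of its subsets, depending on how aggressively function words should be removed.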