This project is a sub-project of the integrated research project on constructing grading standards for Chinese language learners and the integrated application of corpora. Its purpose is to establish a general word frequency list for Chinese language teaching and to build a Chinese collocation list.
To construct the general word frequency list, we use the COCT corpus, which includes books, news, and spoken data, as the basis for word frequency statistics. To avoid bias, we randomly sample texts from the books, news, and spoken data, and then extract the general word frequency list from the sampled data. In addition to global frequencies, the list will provide frequencies for individual domains, together with frequency-related statistics such as the cumulative frequency and the cumulative percentage of the corpus covered. To construct the Chinese collocation list, we will automatically extract collocations from the COCT corpus based on the "Basic Vocabulary". Previous collocation research has divided collocations into two categories: lexical collocations and grammatical collocations. Building on that research, our project will develop a classification of Chinese collocations, and the resulting architecture will serve as a reference for Chinese language teaching.
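The frequency statistics described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the project description does not specify. Texts are assumed to be already word-segmented into token lists. Sampling is assumed to be simple random sampling of whole texts within each source, with a fixed seed. Ties in frequency are assumed to be broken by the word itself so that the order is stable. The function names `sample_texts` and `build_frequency_list` are hypothetical and are not part of any COCT tooling. The cumulative percentage of a row is the share of all sampled tokens that are covered by that word and every more frequent word.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int, int, float]  # (word, frequency, cumulative freq, cumulative %)


def sample_texts(texts: Sequence[List[str]], k: int, seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: draw up to k whole texts at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(list(texts), min(k, len(texts)))


def build_frequency_list(
    samples: Dict[str, List[List[str]]],
) -> Tuple[List[Row], Dict[str, Counter]]:
    """Hypothetical sketch: global and per-domain frequencies plus cumulative stats.

    `samples` maps a domain name (e.g. "books", "news", "spoken") to a list of
    word-segmented texts, each text being a list of word tokens.
    """
    per_domain: Dict[str, Counter] = {}
    global_counts: Counter = Counter()
    for domain, texts in samples.items():
        counts = Counter(token for text in texts for token in text)
        per_domain[domain] = counts      # frequencies within one domain
        global_counts.update(counts)     # add them to the global totals

    total = sum(global_counts.values())
    rows: List[Row] = []
    cumulative = 0
    # Most frequent words first; ties are broken by the word for a stable order.
    for word, freq in sorted(global_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cumulative += freq
        rows.append((word, freq, cumulative, 100.0 * cumulative / total))
    return rows, per_domain


if __name__ == "__main__":
    samples = {
        "books": [["我", "喜欢", "学习", "汉语"]],
        "news": [["我", "学习", "新闻"]],
        "spoken": [["我", "喜欢"]],
    }
    rows, per_domain = build_frequency_list(samples)
    for word, freq, cum, pct in rows:
        print(f"{word}\t{freq}\t{cum}\t{pct:.2f}%")
```

In this toy input, "我" occurs 3 times out of 9 tokens, so it heads the list with a cumulative percentage of about 33.33%, and the last row always reaches 100%. Keeping the per-domain counters separate from the global counter lets the same pass produce both the global list and the domain-specific frequencies.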
Finally, this project will produce a general word frequency list, a Chinese collocation architecture, and a Chinese collocation list. The general word frequency list can serve as a reference for Chinese language teaching, textbook editing, and Chinese language testing. The Chinese collocation architecture can provide a basis for future collocation teaching, and the Chinese collocation list can be used for collocation teaching, textbook writing, and testing.