
A comparative analysis of the correspondence of part-of-speech systems between COCT and TCSL teaching materials

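Project category: In-house research project
Project number: NAER-2019-029-C-1-1-B5-01
GRB number: PG10811-0071
Project title: A Study on the Correspondence of Part-of-Speech Tags in Chinese Word Lists (華語文詞語表詞類標記對應之研究)
Project title (English): A comparative analysis of the correspondence of part-of-speech systems between COCT and TCSL teaching materials
Project type: Integrated project
Integrated project title: Construction of Proficiency-Level Standards for Learners of Chinese and Integrated Application of Corpora (華語文學習者分級標準建構及語料庫整合應用)
Component: Subproject 1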
Principal investigator: Li, Shih-Min (李詩敏)
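Funding source: National Academy for Educational Research (NAER)
Execution mode: In-house research (NAER funding, NAER staff)
Year: 2019
Start date: 2019-08-01
End date: 2020-12-31
Status: Closed
Budget: 0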
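Abstract: Part-of-speech (POS) tagging allows the vast and complex vocabulary of a corpus to be grouped into anywhere from a few to several dozen categories, which makes POS tags one of the most important pieces of information in a corpus and a valuable aid to language research and teaching. Many corpora in Taiwan, including the Chinese language corpus (COCT) built by this integrated project, currently adopt the simplified POS tag set of the Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus). This set of 46 tags is well suited to linguistic analysis and natural language processing, but it does not correspond one-to-one with the POS systems used in the major TCSL textbooks currently on the market. To align the POS tags across textbook series, and to connect linguistic theory with the knowledge base of TCSL regarding word classes, Subproject 1 uses literature review and expert consultation to examine the content and classification criteria of each POS tag set. On that basis it establishes conversion rules between the Sinica tags and the textbook tags, in order to promote the use of COCT in Chinese teaching, textbook compilation, and research.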
Abstract (English): Part-of-speech (POS) tagging allows the large and complex vocabulary of a corpus to be classified into anywhere from a few to several dozen categories; POS tags are therefore among the most important information in a corpus and are beneficial to language research and teaching. At present, many corpora in Taiwan, including COCT (the Corpus of Contemporary Taiwanese Mandarin) constructed by this integrated project, adopt the POS tag set of the Sinica Corpus. The Sinica tag set is well suited to language analysis and NLP, but it does not map one-to-one onto the POS systems of TCSL (Teaching Chinese as a Second Language) teaching materials. To integrate the POS systems across TCSL teaching materials and to link the knowledge base of POS classification in linguistics with that of TCSL, this subproject adopts literature review and expert consultation to explore the content and classification criteria of these POS systems, and then establishes correspondence rules between the POS systems of the Sinica Corpus and those of TCSL teaching materials, with the aim of promoting the application of COCT in language teaching, textbook compilation, and research analysis.
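The abstract describes conversion rules between the 46-tag Sinica simplified tag set and textbook tag systems that do not line up one-to-one. The following is a minimal Python sketch of how such rules might be applied to a tagged word list as a lookup table. It assumes a hypothetical target tag set (`N`, `V`, `Vs`, `Adj`, `M`, `Adv`, `Prep`) that is not taken from the project or from any particular textbook, and the table entries and the function name `convert` are illustrative only. Many-to-one cases (`Na` and `Nb` both becoming `N`) convert automatically. One-to-many cases (`VH`) are flagged for review, and tags missing from the table are passed through unchanged.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL excerpt of a conversion table: Sinica simplified tag -> candidate
# textbook labels. Target labels are invented for illustration, not the project's rules.
SINICA_TO_TEXTBOOK: Dict[str, List[str]] = {
    "Na": ["N"],          # common noun
    "Nb": ["N"],          # proper noun (many-to-one: merged with Na)
    "Nf": ["M"],          # measure word
    "VA": ["V"],          # active intransitive verb
    "VC": ["V"],          # active transitive verb
    "VH": ["Vs", "Adj"],  # stative intransitive verb (one-to-many: ambiguous)
    "D": ["Adv"],         # adverb
    "P": ["Prep"],        # preposition
}


def convert(entries: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Map (word, sinica_tag) pairs to (word, target_label, status).

    status is "ok" for a unique mapping, "review" when several textbook
    labels are possible, and "unmapped" when the tag is absent from the
    table (the original Sinica tag is then kept).
    """
    result: List[Tuple[str, str, str]] = []
    for word, tag in entries:
        candidates = SINICA_TO_TEXTBOOK.get(tag)
        if candidates is None:
            result.append((word, tag, "unmapped"))
        elif len(candidates) == 1:
            result.append((word, candidates[0], "ok"))
        else:
            result.append((word, "/".join(candidates), "review"))
    return result


if __name__ == "__main__":
    # A small word-list excerpt with Sinica tags.
    word_list = [("老師", "Na"), ("已經", "D"), ("寫", "VC"),
                 ("高興", "VH"), ("本", "Nf"), ("三", "Neu")]
    for row in convert(word_list):
        print(row)
```

Keeping the table as data rather than code makes it easy to revise entries. The `review` rows mark the points where context-sensitive rules or expert judgment would still be needed.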
Keywords: POS tagging; Chinese language corpus (COCT); TCSL teaching materials; Academia Sinica Balanced Corpus of Modern Chinese
Keywords (English): POS tagging; Sinica Corpus; COCT; TCSL teaching materials