
High-quality searchable corpus of the Cheyi language

Cheyi native speaker Ms. Dami Kamo and Chime Lhamo discussing the content of a traditional story. Photo by Chime Lhamo, 2022.

Language: Cheyi
Depositor: Chime Lhamo, Tshering Sgrolma
Affiliation: Fudan University
Location: China
Collection ID: 0794
Grant ID: IGS1030
Funding Body: ELDP
Collection Status: Collection online
Landing Page Handle: http://hdl.handle.net/2196/c49c12f6-cb13-447d-9f48-ce756819e111

 

Summary of the collection

English: This collection will aim to preserve the Cheyi language by providing high-quality audio and video recordings of the language in use. The materials will be transcribed in the International Phonetic Alphabet (IPA) and translated into Simplified Chinese and English. The project will collect a corpus from four villages in Pubarong Township, Yajiang County, to analyze the internal differences between the dialects of this language and determine its dialectal divisions.

The collection will be led by Chime Lhamo, a PhD student in the Department of Chinese Language and Literature at Fudan University, who specializes in Sino-Tibetan linguistics within the Linguistics and Applied Linguistics program. The team will include a photographic assistant, Tshering Sgrolma, who is both a native speaker of the Cheyi language and a skilled photographer. In addition, 12 language workers will be selected from local villages and trained in transcription and translation. The team will also include more than 24 language consultants from 12 village groups, chosen on the basis of age, gender, and language background. Other native speakers from the Cheyi Tibetan community will also contribute to the project. Key information about all team members will be recorded in the metadata database, and all participants will receive appropriate compensation and gifts in recognition of their contribution to preserving their mother tongue.

This collection will be relevant for linguistic research, providing valuable data on the Cheyi language, aiding in its preservation and fostering greater understanding of its dialectal variations. This collection will also hold significant research value for anthropologists, historians, and scholars of religious folklore, providing them with valuable insights into the cultural, historical, and religious contexts of the Cheyi community.

中文: 该语料采集工作将通过提供高质量的语言音频和视频来记录和保存却域语。材料将使用国际音标(IPA)进行转录,并翻译成简体中文和英文。本项目将从雅江县普巴绒乡的四个村庄收集语料,以分析该语言方言之间的内部差异,并确定其方言划分。

该采集工作将由复旦大学中国语言文学系博士生青梅拉姆主导,她专攻语言学及应用语言学专业汉藏语言学方向。团队还有摄影助理泽让卓玛,她既是却域语母语者又是具备熟练拍摄技术的摄影师。此外,还将从当地村庄中选出12名语言工作者,他们将接受转录和翻译技能的培训。团队还将包括来自12个村组的24名以上的语言顾问,这些顾问将根据年龄、性别和语言背景进行选择。其他来自却域藏族社区的母语者也将参与该项目。所有团队成员的重要信息将记录在元数据数据库中,所有参与者将获得适当的报酬和礼物,以表彰他们为保存母语所做的贡献。

该语料采集工作将对语言学研究具有重要意义,提供关于却域语言的宝贵数据,帮助其保存,有助于促进大家对其方言变体的更深入理解。除此之外,这个收集工作还将对人类学家、历史学家和宗教民俗学家等领域的学者具有重要的研究意义,为他们提供宝贵的洞见,帮助他们深入了解却域社区的文化、历史和宗教背景。

 

Group represented

English: Pubarong is the transcription of the Tibetan word “spo.bo.rong” or “pobs.pa.rong”. Historically, it was one of the four major agricultural regions of the Tibetan Khams region. It lies along the narrow valley of the Yalong River, and its residents used to live mainly from agriculture.
Figure 1 is a map showing the original locations of the four villages in Pubarong Township, Yajiang County (Chime Lhamo 2023: 2).


Construction of the Lianghekou Hydropower Station on the Yalong River (the largest hydroelectric power station in China’s Tibetan region in terms of integrated scale) officially started in 2014. About six or seven years before construction began, the native speakers of the Cheyi language, the residents of Pubarong Township in Yajiang County, Sichuan Province, China, began to leave their home villages one by one and move to Yajiang, Xinlong, and Litang counties, with most Pubarong residents later dispersing within Yajiang County. Yajiang County is one of the 18 counties in Ganzi Prefecture and is linguistically diverse: the languages spoken there include the Kham dialect of Tibetan, the Amdo dialect of Tibetan, the Sichuan dialect of Chinese, the Yajiang inverted language (Daohua), the Minyag language, the nDrapa language, and the Cheyi language. For the Cheyi community, the hometown where they lived for generations has, in a sense, ceased to exist. The relocation has accelerated the breakup of the original Cheyi speech community. The language is further endangered because native speakers, having moved into a multilingual environment, increasingly prefer that their descendants learn dominant languages such as Chinese and Tibetan in order to gain new educational and work opportunities. It is urgent to record as much authentic linguistic material as possible before the language disappears completely.

Table 1 (Chime Lhamo 2023: 3) shows the resident population of Pubarong Township according to the 2020 census: 554 households and a total population of 2,018.


中文:普巴绒是藏语词“spo.bo.rong”或“pobs.pa.rong”的音译。历史上,它是藏族康巴地区的四大农区之一,分布在雅砻江的狭长河谷地带,居民主要以农业为生。图1显示了雅江县普巴绒乡四个村庄的原始位置。(Chime Lhamo (2023):2)

2014年,雅砻江两河口水电站(中国藏区综合规模最大的水电站)正式开工建设,大约在开工前六七年,四川省雅江县普巴绒乡的却域语母语者开始陆续从村寨里撤离,搬到临近的雅江、新龙、理塘等县县城,大部分普巴绒籍的居民后来在雅江县内分散居住。雅江县是甘孜州的18个县之一,这里语言多样,包括藏语康方言、藏语安多方言、四川话、雅江倒话、木雅语、扎坝语和却域语。对于却域社区来说,他们世代居住的故乡在某种程度上已经不复存在。这次搬迁加剧了却域语原有语言社区的消失,母语者搬到多语言环境后,为了获得新的学习和工作机会,逐渐倾向于让后代学习汉语和藏语等强势语言。因此迫切需要在这种语言完全消失之前,尽可能多地记录下真实的语言材料。

表1(Chime Lhamo (2023):3)显示了普巴绒乡2020年的人口普查数据,总户数为554,总人口为2018。

 

Language information

English: The Cheyi language, also known as Choyo, Choyul, or Queyu in English, is a non-Tibetan language spoken by Tibetan inhabitants of Yajiang, Daofu, Xinlong, Litang, and Kangding counties in Ganzi Tibetan Autonomous Prefecture, Sichuan Province, China. The area where the Cheyi people live is shaped like a khyung bird, so the Cheyi inhabitants of these areas call themselves “tɕʰɤ ji pa”, which is probably derived from the Tibetan “khyung.yul.pa”, meaning “people from the khyung region”. We therefore follow the native speakers’ self-designation and call the language “Cheyi”. (For more details, see Chime Lhamo 2023.)

Concerning the affiliation of the Cheyi language, Sun Hongkai (1982) originally classified it as a Qiangic language within the Qiang-Jingpo group of the Tibeto-Burman branch of the Sino-Tibetan language family, and placed it between the northern and southern branches of Qiangic.

中文:却域语(The Cheyi Language),也被写成Choyo、Choyul或Queyu,是一种由中国四川省甘孜藏族自治州雅江、道孚、新龙、理塘和康定等县的部分藏族居民使用的非藏语语言。却域人的居住地形状像一只大鹏琼鸟,因此这些地区的却域居民称自己为“tɕʰɤ ji pa”,这可能源自藏语的“khyung.yul.pa”,意思是“琼地区的人”。因此,我们采用母语者的自称,称之为“却域”语。(更多详细信息可见Chime Lhamo (2023))

关于却域语的系属归类问题,孙宏开(1982)最初将却域语归为汉藏语系藏缅语族羌语支的羌景颇语群,并指出却域语是羌语支北部和南部分支之间的一种语言。

 

Collection contents

English: (1) High-quality audio and video materials
I will select appropriate native speakers from the four villages and collect 10 hours of naturalistic data from each village, 40 hours in total. In selecting speakers, I will take age, gender, and language background into account so that the corpus is authentic, rich, and representative. For the sake of clarity, the project will mainly collect single-speaker narrative recordings, but will also include conversations involving 2-5 native speakers. Important traditional activities involving more than ten people will also be filmed and recorded, using photographic techniques suited to the event.
In total, the audio-visual output will be 40 hours:
– 40 traditional stories (10 hours)
– 40 daily spontaneous conversations (10 hours)
– 20 traditional activity descriptions (10 hours)
– 20 procedural descriptions of activities (10 hours)

The stories and descriptions of traditional activities will be of linguistic interest and will also help to document and preserve the local culture. In addition, some task-based data will be collected to serve the linguistic research community, in particular by filling gaps in the existing scholarly literature on the Cheyi language. The data will be recorded with a video camera and an additional audio recorder. The language consultants will receive the audio and video files in the format of their preference.
– Elicitation sessions: grammatical task-based recordings and wordlist (5 hours)
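
The planned hours listed above can be cross-checked with a short tally. This is only a sketch for illustration: the names are hypothetical and not part of the project's tooling, and it assumes the four genre categories and the elicitation sessions do not overlap.

```python
# Planned recordings by genre: (number of recordings, hours), as listed in the plan.
# All identifiers are hypothetical and chosen for illustration only.
PLANNED_NATURALISTIC = {
    "traditional stories": (40, 10),
    "daily spontaneous conversations": (40, 10),
    "traditional activity descriptions": (20, 10),
    "procedural descriptions of activities": (20, 10),
}
ELICITATION_HOURS = 5  # grammatical task-based recordings and wordlist
VILLAGES = 4           # villages in Pubarong Township covered by the plan

naturalistic_hours = sum(hours for _, hours in PLANNED_NATURALISTIC.values())
total_hours = naturalistic_hours + ELICITATION_HOURS
hours_per_village = naturalistic_hours / VILLAGES


def mean_minutes(count, hours):
    """Average length of one recording in minutes."""
    return hours * 60 / count


for genre, (count, hours) in PLANNED_NATURALISTIC.items():
    print(f"{genre}: {count} recordings, about {mean_minutes(count, hours):.0f} min each")
print(f"naturalistic: {naturalistic_hours} h, "
      f"per village: {hours_per_village:.0f} h, total: {total_hours} h")
```

On these figures, stories and conversations average about 15 minutes each, the two kinds of description about 30 minutes each, and the total including elicitation comes to 45 hours.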

(2) A transcribed and translated searchable corpus
Of the 45 hours of corpus collected, 8 hours will be translated into Chinese and English, and 3 hours will be transcribed verbatim and annotated using the ELAN and FLEx software. The annotated ELAN files will be archived in ELAR with the help of local transcription assistants. These glossed files will have five levels of annotation: a morpheme-by-morpheme transcription in IPA, lexical glosses in Chinese and English, and free translations in Chinese and English. The materials will be made available to the project's language consultants on DVD or through a social media site set up for that purpose.
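
To make the five annotation levels concrete, the sketch below models one annotated utterance as a small record and shows a simple search over English glosses and free translations. It is an illustration under stated assumptions, not the project's actual ELAN or FLEx file format: the class, field, and function names are hypothetical, and the sample values are placeholders rather than real Cheyi data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedUtterance:
    """One utterance with the five annotation levels (hypothetical layout)."""
    morphemes_ipa: List[str]  # level 1: morpheme-by-morpheme IPA transcription
    gloss_zh: List[str]       # level 2: lexical glosses in Chinese
    gloss_en: List[str]       # level 3: lexical glosses in English
    free_zh: str              # level 4: free translation in Chinese
    free_en: str              # level 5: free translation in English

    def is_aligned(self) -> bool:
        """Each morpheme should carry exactly one gloss in each language."""
        return len(self.morphemes_ipa) == len(self.gloss_zh) == len(self.gloss_en)


def search(corpus: List[AnnotatedUtterance], query: str) -> List[AnnotatedUtterance]:
    """Return utterances whose English free translation contains the query
    or that have an English gloss equal to the query (case-insensitive)."""
    q = query.lower()
    return [
        u for u in corpus
        if q in u.free_en.lower() or any(q == g.lower() for g in u.gloss_en)
    ]


# Placeholder values only; these are NOT real Cheyi forms or glosses.
sample = AnnotatedUtterance(
    morphemes_ipa=["m1", "m2"],
    gloss_zh=["ZH1", "ZH2"],
    gloss_en=["GLOSS1", "GLOSS2"],
    free_zh="(placeholder free translation in Chinese)",
    free_en="(placeholder free translation in English)",
)
corpus = [sample]
hits = search(corpus, "gloss1")
```

Keeping the three morpheme-level lists the same length is what allows a gloss to be read off directly beneath its morpheme, which is why the record includes an alignment check.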

中文:(1) 高质量的音频视频材料
我将从四个村庄中选择适当的母语者,并从每个村庄收集10小时的自然数据,总计40小时的数据。关于说话者的选择,我会考虑年龄、性别和语言背景,以收集真实、丰富和具有代表性的自然语料。考虑到清晰度,本项目主要收集单人陈述语料,但也会包括2-5名母语者参与的对话语料。如果有涉及十多人参与的重要传统活动,也将使用适当的摄影技术进行拍摄和记录。

总的来说,音视频输出将是40小时:
– 40个传统故事(10小时)
– 40个日常自发对话(10小时)
– 20个传统活动描述(10小时)
– 20个活动过程描述(10小时)

传统故事和活动描述不仅具有语言学价值,一些基于任务的数据也将收集起来,以特别服务于语言学研究社区,填补现有却域语学术文献中的空白。此外,这些数据还将帮助记录和保存当地文化。这些数据将通过摄像机和额外的录音机收集。语言顾问将以他们喜欢的格式收到音频和视频文件。
– 引导会话:基于语法任务的引导性数据和词表(5小时)

(2) 转录和翻译后的可搜索语料库
在收集45小时的语料后,其中8小时将被翻译成中文和英文,3小时将通过ELAN和FLEX软件进行逐字转录和注释。注释后的ELAN文件将在当地转录助理的帮助下存档在ELAR中。这些词汇注释文件将有五个层次的注释,包括使用IPA进行逐词转录,提供中文和英文的词汇解释,以及中文和英文的自由翻译。这些材料将以DVD形式提供或上传到一个社交媒体网站,并对项目的语言顾问开放。

 

References

English: Chime Lhamo. 2023. Person Indexation in Cheyi. To be published.

Sun, Hongkai. 1982. A preliminary study of the branching of the Qiangic language. Minzu Yuwen.

中文: 青梅拉姆. 2023. 《却域语的人称范畴》即将出版

孙宏开. 1982. 《羌语支属问题初探》民族语文研究文集

 

Acknowledgement and citation

To refer to any data from the collection, please cite as follows:
Chime Lhamo and Tshering Sgrolma. 2024. High-quality searchable corpus of the Cheyi language. Endangered Languages Archive. Handle: http://hdl.handle.net/2196/ff9fe689-d9a1-4b58-9809-6941f62ec641. Accessed on [insert date here].
