A multimedia corpus of siPhuthi


Language: siPhuthi
Depositor: Sheena Shah
Affiliation: TU Dortmund University
Location: Lesotho
Collection ID: 0506, 0651
Grant ID: MDP0401
Funding Body: ELDP, Alexander von Humboldt Foundation
Collection Status: Collection online
Landing Page Handle


Summary of the collection

The siPhuthi multimedia digital corpus contains primary language data of different genres recorded in different settings. The collection includes audio-video recordings from speakers of various ages depicting current use of siPhuthi. These modern language data will be supplemented by digitised, curated and archived audio recordings from the mid-1990s (collected by Dr. Simon Donnelly), which offer a glimpse into the cultural and linguistic past of a rapidly changing and diminishing language community.


Group represented

The collection contains contributions from baPhuthi who live in Lesotho and South Africa. Most baPhuthi communities are scattered and live in remote areas in two marginalised and poorly developed districts of Lesotho, namely Quthing and Qacha’s Nek. Some baPhuthi also live in the Mohale’s Hoek district in southern Lesotho, as well as in the northern Eastern Cape province of South Africa. Because of job opportunities or other incentives (e.g. better infrastructure), siPhuthi-speaking individuals and families have migrated to other parts of Lesotho, such as Maseru and Teyateyaneng (T.Y.), but also to South Africa, e.g. Rustenburg (for mine work) and Ceres (for seasonal plantation work).


Special characteristics

The collection will include legacy materials produced by Dr. Simon Donnelly in the mid-1990s.


Collection contents

The collection will contain a minimum of 60 hours of recordings: 40 hours of audio-video recordings from speakers of various ages depicting current use of siPhuthi, and 20 hours of audio recordings from the mid-1990s. More specifically, the collection will comprise:

  • 20 hours of time-aligned ELAN transcriptions, translations and annotations of audio-video recordings drawn from narratives (4 hours), interviews (3 hours), natural conversations (6 hours), direct elicitations using a diagnostic tool (6 hours) and songs (1 hour), of which 12 hours will come from Daliwe and 8 hours from Sinxondo, Mohale’s Hoek and Qacha’s Nek.
  • 20 hours of partially transcribed or untranscribed recordings (with complete metadata).
  • 20 hours of legacy materials (recordings, photographs, fieldnotes) produced by Dr. Simon Donnelly in the mid-1990s, of which a minimum of 5 hours will be transcribed, translated and annotated. These legacy materials contain folk stories, information on traditional cultural knowledge and elicited grammatical data.

In addition, the collection will contain a quadrilingual wordlist (siPhuthi, Sesotho, isiXhosa, English) produced in FLEx, consisting of lexical items generated from directed elicitations and collected texts, as well as scanned notebook pages and photographs.


Acknowledgement and citation

To refer to any data from the collection, please cite as follows:

Shah, Sheena. 2019. A multimedia corpus of siPhuthi. Endangered Languages Archive. Handle: Accessed on [insert date here].
