DEVELOPING AN ONLINE CORPUS OF FORMOSAN LANGUAGES
Li-May Sung, Lily I-wen Su, Fuhui Hsieh and Zhemin Lin
Information technologies have now matured to the point where researchers can create repositories of language resources, especially for languages facing endangerment. The development of an online corpus platform, made possible by recent advances in data storage, character encoding and web technology, has profound consequences for the accessibility, quantity, quality and interoperability of linguistic field data. This is of particular significance for the Formosan languages of Taiwan, many of which are on the verge of extinction. In response to this growing problem, the key objective of the NTU Corpus of Formosan Languages is to document, and thus preserve, valuable linguistic data as well as relevant ethnological and cultural information. This paper introduces some of the theoretical bases behind this initiative, as well as the procedures, transcription conventions, database normalization, in-house system and three special features involved in the creation of this corpus.
Key words: Formosan languages, Taiwan, corpus, database normalization, discourse, intonation unit (IU), 'Pear' story, 'Frog' story, cross-referencing retrievability, multilingual search, interoperability