SPOKEN CORPORA AND ANALYSIS OF NATURAL SPEECH
This paper introduces the spoken corpora of Taiwan Mandarin created at Academia Sinica and gives an overview of several recent studies that use these data. Academia Sinica has been collecting and processing spoken language resources of Taiwan Mandarin since 2001. As a result, spoken data that are useful not only for language archiving but also for linguistic research have been made available. In addition to describing the creation of the corpora, we discuss two lines of research that use these language resources to connect theoretical and empirical studies: 1) language variation and change and 2) spoken discourse analysis. Phonetic reduction is one of the main sources of change within a language, so it is important to take into account variation at different levels in spontaneous speech. For this purpose, we studied syllable contraction/merger, vowel reduction, and phonetic reduction in directional complements. Discourse items also play an essential role: they add specific implications to sentences, and their use is marked mainly by prosodic means. We segmented spoken discourse into smaller prosodic units to allow a more precise study of discourse items, prosodic features, and disfluency. These phenomena are interrelated, particularly through prosodic marking.