Taking Gelao, a representative endangered language of China, as a case study, this paper builds a multimedia corpus database from original materials collected during fieldwork, which are segmented, translated, transcribed, annotated, and analyzed. On this basis it develops an integrated language information processing system that supports online browsing, adding and deleting language data, retrieval, analysis, and user customization. The system adopts a three-tier architecture comprising a user layer, a data layer, and back-end services; its design is guided by standardization, and the database is built and processed with XML technology. The results show that standardization is not only technically advanced in implementation, but that the generality it brings also makes the system easier to extend and more practical. The system's design model may also serve as a reference for the digital processing of other endangered languages.
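To make the XML-based data layer more concrete, the following is a minimal sketch of how one annotated utterance might be stored and retrieved. It is not the paper's actual schema: the element and attribute names (`corpus`, `utterance`, `transcription`, `gloss`, `translation`), the helper functions `make_utterance` and `search`, the media file name, and all text values are hypothetical placeholders chosen for illustration. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def make_utterance(utt_id, media, start, end, transcription, gloss, translation):
    """Build one utterance record (hypothetical schema, not the paper's)."""
    utt = ET.Element(
        "utterance", id=utt_id, media=media, start=str(start), end=str(end)
    )
    # One child element per annotation tier named in the abstract.
    ET.SubElement(utt, "transcription").text = transcription
    ET.SubElement(utt, "gloss").text = gloss
    ET.SubElement(utt, "translation", lang="en").text = translation
    return utt


def search(corpus, keyword):
    """Return the ids of utterances whose translation contains the keyword."""
    return [
        utt.get("id")
        for utt in corpus.iter("utterance")
        if keyword in (utt.findtext("translation") or "")
    ]


# Placeholder records: the strings below are NOT real Gelao data.
corpus = ET.Element("corpus", language="Gelao")
corpus.append(
    make_utterance("u1", "session01.wav", 0.0, 2.4,
                   "placeholder-1", "GLOSS-1", "I drink water")
)
corpus.append(
    make_utterance("u2", "session01.wav", 2.4, 5.1,
                   "placeholder-2", "GLOSS-2", "He eats rice")
)

# Serialize and parse again, as a stored XML file would be.
xml_text = ET.tostring(corpus, encoding="unicode")
reloaded = ET.fromstring(xml_text)

hits = search(reloaded, "water")
```

Storing each tier as its own element keeps the record readable by any XML-aware tool, which is one plausible reading of the generality the abstract attributes to standardization. A real system would more likely follow an established annotation schema than the ad hoc one assumed here.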