Statistical studies of speech have found that the occurrence of words does not follow the Poisson distribution expected of purely random events. The author proposes a new statistical distribution function to describe how words occur in language. Although a large vocabulary is available for free choice, actual usage is the result of the writer's deliberation, so word occurrence is partly random and partly bears the marks of selection and processing. This selection and processing closely resembles passing a random signal (for example, electrical noise) through a narrow-band filter, which suggests that the corresponding Rayleigh distribution may be a better fit. Experiments bear this out: the cumulative distribution of words follows the Rayleigh cumulative distribution function, except that the variable takes only integer values because word counts are integers. A distribution whose cumulative distribution coincides with the Rayleigh distribution but whose variable takes only integer values is called the discrete Rayleigh distribution function. Particles, phonemes, letters, and similar units in speech each occur together with many different words, so by the central limit theorem (as applied to discrete processes) their distribution should be random, that is, it should follow the Poisson distribution function. Experiments confirm this as well. Three methods were used to test whether an observed distribution fits a hypothesized one (Poisson or discrete Rayleigh): visual inspection of the distribution plot, comparison of the mean square deviation or standard deviation, and the χ² test. Of the three, inspection of the plot is the most intuitive, the χ² test is the most rigorous, and the standard-deviation comparison is simple while still quantitative. A simple procedure may be to inspect the plot first and then make a quantitative comparison using the standard deviation.
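As a rough illustration of the comparison described above, the following Python sketch builds a discrete Rayleigh distribution from the Rayleigh cumulative function and scores a set of counts against it and against a Poisson model. Several details are assumptions, since the abstract does not state them: the scale parameter `sigma`, the convention that the probability of an integer k is F(k) - F(k - 1) with support starting at k = 1, the reading of the "standard deviation comparison" as a root-mean-square gap between observed and expected counts, and the choice of the sample mean as the Poisson rate. The counts in the demo are synthetic, generated from the model itself, and are not data from the paper.

```python
import math


def discrete_rayleigh_cdf(k, sigma):
    """Rayleigh CDF evaluated at integer k (assumed form: 1 - exp(-k^2 / (2 sigma^2)))."""
    if k < 0:
        return 0.0
    return 1.0 - math.exp(-(k * k) / (2.0 * sigma * sigma))


def discrete_rayleigh_pmf(k, sigma):
    """Assumed convention: P(X = k) = F(k) - F(k - 1) for k >= 1, zero otherwise."""
    if k < 1:
        return 0.0
    return discrete_rayleigh_cdf(k, sigma) - discrete_rayleigh_cdf(k - 1, sigma)


def poisson_pmf(k, lam):
    """Standard Poisson probability of k events with rate lam."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam ** k / math.factorial(k)


def rms_deviation(observed, expected):
    """Root-mean-square gap between observed and expected counts (one reading of the
    'standard deviation comparison' mentioned in the abstract)."""
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared) / len(squared))


def chi_square(observed, expected):
    """Pearson chi-square statistic; bins with zero expected count are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def synthetic_counts(n, sigma, ks):
    """Hypothetical helper: rounded counts drawn directly from the discrete Rayleigh model."""
    return [round(n * discrete_rayleigh_pmf(k, sigma)) for k in ks]


if __name__ == "__main__":
    ks = list(range(1, 13))
    sigma = 3.0                      # assumed scale, chosen only for the demo
    observed = synthetic_counts(10000, sigma, ks)
    total = sum(observed)

    # Poisson rate taken as the sample mean (an assumption, not the paper's procedure).
    lam = sum(k * c for k, c in zip(ks, observed)) / total

    rayleigh_expected = [total * discrete_rayleigh_pmf(k, sigma) for k in ks]
    poisson_expected = [total * poisson_pmf(k, lam) for k in ks]

    print("RMS deviation, discrete Rayleigh:", rms_deviation(observed, rayleigh_expected))
    print("RMS deviation, Poisson:          ", rms_deviation(observed, poisson_expected))
    print("chi-square, discrete Rayleigh:   ", chi_square(observed, rayleigh_expected))
    print("chi-square, Poisson:             ", chi_square(observed, poisson_expected))
```

Because the demo counts come from the discrete Rayleigh model, that model fits them better by construction; with real word counts the same two statistics would be computed after inspecting the distribution plot, as the abstract suggests. A formal χ² test would additionally compare the statistic against a critical value for the appropriate degrees of freedom, which this sketch leaves out.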