1. Sample Variance
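Let the sample mean be $\bar x$, the sample variance $S^2$, the population mean $\mu$, and the population variance $\sigma^2$. The sample variance is then
$S^2 = \frac{1}{n - 1}\sum\limits_{i = 1}^n \left( x_i - \bar x \right)^2$
Derivation: if the sample were the entire population (sample size equal to population size), we would divide by $n$ instead: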
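$S^2 = \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \bar x \right)^2$
(for the rest of this derivation, $S^2$ denotes this $1/n$ version). If this estimator were unbiased, its average over many repeated draws would approach the population variance. Suppose the sample variances obtained from the successive draws are $(S_1^2, S_2^2, S_3^2, \ldots)$, and write their average as $\mathrm{E}(S^2)$. Then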
$\mathrm{E}\left( S^2 \right) = \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \bar x \right)^2 \right) = \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( \left( x_i - \mu \right) - \left( \bar x - \mu \right) \right)^2 \right)$
Because
$\frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right) = \frac{1}{n}\sum\limits_{i = 1}^n x_i - \mu = \bar x - \mu$
we can continue from the expression above:
$\mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( \left( x_i - \mu \right) - \left( \bar x - \mu \right) \right)^2 \right) = \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right)^2 - \frac{1}{n}\sum\limits_{i = 1}^n 2\left( x_i - \mu \right)\left( \bar x - \mu \right) + \frac{1}{n}\sum\limits_{i = 1}^n \left( \bar x - \mu \right)^2 \right)$
$= \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right)^2 - 2\left( \bar x - \mu \right)\left( \bar x - \mu \right) + \left( \bar x - \mu \right)^2 \right)$
$= \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right)^2 - \left( \bar x - \mu \right)^2 \right)$
$= \mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right)^2 \right) - \mathrm{E}\left( \left( \bar x - \mu \right)^2 \right) \le \sigma^2$
The first term equals $\sigma^2$ and the subtracted term is non-negative, so the sample variance computed with a divisor of $n$ is, in expectation, no larger than the population variance.
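Assuming the $x_i$ are independent draws with variance $\sigma^2$, we have $\mathrm{E}\left( \left( \bar x - \mu \right)^2 \right) = \mathrm{Var}(\bar x) = \frac{1}{n}\sigma^2$, and therefore
$\mathrm{E}\left( \frac{1}{n}\sum\limits_{i = 1}^n \left( x_i - \mu \right)^2 \right) - \mathrm{E}\left( \left( \bar x - \mu \right)^2 \right) = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n - 1}{n}\sigma^2$
So the expected value of the $1/n$ sample variance is $(n-1)/n$ times the population variance, which is why the sample variance divides by $n - 1$ rather than $n$.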
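The $(n-1)/n$ factor can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes normally distributed data, and the function names, sample size, trial count, and seed are all illustrative choices. It averages both versions of the estimator over many independent samples.

```python
import random


def biased_variance(xs):
    """Variance with the 1/n factor (divides by the sample size)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_variance(xs):
    """Variance with the 1/(n-1) factor."""
    n = len(xs)
    return biased_variance(xs) * n / (n - 1)


def average_estimates(n=5, trials=50_000, sigma=2.0, seed=0):
    """Average both estimators over many independent samples of size n.

    Assumption for this sketch: samples come from a normal distribution
    with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    biased_sum = 0.0
    unbiased_sum = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased_sum += biased_variance(xs)
        unbiased_sum += unbiased_variance(xs)
    return biased_sum / trials, unbiased_sum / trials


if __name__ == "__main__":
    mean_biased, mean_unbiased = average_estimates()
    # With sigma = 2 and n = 5 the derivation predicts averages near
    # (n - 1) / n * sigma^2 = 3.2 and sigma^2 = 4.0 respectively.
    print(mean_biased, mean_unbiased)
```

With more trials the two averages should settle closer to $\frac{n-1}{n}\sigma^2$ and $\sigma^2$; the exact printed digits depend on the random seed.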
2. Covariance
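Covariance measures the degree of linear association in the joint distribution of two random variables. For fixed variances, the stronger the linear relationship between the two variables, the larger the magnitude of the covariance (positive when they move together, negative when they move in opposite directions); if there is no linear relationship at all, the covariance is zero.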
Cov(x,y) = E[(x-E(x))(y-E(y))]
In the special case of a single variable x, the covariance of x with itself equals its variance, written Var(x):
Cov(x,x) = Var(x) = E[(x-E(x))(x-E(x))]
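As a small illustration of the definition, the sketch below evaluates Cov(x,y) directly for discrete joint distributions. The helper names and the three toy distributions are hypothetical, chosen only so the results are easy to verify by hand.

```python
def expectation(dist, f):
    """E[f(x, y)] for a discrete joint distribution {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in dist.items())


def covariance(dist):
    """Cov(x, y) = E[(x - E(x)) (y - E(y))], straight from the definition."""
    ex = expectation(dist, lambda x, y: x)
    ey = expectation(dist, lambda x, y: y)
    return expectation(dist, lambda x, y: (x - ex) * (y - ey))


# Hypothetical joint distribution in which y = 2x exactly.
linear = {(0, 0): 0.5, (1, 2): 0.5}

# Hypothetical joint distribution in which x and y are independent.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

# Pairing x with itself, so the result is Cov(x, x) = Var(x).
x_with_itself = {(0, 0): 0.5, (1, 1): 0.5}

if __name__ == "__main__":
    print(covariance(linear))         # positive: x and y move together
    print(covariance(independent))    # zero: no linear relationship
    print(covariance(x_with_itself))  # the variance of x
```

Working the three cases by hand gives 0.5, 0, and 0.25, the last being the variance of a fair 0/1 variable.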
Sample covariance
For a multidimensional random variable Q(x1,x2,x3,…,xn), let the j-th sample be [x1j,x2j,…,xnj] (j=1,2,…,m), where m is the number of samples. For any two dimensions a and b (a,b=1,2,…,n), with $\bar x_a$ and $\bar x_b$ the sample means of those dimensions, the sample covariance is
$\mathrm{cov}\left( x_a, x_b \right) = \frac{\sum\nolimits_{j = 1}^m \left( x_{aj} - \bar x_a \right)\left( x_{bj} - \bar x_b \right)}{m - 1}$
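The sample covariance formula translates almost line for line into code. This is a minimal sketch in plain Python; the function name and the two made-up data columns are assumptions for illustration.

```python
def sample_covariance(xa, xb):
    """Sample covariance of two equally long sequences, dividing by m - 1."""
    if len(xa) != len(xb):
        raise ValueError("both dimensions need the same number of samples")
    m = len(xa)
    if m < 2:
        raise ValueError("at least two samples are required")
    mean_a = sum(xa) / m
    mean_b = sum(xb) / m
    return sum((a - mean_a) * (b - mean_b) for a, b in zip(xa, xb)) / (m - 1)


# Made-up values for two dimensions a and b, with b = 2a.
dim_a = [1.0, 2.0, 3.0, 4.0]
dim_b = [2.0, 4.0, 6.0, 8.0]

if __name__ == "__main__":
    print(sample_covariance(dim_a, dim_b))
    # Passing the same dimension twice gives its sample variance.
    print(sample_covariance(dim_a, dim_a))
```

For these values the deviations from the means multiply to a total of 10, so the first result is 10/3; the second is the sample variance 5/3.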
3. Covariance Matrix
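For a multidimensional random variable Q(x1,x2,x3,…,xn) we want the linear relationship between any two variables (xi,xj), so we compute the covariance of every pair and collect the results in the n×n covariance matrix: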
Cov(xi,xj)= E[(xi-E(xi))(xj-E(xj))]
\[\left[ \mathrm{cov}\left( x_i, x_j \right) \right]_{n \times n} = \left[ \begin{array}{ccccc}
\mathrm{cov}\left( x_1, x_1 \right) & \mathrm{cov}\left( x_1, x_2 \right) & \mathrm{cov}\left( x_1, x_3 \right) & \cdots & \mathrm{cov}\left( x_1, x_n \right)\\
\mathrm{cov}\left( x_2, x_1 \right) & \mathrm{cov}\left( x_2, x_2 \right) & \mathrm{cov}\left( x_2, x_3 \right) & \cdots & \mathrm{cov}\left( x_2, x_n \right)\\
\mathrm{cov}\left( x_3, x_1 \right) & \mathrm{cov}\left( x_3, x_2 \right) & \mathrm{cov}\left( x_3, x_3 \right) & \cdots & \mathrm{cov}\left( x_3, x_n \right)\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}\left( x_n, x_1 \right) & \mathrm{cov}\left( x_n, x_2 \right) & \mathrm{cov}\left( x_n, x_3 \right) & \cdots & \mathrm{cov}\left( x_n, x_n \right)
\end{array} \right]\]
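To make the matrix concrete, here is a minimal sketch that builds the n×n sample covariance matrix from m samples, using the m - 1 divisor. The function name, the row-per-sample layout, and the toy data are assumptions of this example.

```python
def covariance_matrix(samples):
    """n x n sample covariance matrix from m samples of n dimensions each.

    samples[j][a] is the value of dimension a in sample j.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("at least two samples are required")
    n = len(samples[0])
    means = [sum(row[a] for row in samples) / m for a in range(n)]
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):
        # cov(x_a, x_b) = cov(x_b, x_a), so only the upper triangle is computed.
        for b in range(a, n):
            total = sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples)
            matrix[a][b] = matrix[b][a] = total / (m - 1)
    return matrix


# Made-up data: m = 4 samples (rows) of n = 3 dimensions (columns).
data = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 1.0],
]

if __name__ == "__main__":
    for row in covariance_matrix(data):
        print(row)
```

The diagonal entries are the per-dimension sample variances and the matrix is symmetric. If NumPy is available, `numpy.cov(data, rowvar=False)` should produce the same matrix for this row-per-sample layout.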