Equations 1.57, 1.58
Deriving the maximum-likelihood (MLE) estimates of the sample mean and sample variance of a Gaussian.
First, the log likelihood simplifies to: \[ \begin{align} \mathcal{L} = \ln P(x|\mu, \sigma^2) &= \sum\limits_{i=1}^{N}\ln \mathcal{N}(x_i|\mu, \sigma^2) \\ &= \sum\limits_{i=1}^{N}\left[\ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(x_i-\mu)^2}{2\sigma^2}\right] \\ &= -\frac{1}{2\sigma^2}\sum\limits_{i=1}^{N}(x_i - \mu)^2 - \frac{N}{2} \ln 2\pi - \frac{N}{2}\ln\sigma^2 \end{align} \] Taking the partial derivative with respect to \(\mu\) and setting it to zero gives \(\mu_{MLE}\): \[ \begin{align} \frac{\partial \mathcal{L}}{\partial \mu} &= \sum\limits_{i=1}^{N}\frac{x_i - \mu}{\sigma^2} = 0 \\ &\Leftrightarrow N\mu_{MLE} = \sum\limits_{i=1}^{N}x_i \\ &\Leftrightarrow \mu_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}x_i \end{align} \] We can also verify that this sample mean agrees in expectation with the true mean \(\mu\) of the Gaussian the data come from: \[ \mathbb{E}[\mu_{MLE}] = \frac{1}{N}\sum\limits_{i=1}^{N}\mathbb{E}[x_i] = \mu \] Next, \(\sigma^2_{MLE}\): \[ \begin{align} \frac{\partial \mathcal{L}}{\partial \sigma^2} &= \sum\limits_{i=1}^{N}\frac{\partial}{\partial\sigma^2}\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) - \frac{\partial}{\partial\sigma^2}\left(\frac{N}{2}\ln \sigma^2\right) \\ &= \frac{1}{2\sigma^4}\sum\limits_{i=1}^{N}(x_i-\mu)^2 - \frac{N}{2\sigma^2} = 0 \end{align} \] Multiplying both sides by \(2\sigma^4\) and substituting \(\mu_{MLE}\) for \(\mu\) yields: \[ \sigma^2_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}(x_i - \mu_{MLE})^2 \] Note that \(\sigma^2_{MLE}\) underestimates the variance of the true distribution, so it is a biased estimator (the overfitting problem is related to this). Expanding the square and using \(\frac{1}{N}\sum_{i}x_i = \mu_{MLE}\), then \(\mathbb{E}[x_i^2] = \mu^2 + \sigma^2\) and \(\mathbb{E}[\mu_{MLE}^2] = \mu^2 + \mathrm{Var}(\mu_{MLE}) = \mu^2 + \frac{\sigma^2}{N}\): \[ \begin{align} \mathbb{E}[\sigma^2_{MLE}] &= \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^{N}(x_i - \mu_{MLE})^2\right] \\ &= \mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \mu_{MLE}^2\right] \\ &= \frac{1}{N}\sum\limits_{i=1}^{N}\mathbb{E}[x_i^2] - \mathbb{E}[\mu_{MLE}^2] \\ &= (\mu^2 + \sigma^2) - \left(\mu^2 + \frac{\sigma^2}{N}\right) \\ &= \frac{N-1}{N}\sigma^2 \end{align} \]
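As a quick numerical check of the two results above, here is a small NumPy sketch (the true parameters, sample size, and seed are arbitrary choices of mine, not from the text): the sample mean should average to \(\mu\), while the MLE variance should average to \(\frac{N-1}{N}\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 3.0, 4.0       # true mean and variance (illustrative values)
N, trials = 5, 200_000      # small N makes the bias clearly visible

# Draw `trials` independent datasets, each of size N, from N(mu, sigma2)
x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))

mu_mle = x.mean(axis=1)                              # per-dataset sample mean
var_mle = ((x - mu_mle[:, None]) ** 2).mean(axis=1)  # per-dataset MLE variance

print(mu_mle.mean())   # close to mu = 3.0: the mean estimate is unbiased
print(var_mle.mean())  # close to (N-1)/N * sigma2 = 3.2: variance is biased low
```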
Equation 1.67
Viewing the curve-fitting problem from the MAP perspective.
We first assume a Gaussian prior over the curve's weights \(w\): \[ P(w|\alpha) = \mathcal{N}(w|0, \alpha^{-1}I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp \left(-\frac{\alpha}{2}w^Tw\right) \] The posterior is then proportional to the product of the likelihood and the prior: \[ P(w|x,t,\alpha,\beta) \propto P(t|x,w,\beta)P(w|\alpha) \] so maximizing the posterior becomes maximizing \(P(t|x,w,\beta)P(w|\alpha)\): \[ \ln P(t|x,w,\beta)P(w|\alpha) = \ln P(t|x,w,\beta) + \ln P(w|\alpha) \] Consider the two terms separately: \[ \ln P(t|x,w,\beta) = -\frac{\beta}{2}\sum\limits_{n=1}^{N}\{y(x_n, w) - t_n\}^2 + \frac{N}{2}\ln \beta - \frac{N}{2} \ln (2\pi) \]
\[ \begin{align} \ln P(w|\alpha) &= \ln (\frac{\alpha}{2\pi})^{(M+1)/2} + (-\frac{\alpha}{2}w^Tw)\\ &= -\frac{\alpha}{2}w^Tw + \frac{M+1}{2}\ln (\frac{\alpha}{2\pi}) \end{align} \]
Maximizing \(\ln P(t|x,w,\beta) + \ln P(w|\alpha)\) is therefore equivalent to maximizing \(-\frac{\alpha}{2}w^Tw - \frac{\beta}{2}\sum\limits_{n=1}^{N}\{y(x_n, w) - t_n\}^2\), i.e. to minimizing \(\frac{\beta}{2}\sum\limits_{n=1}^{N}\{y(x_n, w) - t_n\}^2 + \frac{\alpha}{2}w^Tw\) (the first term is the error function, the second the regularizer). In other words, when the weights have a Gaussian prior, MAP estimation is equivalent to ridge regression.
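The equivalence above can be sketched numerically (the toy data, polynomial degree, and precision values here are my own illustration): for a polynomial \(y(x,w)\) with design matrix \(\Phi\), the minimizer of \(\frac{\beta}{2}\|\Phi w - t\|^2 + \frac{\alpha}{2}w^Tw\) is the ridge solution \(w = (\lambda I + \Phi^T\Phi)^{-1}\Phi^T t\) with \(\lambda = \alpha/\beta\), and we can confirm it by checking that the gradient of the MAP objective vanishes there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy curve-fitting data: a noisy sine, in the spirit of PRML's example
N, M = 10, 3                               # data points, polynomial degree
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)

alpha, beta = 5e-3, 11.1                   # prior precision, noise precision
lam = alpha / beta                         # ridge coefficient lambda = alpha/beta

Phi = np.vander(x, M + 1, increasing=True) # design matrix, columns x^0 .. x^M

# Ridge / MAP closed form: (lam*I + Phi^T Phi) w = Phi^T t
w_ridge = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)

# Gradient of the MAP objective beta/2*||Phi w - t||^2 + alpha/2*w^T w
grad = beta * Phi.T @ (Phi @ w_ridge - t) + alpha * w_ridge
print(np.allclose(grad, 0, atol=1e-8))     # True: ridge solution is the MAP optimum
```

Setting the gradient \(\beta\Phi^T(\Phi w - t) + \alpha w\) to zero and dividing by \(\beta\) recovers exactly the ridge normal equations used in the closed form, which is why the check passes.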