LDA

Expectation of the Beta/Dirichlet Distribution

Suppose $p\sim Beta(t|\alpha,\beta)$. Then

\[E(p) = \int_0^1 t \ast Beta(t|\alpha,\beta) dt \\ = \int_0^1 t \ast \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}t^{\alpha-1}(1-t)^{\beta-1}dt \\ = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 t^{\alpha}(1-t)^{\beta-1}dt\]

Clearly, the integrand on the right is, up to its normalizing constant, the density of $Beta(t|\alpha + 1,\beta)$, which integrates to one:

\[\int_0^1 \frac{\Gamma(\alpha+\beta+1)}{\Gamma(\alpha+1)\Gamma(\beta)} t^{\alpha}(1-t)^{\beta-1}dt = 1\]

Therefore

\[\int_0^1 t^{\alpha}(1-t)^{\beta-1}dt = \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)}\]

Substituting this back into $E(p)$ gives

\[E(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \ast \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} \\ = \frac{\Gamma(\alpha+\beta) \Gamma(\alpha+1) }{\Gamma(\alpha) \Gamma(\alpha+\beta+1) }\]

The Gamma function satisfies the recurrence

\[\Gamma(s+1)=s\Gamma(s)\]

Therefore,

\[E(p)= \frac{\Gamma( \alpha+\beta) \ast \alpha \Gamma(\alpha)}{\Gamma(\alpha) \ast(\alpha+\beta)\ast \Gamma(\alpha+\beta)} \\ = \frac{\alpha }{\alpha+\beta}\]

That is, if a random variable $p$ follows the Beta distribution $Beta(\alpha,\beta)$, its expectation is $\frac{\alpha}{\alpha+\beta}$.
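As a quick numerical check (a minimal sketch using SciPy; the parameter values below are arbitrary examples), the Monte Carlo mean of Beta samples agrees with $\frac{\alpha}{\alpha+\beta}$:

```python
# A quick Monte Carlo check of E[p] = alpha / (alpha + beta); the parameter
# values below are arbitrary example choices.
from scipy import stats

alpha, beta = 2.0, 5.0
samples = stats.beta(alpha, beta).rvs(size=200_000, random_state=0)
print(samples.mean())          # ~0.2857 (Monte Carlo estimate)
print(alpha / (alpha + beta))  # 0.2857... (closed-form expectation)
```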

The Dirichlet distribution has an analogous property.

Suppose $\vec{p}\sim Dir(\vec{t}|\vec{\alpha})$. Then its expectation is

\[E(\vec{p})= \left(\frac{\alpha_1}{\sum_{i=1}^K\alpha_i}, \frac{\alpha_2}{\sum_{i=1}^K\alpha_i}, \dots, \frac{\alpha_K}{\sum_{i=1}^K\alpha_i}\right)\]
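The same kind of check works for the Dirichlet mean (again a sketch; the $\vec{\alpha}$ below is an arbitrary example):

```python
# The analogous check for the Dirichlet mean E[p_i] = alpha_i / sum(alpha);
# the alpha vector is an arbitrary example.
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=200_000)
print(samples.mean(axis=0))   # Monte Carlo estimate, roughly [1/6, 1/3, 1/2]
print(alpha / alpha.sum())    # closed-form expectation
```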

These two results will be used repeatedly in the LDA derivation below.

Next, the definition of the Beta function (in its multivariate form):

\[B(\alpha)=\frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^k \alpha_i)}\]

This function already appeared in the derivation above: it is the normalization factor of the Dirichlet distribution (and, in the two-dimensional case, of the Beta distribution).
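As a quick sanity check of the normalizer claim (a sketch; for $k=2$ the multivariate Beta function reduces to the ordinary Beta function, and the values of $a,b$ are arbitrary):

```python
# A quick numerical check that B(alpha) is the normalizer: for k = 2 it is the
# ordinary Beta function, so the integral of t^(a-1) (1-t)^(b-1) over [0, 1]
# should equal Gamma(a) Gamma(b) / Gamma(a + b).
from scipy.integrate import quad
from scipy.special import gamma

a, b = 2.0, 5.0
integral, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1), 0, 1)
print(integral)                            # ~0.03333
print(gamma(a) * gamma(b) / gamma(a + b))  # 1/30 = 0.03333...
```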

Therefore, as the normalizer it satisfies

\[B(\alpha)=\int\prod_{i=1}^k\theta_i^{\alpha_i-1}d\theta\]

With these preliminaries done, we turn to LDA itself. The quantity we ultimately want is the posterior over the topic assignments $z$ given the observed words $w$; by Bayes' rule,

\[p(z|w,\alpha,\beta)=\frac{p(w,z|\alpha,\beta)}{p(w|\alpha,\beta)}\propto p(w,z|\alpha,\beta)\]

Next, consider the joint distribution $p(w,z|\alpha,\beta)$:

\[p(w,z|\alpha,\beta)=p(z|\alpha,\beta)p(w|z,\alpha,\beta)=p(z|\alpha)p(w|z,\beta)\]

We now compute $p(z|\alpha)$ and $p(w|z,\beta)$ separately.

\[p(z|\alpha)=\int p(z|\theta)p(\theta|\alpha)d\theta \\ =\int \prod_{m=1}^M\left(\prod_{k=1}^K \theta_{mk}^{n_{mk}}\cdot\frac{\prod_{k=1}^K \theta_{mk}^{\alpha_k-1}}{B(\alpha)}\right) d\theta \\ =\int \prod_{m=1}^M \frac{1}{B(\alpha)}\prod_{k=1}^K\theta_{mk}^{n_{mk}+\alpha_k-1}d\theta \\ =\prod_{m=1}^M \frac{1}{B(\alpha)} \int \prod_{k=1}^K\theta_{mk}^{n_{mk}+\alpha_k-1}d\theta_m \\ =\prod_{m=1}^M \frac{B(n_m+\alpha)}{B(\alpha)}\]

Here $p(\theta|\alpha)$ is the prior, a Dirichlet distribution, and $p(z|\theta)$ is a multinomial. Because the Dirichlet is the conjugate prior of the multinomial, the resulting distribution over $\theta$ is again a Dirichlet, and the integral reduces to a ratio of normalizers.
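In code, this means $\log p(z|\alpha)$ can be evaluated directly from the document-topic counts. The sketch below assumes a count matrix `n_mk` of shape $M\times K$ and a symmetric $\alpha$; the helper name `log_B` and the toy numbers are illustrative:

```python
# A sketch of log p(z|alpha) = sum_m [ log B(n_m + alpha) - log B(alpha) ]
# from a document-topic count matrix n_mk (shape M x K), working in log space
# to avoid overflow for large counts.
import numpy as np
from scipy.special import gammaln

def log_B(a):
    # log of the (multivariate) Beta function B(a)
    return gammaln(a).sum() - gammaln(a.sum())

def log_p_z_given_alpha(n_mk, alpha):
    return sum(log_B(n_m + alpha) - log_B(alpha) for n_m in n_mk)

n_mk = np.array([[3.0, 0.0, 2.0],      # 2 toy documents, 3 topics
                 [1.0, 4.0, 0.0]])
print(log_p_z_given_alpha(n_mk, alpha=np.full(3, 0.1)))
```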

\[p(w|z,\beta)=\int p(\varphi |\beta)\, p(w|z,\varphi ) d\varphi \\ =\int \prod_{k=1}^K\left( \prod_{v=1}^V \varphi_{kv}^{n_{kv}} \cdot\frac{\prod_{v=1}^V \varphi_{kv}^{\beta_v-1} }{B(\beta)}\right) d\varphi \\ =\prod_{k=1}^K \frac{1}{B(\beta)} \int \prod_{v=1}^V\varphi_{kv}^{n_{kv}+\beta_v-1} d\varphi_k \\ =\prod_{k=1}^K \frac{B(n_{k}+\beta)}{B(\beta)}\]

This is again a Dirichlet–multinomial conjugate structure, so the result is also a Dirichlet.
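The mirrored computation for $\log p(w|z,\beta)$ works from the topic-word counts (again a sketch; the names and counts are made up for illustration):

```python
# log p(w|z,beta) = sum_k [ log B(n_k + beta) - log B(beta) ] from a
# topic-word count matrix n_kv (shape K x V).
import numpy as np
from scipy.special import gammaln

def log_B(a):
    return gammaln(a).sum() - gammaln(a.sum())

def log_p_w_given_z_beta(n_kv, beta):
    return sum(log_B(n_k + beta) - log_B(beta) for n_k in n_kv)

n_kv = np.array([[2.0, 0.0, 1.0, 0.0],   # 3 toy topics over a 4-word vocabulary
                 [0.0, 3.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 2.0]])
print(log_p_w_given_z_beta(n_kv, beta=np.full(4, 0.01)))
```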

Putting the two pieces together:

\[p(w,z|\alpha,\beta)=p(z|\alpha)p(w|z,\beta)\\ =\prod_{m=1}^M \frac{B(n_m+\alpha)}{B(\alpha)} \prod_{k=1}^K \frac{B(n_{k}+\beta)}{B(\beta)}\]

and hence the posterior over the topic assignments satisfies

\[p(z|w,\alpha,\beta) \propto \prod_{m=1}^M \frac{B(n_m+\alpha)}{B(\alpha)} \prod_{k=1}^K \frac{B(n_{k}+\beta)}{B(\beta)}\]

For Gibbs sampling we need the full conditional of a single assignment $z_i$ (the token at position $i$, belonging to document $m$ and carrying word id $w_i=t$) given all the other assignments:

\[p(z_i=k|\vec{z}_{\neg i},\vec{w}) \propto p(z_i=k,w_i=t|\vec{z}_{\neg i},\vec{w}_{\neg i}) \\ =\int p(z_i=k,w_i=t,\vec{\theta}_m,\vec{\varphi}_k|\vec{z}_{\neg i},\vec{w}_{\neg i}) d\vec{\theta}_md\vec{\varphi}_k \\ =\int p(z_i=k,\vec{\theta}_m|\vec{z}_{\neg i},\vec{w}_{\neg i})\, p(w_i=t,\vec{\varphi}_k| \vec{z}_{\neg i},\vec{w}_{\neg i}) d\vec{\theta}_md\vec{\varphi}_k \\ =\int p(z_i=k|\vec{\theta}_m)\, p(\vec{\theta}_m|\vec{z}_{\neg i},\vec{w}_{\neg i})\, p(w_i=t| \vec{\varphi}_k)\, p( \vec{\varphi}_k| \vec{z}_{\neg i},\vec{w}_{\neg i} ) d\vec{\theta}_md\vec{\varphi}_k \\ =\int p(z_i=k|\vec{\theta}_m)\, Dir(\vec{\theta}_m|\vec{n}_{m,\neg i}+ \vec{\alpha}) d\vec{\theta}_m \int p(w_i=t| \vec{\varphi}_k)\, Dir( \vec{\varphi}_k| \vec{n}_{k,\neg i}+\vec{\beta} ) d\vec{\varphi}_k \\ =\int \theta_{mk}\, Dir(\vec{\theta}_m|\vec{n}_{m,\neg i}+ \vec{\alpha}) d\vec{\theta}_m \int \varphi_{kt}\, Dir( \vec{\varphi}_k| \vec{n}_{k,\neg i}+\vec{\beta} ) d\vec{\varphi}_k \\ = E(\theta_{mk})\cdot E(\varphi_{kt})\\ =\hat{\theta}_{mk} \cdot \hat{\varphi}_{kt}\]

From the Dirichlet expectation derived earlier,

\[\hat{\theta}_{mk}= \frac{ n_{m,\neg i}^{(k)} +\alpha_k }{ \sum_{k=1}^K \left( n_{m,\neg i}^{(k)}+\alpha_k \right)}\]

\[\hat{\varphi}_{kt} =\frac{ n_{k,\neg i}^{(t)} +\beta_t }{ \sum_{t=1}^V \left( n_{k,\neg i}^{(t)} +\beta_t \right)}\]

Substituting these back gives the sampling formula for $z_i$:

\[p(z_i=k|\vec{z}_{\neg i},\vec{w})\propto \frac{ n_{m,\neg i}^{(k)} +\alpha_k }{ \sum_{k=1}^K \left( n_{m,\neg i}^{(k)}+\alpha_k \right)} \cdot \frac{ n_{k,\neg i}^{(t)} +\beta_t }{ \sum_{t=1}^V \left( n_{k,\neg i}^{(t)} +\beta_t \right)}\]
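This is exactly what a collapsed Gibbs sampler for LDA evaluates for every token. Below is a minimal sketch of a single update; `n_mk`, `n_kv`, `n_k`, and `sample_topic` are illustrative names, and the counts passed in are assumed to already exclude token $i$:

```python
# A minimal sketch of one collapsed Gibbs update for token i in document m
# with word id t. n_mk: document-topic counts (M x K), n_kv: topic-word
# counts (K x V), n_k: per-topic totals (K,). All counts are assumed to
# already exclude token i (the "not i" counts in the formula above).
import numpy as np

def sample_topic(m, t, n_mk, n_kv, n_k, alpha, beta, rng):
    # The document-side denominator sum_k(n_mk + alpha_k) does not depend on k,
    # so it is dropped; only proportionality matters for sampling.
    p = (n_mk[m] + alpha) * (n_kv[:, t] + beta[t]) / (n_k + beta.sum())
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Example call with toy shapes (K=3 topics, V=4 words, illustrative only).
rng = np.random.default_rng(0)
alpha, beta = np.full(3, 0.1), np.full(4, 0.01)
n_mk = np.array([[3.0, 0.0, 2.0]])
n_kv = np.array([[2.0, 0.0, 1.0, 0.0],
                 [0.0, 3.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 2.0]])
n_k = n_kv.sum(axis=1)
print(sample_topic(m=0, t=2, n_mk=n_mk, n_kv=n_kv, n_k=n_k,
                   alpha=alpha, beta=beta, rng=rng))
```

In a full sweep one would decrement the counts for token $i$, draw a new topic with this update, and then increment the counts for the sampled topic.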

References:

LDA数学八卦
LDA漫游指南
