Bayes Statistics HW5

5-4.

5-4a

Yes. The joint distribution is

$\(p(\theta_1,..,\theta_{2J}) = {2J \choose J}(\prod_{j=1}^JN(\theta_{p(j)}|1,1)N(\theta_{p(j)}|-1,1))\)$,

where the sum is over all permutations p of (1,...,2J). The density above is obviously invariant permutations of the indexex (1,...,2J).

5-4b

The covariance of \(\theta_i, \theta_j\) is negative. If \(\theta_i\) is large, then it probably comes from the N(1,1) distribution, which means that it is more likely than not that \(\theta_j\) from the N(-1,1) distribution, which means that \(\theta_j\) will probably be negative. Conversely, if \(\theta_i\) is negative, \(\theta_j\) is most likely positive.

Then, \((\theta_1,..,\theta_{2J})\) can't be written as a mixture form.

5-4c

As \(J \rightarrow \infty\), the negative \(Cor(\theta_i,\theta_j)\) approaches zero, and the joint distribution approaches iid. To put it another way, the distinction disappears between independently assigning each \(\theta_j\) to one of two groups, and picking exactly half of the \(\theta_j\)'s each group.

5-5

Let \(\mu(\phi) =E(\theta_j|\phi)\).

\[cov(\theta_i,\theta_j) = E(cov(\theta_i,\theta_j|\phi))+cov(E(\theta_i|\phi),E(\theta_j|\phi))\]
\[0+cov(\mu(\phi)),\mu(\phi))\]
\[=var(\mu(\phi)) \geq 0.\]

5-7

5-7(a)

From formula (2.7), \(E(y) = E(E(y|\theta)) = \theta = \alpha/\beta\). Because \(y|\theta \sim Pois(\theta), E(y|\theta) = \theta\).

From formula (2.8),

$\(var(y)=E(var(y|\theta))+var(E(y|\theta))=E(\theta)+var(\theta)=\alpha/\beta+\alpha/\beta^2 = \alpha/\beta^2(\beta+1),\ var(y|\theta)=\theta\)$.

5-7(b)

To solve this problems, \(n,s, \bar{y}\) are essentially treated like constants.

From (2.7) and (3.3),

\[E(\sqrt{n}(\mu-\bar{y})/s|y)=E(E(\sqrt{n}(\mu-\bar{y})/s|\sigma,y)|y) =E((\sqrt{n}/s)E(\mu-\bar{y}|\sigma,y|y)=E((\sqrt{n}/s\cdot0|y) =E(0|y)=0.\]

We must have \(n > 1\) for \(s\) to be defined, but in order for the expectation to exist, we must have \(n >2\).

We can compute from (2.8), (3.3), and (3.5) that

\[var(\sqrt{n}(\mu-\bar{y}/s|y)=var(E(\sqrt{n}(\mu-\bar{y}/s|\sigma,y|y)+E(var(\sqrt{n}(\mu-\bar{y})/s|\sigma,y)|y)\]
\[=var(0|y)+E((n/s^2)var(\mu|\sigma,y)|y)\]
\[E((n/s^2)\sigma^2/n|y)\]

$\(E(\sigma^2|y)/s^2 =\frac{n-1}{n-3}\)$.

For this to work, we need \(n > 3\).

5-8

Let \(p_m(\theta|y)\) denote the posterior density of \(\theta\) corresponding to the prior density \(p_m(\theta)\).If \(p(\theta) = \sum_n\lambda_mp_m(\theta),\) the posterior density of \(\theta\) is proportional to \(\sum_m\lambda_mp_m(\theta)p(y|\theta)=\sum_m\lambda_mp_m(y)p_m(\theta|y):\) mixture of the posterior densities \(p_m(\theta|y)\) with weights proportional to \(\lambda_mp_m(y)\).

Since each \(p_m(\theta)\) is conjugate for the model for y given \(\theta\), the preceding computation demonstrates that the class of finite mixture prior densities is also conjugate.

Consider an example : \(p_1(\theta) \sim N(1,0.5^2), p_2(\theta) \sim N(-1,0.5^2)\), and suppose \(\lambda_1=0.9, \lambda_2=0.1\). We know that \(p_1(\theta|y) \sim N(1.5/14,1/14)\) and \(p_2(\theta|y) \sim N(-6.5/14,1/14)\). We also know \(p(\theta|y)\) will be a weighted sum of these conditional posterior densities with weights \(\lambda_mp_m(y)/\sum_k\lambda_kp_k(y)\) for \(m=1,2\).

\(p_1(y) = N(-0.25|1,0.5^2+1/10)=0.072\), and \(p_2(y) = N(-0.25|-1,0.5^2+1/10)=0.302\).

So the weights for \(p_1(\theta|y)\) and \(p_2(y)\) are not 0.9 and 0.1 but are, \(\frac{0.9\cdot0.072}{0.9\cdot0.072+0.1\cdot0.302}=0.68\) and \(\frac{0.1\cdot0.302}{0.9\cdot0.072+0.1\cdot0.302}=0.32\).

theta <- seq(-3,3,.01)
prior <- c (0.9, 0.1)
dens <- prior[1]*dnorm(theta,1,0.5) +
prior[2]*dnorm(theta,-1,0.5)
plot (theta, dens, ylim=c(0,1.1*max(dens)),
type="l", xlab="theta", ylab="", xaxs="i",
yaxs="i", yaxt="n", bty="n", cex=2)
mtext ("prior density", cex=2, 3)

marg <- dnorm(-.25,c(1,-1),sqrt(c(0.5,0.5)^2+1/10))
posterior <- prior*marg/sum(prior*marg)
dens <- posterior[1]*dnorm(theta,1.5/14,sqrt(1/14)) +
posterior[2]*dnorm(theta,-6.5/14,sqrt(1/14))
plot (theta, dens, ylim=c(0,1.1*max(dens)),
type="l", xlab="theta", ylab="", xaxs="i",
yaxs="i", yaxt="n", bty="n", cex=2)
mtext ("posterior density", cex=2, 3)

5-9

5-9a

Consider the limit \((\alpha+\beta) \rightarrow \infty\) with \(\alpha/\beta\) fixed at any nonzero value. The likelihood is

\[p(y|\alpha,\beta) \propto \prod_{j=1}^J\frac{[\alpha\cdots(\alpha+y_i-1)][\beta\cdot n_j-y_j-1)]}{(\alpha+\beta)\cdots(\alpha+\beta+n_j-1)}\]
\[\approx \prod_{j=1}^J\frac{\alpha^{y_j}\beta^{n_j-y_j}}{(\alpha+\beta)^n_j}\]

$\(=\prod_{j=1}^J(\frac{\alpha}{\alpha+\beta})^y_j(\frac{\beta}{\alpha+\beta})^{n_j-y_j}\)$,

which is constants, so the pior density determindes whether the posterior density has a finite integral in this limit. A uniform prior density on \(log(\alpha+\beta)\) has an infinite integral in this limit, and so the posterior density does also in this case.

5-9b

\[\begin{equation*} \begin{vmatrix} \frac{\beta}{(\alpha+\beta)^2} & -\frac{\alpha}{(\alpha+\beta)^2} \\ -\frac{1}{2}(\alpha+\beta)^{-3/2} & -\frac{1}{2}(\alpha+\beta)^{-3/2} \\ \end{vmatrix} \end{equation*} = constant \cdot (\alpha+\beta)^{-5/2}\]

5-12

Applying formula (2.7) and (2.8),

\[E(\theta_j|\tau,y) = E[E(\theta_j|\mu,\tau,y)|\tau,y] = E[\frac{\frac{1}{\sigma_j^2}y_j+\frac{1}{\tau^2}\mu}{\frac{1}{\sigma_j^2}+\frac{1}{\tau^2}}|\tau,y]=\frac{\frac{1}{\sigma_j^2}y_j+\frac{1}{\tau^2}\hat\mu}{\frac{1}{\sigma_j^2}+\frac{1}{\tau^2}}\]

$\(var(\theta_j|\tau,y) = E[var(\theta_j|\mu,\tau,y)|\tau,y]+var[E(\theta_j|\mu,\tau,y)|\tau,y]=\frac{1}{\frac{1}{\sigma^2}+\frac{1}{\tau^2}}+(\frac{\frac{1}{\tau^2}}{\frac{1}{\sigma^2}+\frac{1}{\tau^2}})^2V_\mu\)$, where expressions for \(\hat\mu=E(\mu|\tau,y)\) and \(V_\mu=var(\mu|\tau,y)\) are given in (5.20).

Ex: Hierarchical model for Rats experiment

library(ggplot2)
theme_set(theme_minimal())
library(gridExtra)
library(tidyr)
library(latex2exp)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
        2,1,5,2,5,3,2,7,7,3,3,2,9,10,4,4,4,4,4,4,4,10,4,4,4,5,11,12,
        5,5,6,5,6,6,6,6,16,15,15,9,4)
n <- c(20,20,20,20,20,20,20,19,19,19,19,18,18,17,20,20,20,20,19,19,18,18,25,24,
       23,20,20,20,20,20,20,10,49,19,46,27,17,49,47,20,20,13,48,50,20,20,20,20,
       20,20,20,48,19,19,19,22,46,49,20,20,23,19,22,20,20,20,52,46,47,24,14)
x <- seq(0.0001, 0.9999, length.out = 1000)
bdens <- function(n, y, x)
    dbeta(x, y+1, n-y+1)
df_sep <- mapply(bdens, n, y, MoreArgs = list(x = x)) %>%
  as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)
labs1 <- paste('posterior of', c('theta_j', 'theta_71'))
plot_sep <- ggplot(data = df_sep) +
  geom_line(aes(x = x, y = p, color = (ind=='V71'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Separate model', color = '') +
  scale_y_continuous(breaks = NULL) +
  scale_color_manual(values = c('blue','red'), labels = labs1) +
  theme(legend.background = element_blank(), legend.position = c(0.8,0.9))
# The last one is for emphasize colored red
plot_sep

df_pool <- data.frame(x = x, p = dbeta(x, sum(y)+1, sum(n)-sum(y)+1))
plot_pool <- ggplot(data = df_pool) +
  geom_line(aes(x = x, y = p, color = '1')) +
  labs(x = expression(theta), y = '', title = 'Pooled model', color = '') +
  scale_y_continuous(breaks = NULL) +
  scale_color_manual(values = 'red', labels = 'Posterior of common theta') +
  theme(legend.background = element_blank(), legend.position = c(0.7,0.9))
grid.arrange(plot_sep, plot_pool)

A <- seq(0.5, 6, length.out = 100)
B <- seq(3, 33, length.out = 100)
cA <- rep(A, each = length(B))
cB <- rep(B, length(A))
lpfun <- function(a, b, y, n) log(a+b)*(-5/2) +
  sum(lgamma(a+b)-lgamma(a)-lgamma(b)+lgamma(a+y)+lgamma(b+n-y)-lgamma(a+b+n))
lp <- mapply(lpfun, cA, cB, MoreArgs = list(y, n))
df_marg <- data.frame(x = cA, y = cB, p = exp(lp - max(lp)))
title1 <- TeX('The marginal of $\\alpha$ and $\\beta$')
ggplot(data = df_marg, aes(x = x, y = y)) +
  geom_raster(aes(fill = p, alpha = p), interpolate = T) +
  geom_contour(aes(z = p), colour = 'black', size = 0.2) +
  coord_cartesian(xlim = c(1,5), ylim = c(4, 26)) +
  labs(x = TeX('$\\alpha$'), y = TeX('$\\beta$'), title = title1) +
  scale_fill_gradient(low = 'yellow', high = 'red', guide = F) +
  scale_alpha(range = c(0, 1), guide = F)

nsamp <- 100
samp_indices <- sample(length(df_marg$p), size = nsamp,
                       replace = T, prob = df_marg$p/sum(df_marg$p))
samp_A <- cA[samp_indices[1:nsamp]]
samp_B <- cB[samp_indices[1:nsamp]]
df_psamp <- mapply(function(a, b, x) dbeta(x, a, b),
                  samp_A, samp_B, MoreArgs = list(x = x)) %>%
  as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)
indtonum <- function(x) strtoi(substring(x,2))
title2 <- TeX('Beta($\\alpha,\\beta$) given posterior draws of $\\alpha$ and $\\beta$')
plot_psamp <- ggplot(data = subset(df_psamp, indtonum(ind) <= 20)) +
  geom_line(aes(x = x, y = p, group = ind), color='forestgreen') +
  labs(x = expression(theta), y = '', title = title2) +
  scale_y_continuous(breaks = NULL)
df_psampmean <- spread(df_psamp, ind, p) %>% subset(select = -x) %>%
    rowMeans() %>% data.frame(x = x, p = .)
title3 <- TeX('Population distribution (prior) for $\\theta_j$')
plot_psampmean <- ggplot(data = df_psampmean) +
  geom_line(aes(x = x, y = p), color='forestgreen') +
  labs(x = expression(theta), y = '', title = title3) +
  scale_y_continuous(breaks = NULL)
grid.arrange(plot_psamp, plot_psampmean)

plot_sep7 <- ggplot(data = subset(df_sep, indtonum(ind)%%7==0)) +
  geom_line(aes(x = x, y = p, color = (ind=='V49'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Separate model', color = '') +
  scale_y_continuous(breaks = NULL) +
  scale_color_manual(values = c('blue', 'red'), guide = F) +
  theme(legend.background = element_blank(), legend.position = c(0.8,0.9))
bdens2 <- function(n, y, a, b, x)
  rowMeans(mapply(dbeta, a + y, n - y + b, MoreArgs = list(x = x)))
df_hier <- mapply(bdens2, n, y, MoreArgs = list(samp_A, samp_B, x)) %>%
  as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)
plot_hier7 <- ggplot(data = subset(df_hier, indtonum(ind)%%7==0)) +
  geom_line(aes(x = x, y = p, color = (ind=='V49'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Hierarchical model', color = '') +
  scale_color_manual(values = c('blue', 'red'), guide = F) +
  scale_y_continuous(breaks = NULL) +
  theme(legend.background = element_blank(), legend.position = c(0.8,0.9))
grid.arrange(plot_sep7, plot_hier7)

qq_separate<-data.frame(id=1:length(n), n=n, y=y,
               q10=qbeta(0.1,y+1,n-y+1), q90=qbeta(0.9,y+1,n-y+1))
qh <- function(q, n, y) colMeans(mapply(function(q, n, y, a, b)
    mapply(qbeta, q, a + y, n - y + b), q, n, y, MoreArgs = list(samp_A, samp_B)))
qq_hier <- data.frame(id=1:length(n), n=n, y=y,
                      qh(0.05, n, y), qh(0.95, n, y))
ggplot(data=qq_separate[seq(1,length(n),by=3),], aes(x=jitter(n,amount=1),ymin=q10,ymax=q90)) + geom_linerange() + xlim(c(0,60))