
Chapter 12 A brief introduction to MCMC sampling and JAGS

Learning objectives

  1. Gain insights into how Markov chain Monte Carlo (MCMC) sampling works.

  2. Be able to implement your first Bayesian model using Just Another Gibbs Sampler (JAGS) software.

12.1 R packages

We begin by loading a few packages upfront:

library(dplyr) # for data wrangling
library(knitr) # for options when knitting
library(ggplot2) # for plotting
theme_set(theme_bw()) # black and white background
library(ggthemes) # for colorblind palette

In addition, we will use functions from the R2jags, mcmcplots, and MCMCvis packages, which we will load when we fit our first model in Section 12.6.

12.2 Introduction to MCMC sampling

In our moose detection example from Section 11, we could determine the posterior distribution analytically. Usually, we will not be so lucky. In particular, in most cases, there will be no closed form solution to:

\[\begin{equation} p(\theta | y) = \frac{L(y | \theta)\pi(\theta)}{\int_{-\infty}^{\infty}L(y | \theta)\pi(\theta)d\theta} \end{equation}\]

Instead, we will generate samples that we can use to summarize the posterior distribution, \(p(\theta | y)\). There are many ways to generate a sample from the posterior distribution. In this section, we will briefly consider one possible way to generate a Markov chain (an autocorrelated sequence of parameter values) that converges in distribution to \(p(\theta|y)\). This approach, the Metropolis algorithm, is one of many Markov chain Monte Carlo (MCMC) approaches used to generate samples from a posterior distribution.

12.3 Metropolis algorithm

MCMC approaches generate a sequence of parameter values using a set of rules that allow the sampler to explore the full range of support of the posterior distribution, while spending most of its time in regions where the posterior density is highest (Figure 12.1).


FIGURE 12.1: Depiction of how the Metropolis Algorithm works. Figure created by Evan Cooch, Cornell University.

Remember, the posterior distribution is given by:

\[\begin{equation} p(\theta | y) = \frac{L(y | \theta)\pi(\theta)}{\int_{-\infty}^{\infty}L(y | \theta)\pi(\theta)d\theta} \end{equation}\]

Consider two possible values of \(\theta\), \(\theta_1\) and \(\theta_2\). Without the denominator, we cannot evaluate \(p(\theta_1 | y)\) or \(p(\theta_2 | y)\). We can, however, evaluate the relative likelihood, \(R\), of \(\theta_2\) versus \(\theta_1\), since the denominator cancels out:

\[\begin{equation} R = \frac{p(\theta_2 | y)}{p(\theta_1 | y)} = \frac{L(y | \theta_2)\pi(\theta_2)}{L(y | \theta_1)\pi(\theta_1)} \end{equation}\]

The Metropolis algorithm uses this ratio to generate a sequence of parameter values as follows:

  1. Initiate the Markov chain with an initial starting value, \(\theta_0\).

  2. Generate a new, proposed value of \(\theta\) from a symmetric distribution centered on \(\theta_0\) (e.g., \(\theta^{\prime}\) = rnorm(1, mean = \(\theta_0\), sd = \(\sigma\))).

  3. Decide whether to accept or reject \(\theta^{\prime}\):

    • If \(R = \frac{p(\theta^{\prime} | y)}{p(\theta_0 | y)} > 1\), we move up the probability hill in Figure 12.1 and accept the proposed value, setting \(\theta_1 = \theta^{\prime}\).

    • If \(R <1\), the proposal moves us down the probability hill. In this case, we accept \(\theta^{\prime}\) with probability \(R\). To do so, we generate a uniform random number, \(u\), between 0 and 1. If \(u \le R\), then we accept the new value, setting \(\theta_1=\theta^{\prime}\). Otherwise, we reject the proposed value and set \(\theta_1 = \theta_0\).

  4. Return to step 2, replacing \(\theta_0\) with \(\theta_1\).

Note that step 3 ensures that proposed parameter values that move us up the probability hill (towards higher values of the posterior distribution) are always accepted, but also that we do not get stuck at the top of the hill (i.e., we continue to sample from the full range of parameters supported by the posterior distribution). This strategy is not too difficult to program; a minimal sketch is given below, and for a fuller example of a Metropolis sampler written in R, see: https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/.
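
To make the algorithm concrete, here is a minimal Metropolis sampler written in R. It targets the posterior of a detection probability, \(p\), given \(y\) detections in \(n\) trials with a Uniform(0, 1) prior; the data values (y = 7 and n = 20) and the tuning parameter are made up purely for illustration and are not from the moose example.

# Minimal Metropolis sampler (a sketch; not the code used elsewhere in this book)
set.seed(1)
y <- 7; n <- 20          # hypothetical data: y detections in n trials
log_post <- function(p){ # log of likelihood x prior (normalizing constant not needed)
  if(p <= 0 | p >= 1) return(-Inf)  # outside the support of the Uniform(0, 1) prior
  dbinom(y, n, p, log = TRUE) + dunif(p, 0, 1, log = TRUE)
}
M <- 5000                # number of MCMC iterations
sigma <- 0.1             # sd of the Normal proposal distribution (a "tuning" parameter)
p.samples <- numeric(M)
p.samples[1] <- 0.5      # step 1: initial value
for(i in 2:M){
  p.prop <- rnorm(1, mean = p.samples[i-1], sd = sigma)  # step 2: propose a new value
  logR <- log_post(p.prop) - log_post(p.samples[i-1])    # step 3: log of the ratio R
  if(log(runif(1)) <= logR){       # accept with probability min(1, R)
    p.samples[i] <- p.prop
  } else {
    p.samples[i] <- p.samples[i-1] # reject: keep the current value
  }
}
mean(p.samples[-(1:1000)])         # posterior mean after discarding a burn-in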

We will continue to sample until:

  1. The distribution of \(\theta_1, \theta_2, \ldots, \theta_M\) appears to have reached a steady state (i.e., the chain has converged to an equilibrium distribution). In other words, if you were to continue to generate another \(M\) samples and then look at their distribution, it should be similar to the distribution of the first \(M\) samples.

  2. The MCMC sample, \(\theta_1, \theta_2, \ldots, \theta_M\), is large enough that we can summarize characteristics of the posterior distribution, \(p(\theta | data)\) (e.g., its mean, median, 2.5th and 97.5th quantiles) to our desired level of precision.

An example Markov chain may look something like Figure 12.2. This plot is called a traceplot, and it depicts the series of parameter values generated using MCMC sampling for 4 different parameters (represented in the different panels of the plot). In this case, two different Markov chains (shown in red and blue) were initiated at different starting values. Eventually, the two chains converge in distribution, as evidenced by the red and blue plots largely overlapping after the first 500-1000 samples. Usually, we will discard the initial parameter values where the chains are clearly sampling different regions of the parameter space, referred to as a “burn-in” period. Once we have the remaining samples, we can estimate \(\theta\) by the mean (or median) of the samples, and we can compute credible intervals using quantiles of the sampled values.


FIGURE 12.2: Traceplot depicting the series of parameter values generated using MCMC sampling. Two different Markov chains are shown in red and blue. These chains were initiated at different starting values, but converge in distribution after an initial burn in period.

How do we know if we have sampled long enough - i.e., that our sample has converged in distribution to \(p(\theta|y)\)? The short answer is that there is no foolproof method for detecting convergence. Some things that we can and will do, however, include:

  • Run multiple chains (starting in different places) and see if they converge on similar distributions
  • Discard the first \(n_{burnin}\) iterations (where the sampler has not yet converged)
  • Consider the Gelman-Rubin statistic (Gelman, Rubin, et al., 1992), Rhat, which is output by JAGS. Rhat compares the variance of parameter values between chains to the variance of parameter values within chains. Values near 1 suggest likely convergence; as a general rule, Rhat should be less than 1.1. A toy example of computing Rhat is sketched below.
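
To illustrate the idea (this is a toy example and not part of any analysis in this chapter; it assumes the coda package, which supplies the gelman.diag function), we can compute Rhat for two “chains” that are simply independent draws from the same Normal distribution. Because these chains are, by construction, sampling the same distribution, Rhat should be very close to 1.

library(coda) # provides gelman.diag() and the mcmc/mcmc.list classes

# Two hypothetical, well-mixed "chains" of 1000 samples each
chain1 <- mcmc(rnorm(1000, mean = 0, sd = 1))
chain2 <- mcmc(rnorm(1000, mean = 0, sd = 1))

# Gelman-Rubin statistic: compares between-chain to within-chain variance
gelman.diag(mcmc.list(chain1, chain2))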

12.4 Aside: Sampler performance

With Metropolis or Metropolis-Hastings (the latter allows for non-symmetric proposal distributions), we need to consider how to generate good proposals for our parameters (i.e., “candidate” parameter values during step 2 of the process). This may require setting one or more “tuning” parameters to ensure the sampler does not get stuck at the top of the hill or jump so far from the current value that we miss the hill altogether. For example, when using a normal proposal distribution, we can try different values of \(\sigma\) and pick a value that ensures steps are big enough that we efficiently sample all areas of the posterior distribution without being so big that we completely “fall off the cliff” when we are near the top of the hill (Figure 12.1).

In this book, we will be using JAGS (Plummer et al., 2003) to implement Bayesian models. There are a number of packages available to run JAGS from within R (e.g., rjags, runjags, R2jags, rube, etc.); we will generally use R2jags (Su & Yajima, 2021). JAGS has a variety of different MCMC algorithms at its disposal, and it will attempt to determine the best algorithm depending on the likelihood and set of prior distributions (one for each parameter) that we give it. Although JAGS is quite powerful, other samplers developed in recent years are not available in JAGS but are more efficient (i.e., they converge in distribution more quickly and their samples are less autocorrelated). For example, STAN (Carpenter et al., 2017), a popular alternative to JAGS, employs a “No-U-Turn” (NUTS) sampler that is typically more efficient at sampling the posterior distribution (Hoffman & Gelman, 2014); for a visual demonstration of different samplers, see http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/. STAN can also be run from R (Stan Development Team, 2019), but the syntax is a little more challenging to learn.

12.5 Specifying a model in JAGS

To run a model using JAGS, we will need to:

  1. Specify prior distributions for all model parameters.
  2. Specify the likelihood of the data.
  3. Fit the model by calling JAGS from R to generate samples using its built in MCMC routines.
  4. Evaluate whether or not we think the samples have converged in distribution to \(p(\theta | y)\).
  5. Use our samples to characterize the posterior distribution, \(p(\theta | y)\).

Let’s revisit our linear model for comparing the mean jaw lengths of male and female golden jackals from Section 3.6.1. The data are again provided, below:

males<-c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
females<-c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111)

As before, we will assume that jaw lengths for males and females are normally distributed, with sex-specific means, \(\mu_m\) and \(\mu_f\). We will also, for now, assume that the jaw lengths are equally variable for males and females (Exercise 11.1 will have you relax this assumption). This leads to the following likelihood:

  • \(y_{males} \sim N(\mu_m, \sigma^2)\)
  • \(y_{females} \sim N(\mu_f, \sigma^2)\)

This model has 3 parameters: \(\mu_m, \mu_f,\) and \(\sigma\). We will need to specify prior distributions for each of these parameters. Before doing so, it is important to note that JAGS and WinBUGS parameterize the normal distribution as \(N(\mu, \tau)\), where \(\tau = 1/\sigma^2\) is called the precision parameter. Rather than specify a prior for the precision, it will be easier to specify a prior for \(\sigma\) (this will require thinking about the standard deviation rather than 1/variance). We will then use the prior for \(\sigma\) to “induce” (i.e., determine) the prior for \(\tau\).

We want to choose priors that are “dispersed” (meaning they allow for a wide range of possible values). One way to do this is to use Normal distributions with large variance parameters (small precision parameters). Alternatively, we can use uniform distributions that allow for a wide range of values. We will use a mix of these strategies, below:

Priors:

  • \(\mu_m \sim N(100, 0.001)\)
  • \(\mu_f \sim N(100, 0.001)\)
  • \(\sigma \sim Uniform(0, 30)\)
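
Note that the Normal priors above are written in JAGS's (mean, precision) form; a precision of 0.001 corresponds to a standard deviation of \(\sqrt{1/0.001} \approx 31.6\), so these priors allow a very wide range of plausible mean jaw lengths. A quick check in R (using R's usual mean/standard-deviation parameterization):

sqrt(1/0.001) # standard deviation implied by a precision of 0.001
# Visualize the implied prior on a mean jaw length
curve(dnorm(x, mean = 100, sd = sqrt(1/0.001)), from = 0, to = 200,
      xlab = "Mean jaw length", ylab = "Prior density")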

JAGS code (see jaw.mod below) looks just like R code, but with some differences:

  • JAGS code is not executed (it just defines the model)
  • It does not matter how we order our code (we can define the priors before the likelihood or the likelihood before the priors - it will not make a difference).

There are 6 types of objects in JAGS:

  1. Modeled data specified using a \(\sim\) (which can be read as “distributed as”). For example, we will use y \(\sim\) followed by a probability distribution to specify the distribution of our response variable, \(y\), in our regression model.

  2. Unmodeled data: these are objects that are not assigned probability distributions. Examples of unmodeled data include predictors, constants, and index variables used in loops.

  3. Fixed-effects parameters: these are parameters that are assigned uninformative priors.

  4. Random-effect parameters: these are parameters associated with different groups that are linked together by a common prior distribution. This common prior distribution will include additional parameters called hyperparameters that are also assigned prior distributions sometimes referred to as hyperpriors. We will not see these until later in the course (Chapter 18).

  5. Derived quantities: these are objects determined using the assignment arrow, <-. For example, we might be interested in obtaining samples for a function of parameters (e.g., \(E[Y_i|Age_i]=L_{\infty}(1-e^{-k(Age_i-t_0)})\) in our bear growth example from Section 10.7). We can accomplish this goal by defining \(E[Y_i|Age_i]\) as a derived quantity, depending on unmodeled data (Age) and modeled parameters (\(L_{\infty}, k, t_0\)).

  6. Looping indices: \(i, j\), used in loops (e.g., over our observations in a data set).

Below, we define our model for the jaw data using a function (by contrast, Kéry (2010) and others often write the model out to a file).

jaw.mod<-function(){
  
  # Priors 
    mu.male ~ dnorm(100, 0.001) # mean of male jaw lengths
    mu.female ~ dnorm(100, 0.001) # mean of female jaw lengths
    sigma ~ dunif(0, 30) # common sigma
    tau <- 1/(sigma*sigma) #precision

  # Likelihood (Y | mu.male, mu.female, sigma) = Normal(mu[sex], sigma^2)  
    for(i in 1:nmales){
      males[i] ~ dnorm(mu.male, tau) 
    }
    for(i in 1:nfemales){
      females[i] ~ dnorm(mu.female, tau)
    }
  
  # Derived quantities:  difference in means
    mu.diff <- mu.male - mu.female
}

Here, we can identify the following types of objects in our model:

  1. Modeled data = males, females (holding our jaw lengths)

  2. Unmodeled data = nmales, nfemales

  3. Fixed-effects parameters = mu.male, mu.female, sigma

  4. Random-effects parameters (none in this example)

  5. Derived quantities = tau, mu.diff

  6. Looping indices: i (used twice)

12.6 Fitting a model using JAGS

We begin by loading the R2jags package as well as the mcmcplots (Curtis, 2018), MCMCvis (Youngflesh, 2018), and ggplot2 (Wickham, 2016) packages to help with visualizing the results.

library(R2jags)
library(mcmcplots)
library(MCMCvis) 

To begin sampling, JAGS will need a set of initial values (our \(\theta_0\)’s, for example, in step 1 of the Metropolis algorithm). Usually, JAGS can use the prior distributions to generate initial values. This will happen by default unless you supply a set of initial values. Alternatively, you can supply a function to generate initial values (this is the approach Kéry (2010) takes in his book). I tend to be lazy and allow JAGS to generate its own initial values, but this can sometimes get you in trouble (we will see this in a later section).

The function, init.vals, below can be used to generate initial values:

# Function to generate initial values
init.vals<-function(){
    mu.male <- rnorm(1, 100, 100)
    mu.female <- rnorm(1, 100, 100)
    sigma <- runif(1, 0, 10) 
    list(mu.male = mu.male, mu.female = mu.female, sigma = sigma)
}

We will also create objects to hold some of the unmodeled data that we will pass to JAGS, namely the number of males and females in the data set.

# Create the rest of the data for the model
nmales<-length(males)
nfemales<-length(females)

We then use the jags function to call JAGS and generate our MCMC sample:

t.test.jags <- jags(data=c("males", "females", "nmales",  "nfemales"),
                      inits = init.vals,
                      parameters.to.save = c("mu.male", "mu.female", "sigma", "mu.diff"), 
                      progress.bar = "none",
                      n.iter = 10000,
                      n.burnin = 5000,
                      model.file = jaw.mod,
                      n.thin = 1,
                      n.chains = 3)

Note the following arguments:

  • data will contain all modeled and unmodeled data (here, males and females containing the jaw lengths and nmales and nfemales containing the number of males and females).

  • parameters.to.save is a list of parameters for which we want to save the MCMC samples. Here, we specify that we want to keep track of \(\mu_m, \mu_f, \sigma\), and \(\mu_m -\mu_f\). By contrast, we do not save \(\tau\) since we do not intend to examine the posterior distribution of the precision parameter.

  • progress.bar = "none" suppresses the progress bar (Windows users can also try progress.bar = "gui"). If we do not supply this argument, we will get a lot of extra output in the html file when using Rmarkdown.

  • n.iter = 10000 specifies the total number of MCMC iterations per chain (including the burn-in)

  • n.burnin = 5000 specifies that we want to throw away the first 5000 samples from each chain

  • model.file = jaw.mod specifies the function containing the model specification

  • n.thin = 1 specifies that we want to keep all of the samples. We can save memory by keeping, say, every other sample if we change this to n.thin = 2. If the chains are highly autocorrelated, we won't lose much information by keeping only every other sample.

  • n.chains = 3 specifies that we want to generate 3 Markov chains, each generated with a different set of starting values.

Alternatively, we could use jags.parallel to take advantage of parallel processing:

t.test.jags <- jags.parallel(data = c("males", "females", "nmales",  "nfemales"),  
                             inits = init.vals,
                             parameters.to.save = c("mu.male", "mu.female", "sigma", "mu.diff"),
                             n.iter = 10000,
                             n.burnin = 5000,
                             model.file = jaw.mod,
                             n.thin = 1,
                             n.chains = 3)

Let’s inspect the output from jags by typing t.test.jags (the name of the object we created).

t.test.jags
## Inference for Bugs model at "jaw.mod", fit using jags,
##  3 chains, each with 10000 iterations (first 5000 discarded)
##  n.sims = 15000 iterations saved
##           mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
## mu.diff     4.791   1.497   1.862   3.809   4.795   5.769   7.735 1.001 15000
## mu.female 108.586   1.068 106.492 107.887 108.579 109.272 110.699 1.001 15000
## mu.male   113.376   1.056 111.303 112.680 113.380 114.064 115.503 1.001  9900
## sigma       3.314   0.593   2.389   2.897   3.240   3.648   4.683 1.001 15000
## deviance  103.054   2.733  99.903 101.055 102.374 104.344 110.165 1.001 15000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 3.7 and DIC = 106.8
## DIC is an estimate of expected predictive error (lower deviance is better).

We see that we get a summary of the posterior distribution for each saved parameter:

  • mu.vect = the mean of the posterior distribution
  • sd.vect = the standard deviation of the posterior distribution
  • 2.5% to 97.5% = quantiles of posterior distribution
  • Rhat is the Gelman-Rubin statistic for evaluating convergence of the MCMC samples.
  • n.eff = an estimate of the effective sample size. As we will see in a bit, our MCMC samples will typically be autocorrelated. Thus, they will contain less information than if we were able to somehow generate a set of independent samples from the posterior distribution.

JAGS also returns an estimate of DIC (an information theoretic quantity like AIC that we will briefly discuss in Section 15.8.4).

If we only wanted to view the output for a few select parameters, we can use the MCMCsummary function in the MCMCvis package (Youngflesh, 2018):

MCMCsummary(t.test.jags, params = c("mu.male", "mu.female"))
##               mean       sd     2.5%      50%    97.5% Rhat n.eff
## mu.male   113.3763 1.056445 111.3029 113.3805 115.5031    1 15206
## mu.female 108.5855 1.068213 106.4917 108.5786 110.6986    1 15237

Before interpreting the output, we should verify that we have sampled long enough that our samples have converged in distribution. Inspecting the values of Rhat, we see that they are all close to 1. It is also a good idea to inspect posterior density plots and trace plots to provide further reassurance that the different chains have ended up in the same spot (see Section 12.7).
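
When a model has many saved parameters, it can be convenient to check the Rhat values programmatically rather than scanning the printed table. One option (a sketch, relying on the BUGSoutput component that R2jags stores with the fitted model) is to pull out the largest Rhat across all saved parameters:

# Largest Rhat across all saved parameters; values near 1 (< 1.1) suggest convergence
max(t.test.jags$BUGSoutput$summary[, "Rhat"])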

We can use the MCMCpstr function in the MCMCvis package to pull off the posterior means to serve as our point estimates and to pull off quantiles of the posterior distribution to form 95% credible intervals.

# Use MCMCpstr to pull off posterior means
bayesests <- MCMCpstr(t.test.jags, params = c("mu.male", "mu.female"), func = mean)
bayesests
## $mu.male
## [1] 113.3763
## 
## $mu.female
## [1] 108.5855
# Use MCMCpstr to pull off upper and lower 95% credible interval limits
bayesci <-  MCMCpstr(t.test.jags, params = c("mu.male", "mu.female"), 
                        func = function(x) quantile(x, probs = c(0.025, 0.975)))
bayesci
## $mu.male
##             2.5%    97.5%
## mu.male 111.3029 115.5031
## 
## $mu.female
##               2.5%    97.5%
## mu.female 106.4917 110.6986
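
Because we also saved samples of the derived quantity mu.diff, we can summarize functions of parameters directly from the MCMC output. As a small sketch (using the MCMCchains function in the MCMCvis package, which returns the raw posterior samples as a matrix), we could estimate the posterior probability that the mean jaw length of males exceeds that of females:

# Extract the raw posterior samples for the derived quantity mu.diff
mu.diff.samples <- MCMCchains(t.test.jags, params = "mu.diff")
# Posterior probability that the male mean exceeds the female mean
mean(mu.diff.samples > 0)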

We then fit the same model in a frequentist framework using lm and create a data set with our estimates and credible/confidence intervals.

# Fit linear model in frequentist framework
jawdat<-data.frame(jaws=c(males, females), 
                    sex=c(rep("Male", length(males)), 
                          rep("Female", length(females))))
lm.jaw<-lm(jaws~sex-1, data=jawdat)
# Store results
betaf <- coef(lm.jaw)
CIf <-confint(lm.jaw) 
ests<-data.frame(estimate = c(bayesests$mu.female, bayesests$mu.male, betaf), 
                 LCL = c(bayesci$mu.female[1],   bayesci$mu.male[1], CIf[,1]), 
                 UCL = c(bayesci$mu.female[2],   bayesci$mu.male[2], CIf[,2]), 
                 param = c("Mean females", "Mean males"),
                 Method = rep(c("Bayesian", "Frequentist"), each = 2))

Lastly, we plot the results, showing that they are nearly identical (Figure 12.3)!

ggplot(ests, aes(param,estimate, col = Method)) + 
  geom_point(position = position_dodge(width = 0.2))+ 
  geom_pointrange(aes(ymin = LCL, ymax= UCL), position = position_dodge(width = 0.2))+
  ylab("Estimate") + xlab("") +
  scale_x_discrete(labels = c('Mean females' = expression(mu[f]),
                              'Mean males'   = expression(mu[m]))) + 
  scale_colour_colorblind()+
  theme(text = element_text(size = 20))  

FIGURE 12.3: Comparison of frequentist and Bayesian estimates and confidence/credible intervals for the mean jaw length of male and female jackals.

12.7 Density and traceplots for assessing convergence

When running code in RStudio, I often like to use the mcmcplot function to visualize output. This function will create a separate html file that you can use to visualize posterior distributions, autocorrelation functions, and traceplots for the different parameters. This function does not work well with the bookdown package (Xie, 2022) used to create this book, however, so we will explore these different plots separately.

We can visualize posterior densities using denplot:

denplot(t.test.jags, ask = FALSE)  

FIGURE 12.4: Posterior density plots for each parameter in our model fit to the jackal jaw length data set. Colors in the different panels correspond to different chains.

In each panel, we see density plots for each chain associated with a particular parameter as well as density plots for the deviance (we can also pool results across the different chains if we add collapse = TRUE). For this simple model, it is clear that all three chains ended up in the same place, providing further evidence that we have reached convergence. If we had a large number of parameters, we could use the parms argument to specify a subset of parameters to visualize. For example, we could visualize the posteriors for \(\mu_m\) and \(\mu_f\) using denplot(t.test.jags, parms = c("mu.male", "mu.female")) (see the first note at the end of this chapter regarding the parms versus params argument names).

We can also look at traceplots showing the sequence of posterior samples using:

traplot(t.test.jags, ask=FALSE)

FIGURE 12.5: Trace plots for each parameter in our model fit to the jackal jaw length data set. Colors in the different panels correspond to different chains.

Again, the fact that the different chains largely overlap is reassuring. Sometimes, traceplots will not look so noisy; instead, you will see a slow-moving trend over time. When this happens, the samples are highly autocorrelated, and we will have to sample for a long time in order to explore the full range of the posterior distribution. By contrast, these traceplots suggest that the sampler is doing a great job of quickly exploring the full range of the posterior distribution, and we say the chains are “well mixed”.
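
If you want to examine this autocorrelation directly, one option (a sketch; it assumes the as.mcmc method provided by R2jags for converting the fitted model to coda's mcmc.list format) is to use functions from the coda package:

library(coda)                              # typically already attached along with R2jags
jags.mcmc <- as.mcmc(t.test.jags)          # convert the fit to an mcmc.list (one element per chain)
autocorr.plot(jags.mcmc[[1]], ask = FALSE) # autocorrelation functions for the first chain
effectiveSize(jags.mcmc)                   # effective sample size for each saved parameter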

We can look to see if our parameters are correlated with each other using parcorplot.

parcorplot(t.test.jags)

FIGURE 12.6: Correlation plot formed using the parcorplot function for inspecting whether the posterior distributions are correlated for different parameters.

Lastly, we can plot estimates with 68% and 95% credible intervals using the caterplot function (Figure 12.7; see the second note at the end of this chapter).

caterplot(t.test.jags, ask = FALSE, parms = c("mu.male", "mu.female"))

FIGURE 12.7: Visualization of 68% and 95% credible intervals using the caterplot function applied to the Bayesian model fit to the jaw length data.

Congrats - you have just fit your first JAGS model and learned how to work with and summarize samples from the posterior distribution!

12.8 Tips on running models in JAGS

Kéry (2010) offers lots of good tricks and tips in the Appendix of his book. First and foremost: start simple, then build up towards more complex models! If you have his book, we encourage you to look at his tips numbered 2, 3, 4, 9, 11, 12 (use %T% in JAGS), 14, 16, 17, 20, 23, 24, 25, 26, 27.

Also, as always, googling error messages can be useful for diagnosing problems, and it can help to work together with your peers when first learning to program in JAGS. It is extremely rare to specify everything correctly the first time when fitting a model using JAGS. That is OK. It is important to have failures followed by successes to give future you the confidence needed to stick with it!

References

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., … Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1).
Curtis, S. M. (2018). mcmcplots: Create plots from MCMC output. Retrieved from https://CRAN.R-project.org/package=mcmcplots
Gelman, A., Rubin, D. B., et al. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
Kéry, M. (2010). Introduction to WinBUGS for ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses. Academic Press.
Plummer, M. et al. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (Vol. 124, pp. 1–10). Vienna, Austria.
Stan Development Team. (2019). RStan: The R interface to Stan. Retrieved from http://mc-stan.org/
Su, Y.-S., & Yajima, M. (2021). R2jags: Using R to run 'JAGS'. Retrieved from https://CRAN.R-project.org/package=R2jags
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org
Xie, Y. (2022). bookdown: Authoring books and technical documents with R Markdown. Retrieved from https://github.com/rstudio/bookdown
Youngflesh, C. (2018). MCMCvis: Tools to visualize, manipulate, and summarize MCMC output. Journal of Open Source Software, 3(24), 640. doi:10.21105/joss.00640

  1. Note, the functions in the mcmcplots package include the argument parms, whereas the functions in the MCMCvis package include the argument params. It is easy to get these confused. If you use the tab key in RStudio after typing “par”, you can have the function self-populate with the correct argument.

  2. Note, for a Normal distribution, a 68% interval is given by the mean \(\pm 1\) SD.