Lewandowski-Kurowicka-Joe (LKJ) distribution


Story

Probability distribution for positive definite correlation matrices, or equivalently for their Cholesky factors.

To understand what a correlation matrix is, it helps to think first of a covariance matrix \(\mathsf{\Sigma}\), such as those used in the Multivariate Normal distribution. The covariance matrix may also be written as \(\mathsf{\Sigma} = \mathsf{S} \cdot \mathsf{C} \cdot \mathsf{S}\), where \(\mathsf{S} = \sqrt{\mathrm{diag}(\mathsf{\Sigma})}\), and entry \(i, j\) in the correlation matrix \(\mathsf{C}\) is \(C_{ij} = \sigma_{ij}/\sigma_i\sigma_j\). The diagonal of the correlation matrix is all ones, and the off-diagonal entries can range from \(-1\) to \(1\). Note also that because \(\mathsf{\Sigma}\) is symmetric and positive definite, so is \(\mathsf{C}\).


Parameters

There is one positive scalar parameter, \(\eta\), which tunes the strength of the correlations. If \(\eta=1\), the density is uniform over all correlation matrix. If \(\eta>1\), matrices with a stronger diagonal (and therefore smaller correlations) are favored. If \(\eta<1\), the diagonal is weak and correlations are favored.


Support

The LKJ distribution is supported over the set of \(K\times K\) Cholesky factors of real symmetric positive definite matrices.


Probability density function

The LKJ distribution is a distribution over a \(K\times K\) correlation matrix \(\mathsf{C}\). The normalization constant is a complicated expression involving Beta functions; the important feature of the PDF is that the probability density function is proportional to the determinant of the correlation matrix raised to the \(\eta - 1\) power.

\[\begin{align} f(\mathsf{C}|\eta) =\left(2^{\sum_{k=1}^{K-1}(2(\eta-1)+K-k)(K-k)}\prod_{k=1}^{K-1}\left(B(\eta+(K-k-1)/2,\eta + (K-k-1)/2)\right)^{K-k}\right) (\mathrm{det}\,\mathsf{C})^{\eta-1}. \end{align}\]

Here, \(B(\alpha, \beta)\) is a beta function.

The marginal distribution for each off-diagonal entry in \(\mathsf{C}\) is a Beta distribution with parameters \(\alpha = \beta = \eta - 1 + K/2\).


Cumulative distribution function

There is no analytic expression for the CDF.


Moments

Mean: Identity matrix of order \(K\). That is, each off-diagonal entry is zero, meaning that the average correlation between covariates is zero.

Variance of each entry off-diagonal entry: \(\displaystyle{\frac{4\left(\eta + \frac{K}{2} - 1\right)^2}{(2\eta + K - 2)^2 (2\eta + K -1)}}\)


Usage

The usage below assumes that mu is a length \(K\) array, Sigma is a \(K\times K\) symmetric positive definite matrix, and L is a \(K\times K\) lower-triangular matrix with strictly positive values on the diagonal that is a Cholesky factor.

Package

Syntax

NumPy

not available

SciPy

not available

Distributions.jl

LKJ(K, eta)

Distributions.jl Cholesky

LKJCholesky(K, eta, uplo='L')

Stan sampling

lkj_corr(eta)

Stan Cholesky sampling

lkj_corr_cholesky(eta)

Stan rng

lkj_corr(K, eta)

Stan Cholesky rng

lkj_corr_cholesky(K, eta)



Notes

  • The most common use case is as a prior for a covariance matrix. Note that LKJ distribution gives correlation matrices, not covariance matrices. We typically work with Cholesky factors. To get the covariance Cholesky factor from the correlation Cholesky factor, we need to multiply the correlation Cholesky factor by a diagonal matrix constructed from the variances of the individual variates. Here is an example using the LKJ distribution in a model with a multivariate Normal likelihood in Stan.

    parameters {
      // Vector of means
      vector[K] mu;
    
      // Cholesky factor for the correlation matrix
      cholesky_factor_corr[K] L_Omega;
    
      // Sqrt of variances for each variate
      vector<lower=0>[K] L_std;
    }
    
    
    transformed parameters {
      // Cholesky factor for covariance matrix
      matrix[K, K] L_Sigma = diag_pre_multiply(L_std, L_Omega);
    }
    
    
    model {
      // Prior on Cholesky decomposition of correlation matrix
      L_Omega ~ lkj_corr_cholesky(1);
    
      // Prior on standard deviations for each variate
      L_std ~ normal(0, 2.5);
    
      // Likelihood
      y ~ multi_normal_cholesky(mu, L_Sigma);
    }