Dirichlet distribution
Story
The Dirichlet distribution is a generalization of the Beta distribution. It is a probability distribution over the probabilities of outcomes. Whereas the Beta distribution describes the probability of one of the two outcomes of a Bernoulli trial, the Dirichlet distribution describes the probabilities of \(K\) mutually exclusive outcomes. The Beta distribution is the special case of the Dirichlet distribution with \(K=2\).
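The \(K=2\) special case can be checked numerically. The sketch below assumes SciPy is available and uses illustrative parameter values (`a = 2`, `b = 5`) that are not from the text above. It evaluates the Beta density at a few points \(x\) and the two-outcome Dirichlet density at \((x, 1-x)\).

```python
import numpy as np
from scipy import stats

# Illustrative parameter values (assumptions, not from the text)
a, b = 2.0, 5.0
x = np.linspace(0.05, 0.95, 10)

# Beta density of x
beta_pdf = stats.beta.pdf(x, a, b)

# Dirichlet density with K = 2, evaluated at the simplex point (x, 1 - x)
dirichlet_pdf = np.array(
    [stats.dirichlet.pdf([xi, 1.0 - xi], [a, b]) for xi in x]
)

densities_match = np.allclose(beta_pdf, dirichlet_pdf)
```

Here `densities_match` records whether the two densities agree at every test point.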
Parameters
The parameters are \(\alpha_1\), \(\alpha_2\), …, \(\alpha_K\), all strictly positive, defined analogously to \(\alpha\) and \(\beta\) of the Beta distribution.
Support
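The Dirichlet distribution is supported on vectors \(\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_K)\) in which each \(\theta_i\) lies in the interval [0, 1], with the constraint that \(\sum_{i=1}^K \theta_i = 1\).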
Probability density function
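\(\begin{align} f(\boldsymbol{\theta};\boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})}\,\prod_{i=1}^K \theta_i^{\alpha_i-1}, \end{align}\)
where \(B(\boldsymbol{\alpha}) = \displaystyle{\frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)}}\) is the multivariate beta function.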
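As a rough numerical check of the density, the sketch below evaluates the log density as \(\sum_i (\alpha_i - 1)\ln\theta_i - \ln B(\boldsymbol{\alpha})\), with \(\ln B(\boldsymbol{\alpha}) = \sum_i \ln\Gamma(\alpha_i) - \ln\Gamma\left(\sum_i \alpha_i\right)\), and compares it with `scipy.stats.dirichlet.logpdf`. The helper name `dirichlet_logpdf` and the values of `alpha` and `theta` are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def dirichlet_logpdf(theta, alpha):
    """Log density of a Dirichlet distribution (hypothetical helper)."""
    theta = np.asarray(theta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Log of the multivariate beta function B(alpha)
    log_B = np.sum(gammaln(alpha)) - gammaln(np.sum(alpha))
    return np.sum((alpha - 1.0) * np.log(theta)) - log_B


# Illustrative values; theta must lie on the simplex (entries sum to one)
alpha = np.array([1.5, 2.0, 3.5])
theta = np.array([0.2, 0.3, 0.5])

manual = dirichlet_logpdf(theta, alpha)
reference = stats.dirichlet.logpdf(theta, alpha)
```

Working on the log scale avoids overflow in the Gamma functions when the \(\alpha_i\) are large.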
Cumulative distribution function
There is no analytic expression for the CDF.
Moments
Mean of \(\theta_i\): \(\left<\theta_i\right> = \displaystyle{\frac{\alpha_i}{\sum_{k=1}^K \alpha_k}}\)
Variance of \(\theta_i\): \(\displaystyle{\frac{\left<\theta_i\right>(1-\left<\theta_i\right>)}{1 + \sum_{k=1}^K \alpha_k}}\)
Covariance of \(\theta_i, \theta_j\) with \(j\ne i\): \(\displaystyle{-\frac{\left<\theta_i\right>\left<\theta_j\right>}{1 + \sum_{k=1}^K \alpha_k}}\)
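The moment formulas can be collected into a mean vector and a covariance matrix. The following is a minimal sketch, assuming NumPy; the function name `dirichlet_moments`, the parameter values, and the sample size are illustrative choices. It also draws samples to compare the formulas against empirical estimates.

```python
import numpy as np


def dirichlet_moments(alpha):
    """Mean vector and covariance matrix from the closed-form expressions."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum()
    mean = alpha / alpha0
    # Off-diagonal entries: -<theta_i><theta_j> / (1 + alpha0)
    cov = -np.outer(mean, mean) / (1.0 + alpha0)
    # Diagonal entries: <theta_i>(1 - <theta_i>) / (1 + alpha0)
    cov[np.diag_indices_from(cov)] = mean * (1.0 - mean) / (1.0 + alpha0)
    return mean, cov


alpha = np.array([1.0, 2.0, 3.0])  # illustrative values
mean, cov = dirichlet_moments(alpha)

# Empirical estimates from samples, for comparison
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=200_000)
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
```

Because the \(\theta_i\) sum to one, each row of the covariance matrix sums to zero, which is a convenient sanity check on the formulas.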
Usage
The usage below assumes that `alpha` is an array of length \(K\).
Package | Syntax |
---|---|
NumPy | `rng.dirichlet(alpha)` |
SciPy | `scipy.stats.dirichlet(alpha)` |
Distributions.jl | `Dirichlet(alpha)` |
Stan | `dirichlet(alpha)` |
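For the Python packages, a short usage sketch follows. It assumes `rng` is a NumPy `Generator` created with `np.random.default_rng`; the values of `alpha`, the seed, and the sample size are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

alpha = np.array([2.0, 3.0, 5.0])  # illustrative values, K = 3

# NumPy: draw samples; each row is one draw of theta
rng = np.random.default_rng(seed=3252)
samples = rng.dirichlet(alpha, size=1000)

# SciPy: a frozen distribution object with sampling and density methods
dist = stats.dirichlet(alpha)
scipy_samples = dist.rvs(size=1000, random_state=rng)
log_density = dist.logpdf(samples[0])
```

Each row of `samples` and `scipy_samples` lies on the simplex, so its entries sum to one up to floating-point error.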
Notes
In some cases, we may wish to specify the distribution of an ordered, Dirichlet-distributed vector \(\theta\). That is, we want \(\theta \sim \text{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_K)\) with \(\theta_i < \theta_{i+1}\) for all \(i < K\). Because of the relationship of the Dirichlet distribution to a set of Gamma-distributed random variables, we may specify this in Stan as follows.
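```stan
data {
  int<lower=1> K;  // number of outcomes
}

parameters {
  vector<lower=0>[K] alpha;    // concentration parameters
  positive_ordered[K] lambda;  // ordered, unnormalized Gamma variables
}

transformed parameters {
  // Dividing by the sum preserves the ordering of lambda
  simplex[K] theta = lambda / sum(lambda);
}

model {
  // Gamma(alpha_i, 1) density on each unnormalized variable
  target += gamma_lupdf(lambda | alpha, 1);
}
```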
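The Gamma relationship used above is that if \(\lambda_i \sim \text{Gamma}(\alpha_i, 1)\) independently, then \(\lambda / \sum_k \lambda_k\) is Dirichlet distributed with parameters \(\alpha\). The sketch below illustrates only this unordered relationship in NumPy and does not impose the ordering constraint; the values of `alpha`, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])  # illustrative values
n = 100_000

# Independent Gamma(alpha_i, 1) draws; column i uses shape alpha[i]
lam = rng.gamma(shape=alpha, scale=1.0, size=(n, alpha.size))

# Normalizing each row gives a draw from Dirichlet(alpha)
theta = lam / lam.sum(axis=1, keepdims=True)

# The sample mean should be close to alpha / sum(alpha)
empirical_mean = theta.mean(axis=0)
expected_mean = alpha / alpha.sum()
```

Comparing `empirical_mean` with `expected_mean` gives a quick check that the normalized Gamma draws behave like Dirichlet samples.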