Beta distribution


Say you wait for two multistep Poisson processes to arrive. The individual steps of each process happen at the same rate, but the first multistep process requires \(\alpha\) steps and the second requires \(\beta\) steps. The fraction of the total waiting time taken by the first process is Beta distributed.
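This story can be checked by simulation: the waiting time for a multistep process with \(k\) steps occurring at unit rate is Gamma distributed with shape \(k\), so we can draw both waiting times and look at the fraction taken by the first. A sketch using NumPy (the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=3252)  # seed chosen arbitrarily

alpha, beta = 3, 7

# Waiting time for a multistep process with k steps at unit rate
# is Gamma(k, 1) distributed; draw both waiting times.
t1 = rng.gamma(alpha, 1.0, size=100_000)
t2 = rng.gamma(beta, 1.0, size=100_000)

# Fraction of the total waiting time taken by the first process
frac = t1 / (t1 + t2)

# The sample mean should be close to alpha / (alpha + beta) = 0.3
print(frac.mean())
```

The fractions all fall in [0, 1], and their histogram matches the Beta(\(\alpha\), \(\beta\)) PDF.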


There are two parameters, both strictly positive: \(\alpha\) and \(\beta\), as defined in the story above.


The Beta distribution has support on the interval [0, 1].

Probability density function

\[\begin{align} f(\theta; \alpha, \beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}, \end{align}\]

where

\[\begin{align} B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)} \end{align}\]

is the Beta function.
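As a sanity check, the PDF written with the Beta function matches what scipy.stats computes directly (the particular values of \(\theta\), \(\alpha\), and \(\beta\) below are arbitrary):

```python
import scipy.special
import scipy.stats

alpha, beta = 2.0, 5.0
theta = 0.3

# PDF from the formula above, with B(alpha, beta) from scipy.special
pdf_formula = (
    theta**(alpha - 1) * (1 - theta)**(beta - 1)
    / scipy.special.beta(alpha, beta)
)

# PDF computed directly by scipy.stats
pdf_scipy = scipy.stats.beta.pdf(theta, alpha, beta)

print(pdf_formula, pdf_scipy)  # the two values agree
```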


Mean: \(\displaystyle{\frac{\alpha}{\alpha + \beta}}\)

Variance: \(\displaystyle{\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}}\)
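These moment formulas can be verified numerically against scipy.stats (parameter values are arbitrary):

```python
import scipy.stats

alpha, beta = 2.0, 5.0

# Moments computed by scipy.stats
mean, var = scipy.stats.beta.stats(alpha, beta, moments="mv")

# Moments from the formulas above
mean_formula = alpha / (alpha + beta)
var_formula = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))

print(mean, var)
```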





NumPy: rg.beta(alpha, beta)


SciPy: scipy.stats.beta(alpha, beta)


Stan: beta(alpha, beta)
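For example, in Python one can draw samples with a NumPy Generator (here named `rg`, as in the snippet above) and build a frozen SciPy distribution object; parameter values are arbitrary:

```python
import numpy as np
import scipy.stats

alpha, beta = 2.0, 5.0

# NumPy: rg is a Generator instance
rg = np.random.default_rng(seed=3252)
samples = rg.beta(alpha, beta, size=10_000)

# SciPy: a "frozen" distribution with pdf, cdf, rvs, etc.
dist = scipy.stats.beta(alpha, beta)

print(samples.mean(), dist.mean())  # both close to alpha / (alpha + beta)
```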


  • The story of the Beta distribution is difficult to parse. Most importantly, the Beta distribution allows us to put probabilities on unknown probabilities. It is only defined on \(0 \le \theta \le 1\), and \(\theta\) here can be interpreted as a probability, say of success in a Bernoulli trial.

  • The case where \(\alpha = \beta = 0\) is not technically a probability distribution because the PDF cannot be normalized. Nonetheless, it is often used as an improper prior, known as a Haldane prior, named after biologist J. B. S. Haldane. The case where \(\alpha = \beta = 1/2\) is known as the Jeffreys prior.

  • The Beta distribution may also be parametrized in terms of the location parameter \(\phi\) and concentration \(\kappa\), which are related to \(\alpha\) and \(\beta\) as

\[\begin{split}\begin{align} &\phi = \frac{\alpha}{\alpha + \beta}, \\ &\kappa = \alpha + \beta. \end{align}\end{split}\]

The location parameter \(\phi\) is the mean of the distribution and \(\kappa\) is a measure of how broad it is. To convert back to an \((\alpha, \beta)\) parametrization from a \((\phi, \kappa)\) parametrization, use

\[\begin{split}\begin{align} &\alpha = \phi \kappa, \\ &\beta = (1-\phi)\kappa. \end{align}\end{split}\]

The mean and variance in terms of \(\phi\) and \(\kappa\) are

Mean: \(\displaystyle{\phi}\)

Variance: \(\displaystyle{\frac{\phi(1-\phi)}{1+\kappa}}\).
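The conversions above are one-liners; a pair of helper functions (the function names are my own, not from any library) makes the round trip explicit:

```python
def phi_kappa_to_alpha_beta(phi, kappa):
    """Convert (phi, kappa) parametrization to (alpha, beta)."""
    return phi * kappa, (1 - phi) * kappa


def alpha_beta_to_phi_kappa(alpha, beta):
    """Convert (alpha, beta) parametrization to (phi, kappa)."""
    return alpha / (alpha + beta), alpha + beta


alpha, beta = phi_kappa_to_alpha_beta(0.3, 10.0)
print(alpha, beta)  # alpha ≈ 3.0, beta ≈ 7.0

phi, kappa = alpha_beta_to_phi_kappa(alpha, beta)
print(phi, kappa)  # recovers phi ≈ 0.3, kappa ≈ 10.0
```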

PDF and CDF plots