Multinomial distribution


Story

This is a generalization of the Binomial distribution. Instead of a Bernoulli trial with two possible outcomes, each trial has \(K\) possible outcomes. The trials are independent, and each outcome has the same probability on every trial. Over a total of \(N\) trials, the counts \(y_1\) of outcome 1, \(y_2\) of outcome 2, …, and \(y_K\) of outcome \(K\) are Multinomially distributed.
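The story can be checked by simulation. This is a minimal sketch that assumes NumPy's `Generator` API. It runs \(N\) independent trials with \(K\) outcomes each and tallies them, and the vector of tallies is one Multinomial draw. The helper name `tally_trials` and the probabilities used are hypothetical.

```python
import numpy as np

def tally_trials(N, theta, rng):
    """Hypothetical helper: run N independent K-outcome trials and count each outcome."""
    theta = np.asarray(theta, dtype=float)
    K = len(theta)
    # Each trial yields an outcome index in {0, ..., K-1} with probabilities theta.
    outcomes = rng.choice(K, size=N, p=theta)
    # Counting how often each index occurred gives (y_1, ..., y_K).
    return np.bincount(outcomes, minlength=K)

rng = np.random.default_rng(42)
y = tally_trials(20, [0.2, 0.3, 0.5], rng)
print(y, y.sum())  # the counts always sum to N = 20
```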


Example

There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. Suppose each individual independently has each genotype with a fixed probability. Then the numbers \(y_1\) of AA individuals, \(y_2\) of Aa individuals, and \(y_3\) of aa individuals in a population of \(N\) total individuals are Multinomially distributed.
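For the genotype example, one population can be simulated with NumPy's `Generator.multinomial`. This is a sketch only. The genotype probabilities `theta_geno` and the population size below are made-up illustrative values, not estimates from any data.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

N = 100  # hypothetical population size
# Hypothetical probabilities of genotypes AA, Aa, aa (must sum to 1).
theta_geno = np.array([0.25, 0.5, 0.25])

# One draw gives the counts (y_1, y_2, y_3) of AA, Aa, and aa individuals.
y_AA, y_Aa, y_aa = rng.multinomial(N, theta_geno)
print(f"AA: {y_AA}, Aa: {y_Aa}, aa: {y_aa}")
```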


Parameters

\(N\), the total number of trials (a non-negative integer), and \(\boldsymbol{\theta} = \left\{\theta_1, \theta_2, \ldots,\theta_K\right\}\), the probabilities of each outcome. The probabilities satisfy \(\theta_i \ge 0\) and \(\sum_{i=1}^K \theta_i = 1\). The counts are further restricted to satisfy \(N = \sum_{i=1}^K y_i\).
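The parameter constraints can be checked mechanically before doing any computation. The function `check_multinomial_params` below is a hypothetical helper, not part of any library. The tolerance `atol` on the sum of \(\boldsymbol{\theta}\) is an assumed value to absorb floating-point rounding.

```python
import numpy as np

def check_multinomial_params(N, theta, y=None, atol=1e-8):
    """Hypothetical helper: validate N, theta (and optionally y) for a Multinomial."""
    theta = np.asarray(theta, dtype=float)
    if not (isinstance(N, (int, np.integer)) and N >= 0):
        raise ValueError("N must be a non-negative integer")
    # theta must lie on the probability simplex.
    if np.any(theta < 0) or not np.isclose(theta.sum(), 1.0, atol=atol):
        raise ValueError("theta must be non-negative and sum to 1")
    if y is not None:
        y = np.asarray(y)
        if y.shape != theta.shape:
            raise ValueError("y and theta must both have length K")
        # The counts must be non-negative and add up to the number of trials.
        if np.any(y < 0) or y.sum() != N:
            raise ValueError("y must be non-negative counts summing to N")
    return True

print(check_multinomial_params(10, [0.2, 0.3, 0.5], y=[2, 3, 5]))  # True
```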


Support

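The Multinomial distribution with \(K\) outcomes is supported on the vectors \(\mathbf{y} \in \mathbb{N}^K\) (with \(\mathbb{N}\) including zero) whose entries sum to \(N\).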
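For small \(N\) and \(K\), the support can be listed explicitly. This is a sketch using the standard "stars and bars" construction, with the hypothetical helper name `multinomial_support`. It enumerates every non-negative integer vector of length \(K\) that sums to \(N\). There are \(\binom{N+K-1}{K-1}\) such vectors.

```python
import itertools
import math

def multinomial_support(N, K):
    """Hypothetical helper: all length-K tuples of non-negative integers summing to N."""
    # Stars and bars: choose K-1 "bar" positions among N+K-1 slots;
    # the numbers of stars between consecutive bars are the counts y_1, ..., y_K.
    support = []
    for bars in itertools.combinations(range(N + K - 1), K - 1):
        edges = (-1,) + bars + (N + K - 1,)
        support.append(tuple(edges[i + 1] - edges[i] - 1 for i in range(K)))
    return support

pts = multinomial_support(4, 3)
print(len(pts), math.comb(4 + 3 - 1, 3 - 1))  # both 15
```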


Probability mass function

\[\begin{align} f(\mathbf{y};\boldsymbol{\theta}, N) = \frac{N!}{y_1!\,y_2!\cdots y_K!}\,\theta_1^{y_1}\,\theta_2^{y_2}\cdots\theta_K^{y_K}. \end{align}\]
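Evaluating the PMF directly with factorials overflows quickly for large \(N\). A common approach, sketched here, is to work on the log scale using \(\ln n! = \ln\Gamma(n+1)\). The function names are hypothetical. \(N\) is taken as the sum of the counts, which enforces the constraint \(N = \sum_i y_i\).

```python
import math

def multinomial_logpmf(y, theta):
    """Hypothetical helper: log of the Multinomial PMF, with N taken as sum(y)."""
    N = sum(y)
    # log N! - sum_i log y_i!, via lgamma(n + 1) = log(n!)
    log_coef = math.lgamma(N + 1) - sum(math.lgamma(yi + 1) for yi in y)
    log_probs = 0.0
    for yi, th in zip(y, theta):
        if yi > 0:
            if th == 0:
                return -math.inf  # a positive count of an impossible outcome
            log_probs += yi * math.log(th)
        # A zero count contributes theta_i**0 = 1, so it adds nothing.
    return log_coef + log_probs

def multinomial_pmf(y, theta):
    return math.exp(multinomial_logpmf(y, theta))

print(multinomial_pmf([1, 1], [0.5, 0.5]))  # 2!/(1! 1!) * 0.5 * 0.5, i.e. ≈ 0.5
```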

Moments

Mean of \(y_i\): \(N\theta_i\)

Variance of \(y_i\): \(N\theta_i(1-\theta_i)\)

Covariance of \(y_i, y_j\) with \(j\ne i\): \(-N\theta_i\theta_j\)
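The three moment formulas combine into a mean vector \(N\boldsymbol{\theta}\) and a covariance matrix \(N\left(\mathrm{diag}(\boldsymbol{\theta}) - \boldsymbol{\theta}\boldsymbol{\theta}^\mathsf{T}\right)\). Each row of that matrix sums to zero because the counts always add up to \(N\). This is a sketch with a hypothetical helper name. It checks the formulas against a Monte Carlo estimate from NumPy samples; the sample size and seed are arbitrary choices.

```python
import numpy as np

def multinomial_mean_cov(N, theta):
    """Hypothetical helper: mean vector and covariance matrix of a Multinomial."""
    theta = np.asarray(theta, dtype=float)
    mean = N * theta
    # Diagonal: N theta_i (1 - theta_i); off-diagonal: -N theta_i theta_j.
    cov = N * (np.diag(theta) - np.outer(theta, theta))
    return mean, cov

mean, cov = multinomial_mean_cov(10, [0.2, 0.3, 0.5])

# Monte Carlo check using NumPy's Generator.multinomial
rng = np.random.default_rng(0)
samples = rng.multinomial(10, [0.2, 0.3, 0.5], size=200_000)
print(mean, samples.mean(axis=0))
print(cov, np.cov(samples, rowvar=False), sep="\n")
```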


Usage

The usage below assumes that theta is a length-\(K\) array.

Package             Syntax
NumPy               rng.multinomial(N, theta)
SciPy               scipy.stats.multinomial(N, theta)
Distributions.jl    Multinomial(N, theta)
Stan sampling       multinomial(theta)
Stan rng            multinomial_rng(theta, N)
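A short sketch of the NumPy and SciPy entries in use. The values of `N` and `theta` are illustrative. `rng` is a NumPy `Generator`, and `scipy.stats.multinomial(N, theta)` returns a frozen distribution object with `pmf`, `logpmf`, and `rvs` methods.

```python
import numpy as np
import scipy.stats

N = 10
theta = np.array([0.2, 0.3, 0.5])  # length-K array of outcome probabilities

# NumPy: draw one vector of counts (pass size=... for several draws)
rng = np.random.default_rng()
y = rng.multinomial(N, theta)

# SciPy: frozen distribution for evaluating and sampling
dist = scipy.stats.multinomial(N, theta)
print(y, dist.pmf(y), dist.logpmf(y))
print(dist.rvs(size=3, random_state=rng))  # 3 draws, each of length K
```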



Notes

  • For a sampling statement in Stan, \(N\) is not passed as an argument; it is implied by the sum of the counts in \(\mathbf{y}\).