Multinomial distribution
Story
This is a generalization of the Binomial distribution. Instead of a Bernoulli trial, which has two outcomes, each trial has \(K\) outcomes. The counts \(y_1\) of outcome 1, \(y_2\) of outcome 2, …, and \(y_K\) of outcome \(K\) in a total of \(N\) independent trials are Multinomially distributed.
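The story can be checked directly: run \(N\) independent trials, tally how often each outcome appears, and the tally is one Multinomial draw. The sketch below is a minimal illustration assuming only Python's standard `random` module; `multinomial_by_trials` is a hypothetical helper name chosen for this example, not a library function.

```python
import random
from collections import Counter


def multinomial_by_trials(N, theta, rng):
    """Hypothetical helper (illustration only): run N independent trials, each
    landing on outcome i with probability theta[i], and return the count of
    each outcome. The resulting count vector is one Multinomial(N, theta) draw."""
    K = len(theta)
    # Draw N outcome indices 0..K-1 with the given probabilities as weights.
    outcomes = rng.choices(range(K), weights=theta, k=N)
    counts = Counter(outcomes)
    # Counter returns 0 for outcomes that never occurred.
    return [counts[i] for i in range(K)]


rng = random.Random(42)
y = multinomial_by_trials(10, [0.2, 0.3, 0.5], rng)
print(y, sum(y))
```

With \(K = 2\) the same tally is just the number of successes and failures in \(N\) Bernoulli trials, which is the Binomial special case.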
Example
There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The numbers of AA individuals \(y_1\), Aa individuals \(y_2\), and aa individuals \(y_3\) in a population of \(N\) total individuals are Multinomially distributed.
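As a worked example, suppose (hypothetically) that the genotype probabilities are \(\boldsymbol{\theta} = (1/4, 1/2, 1/4)\) for AA, Aa, and aa, and that \(N = 4\). The probability of observing one AA, two Aa, and one aa individual is \(\frac{4!}{1!\,2!\,1!}(1/4)^1(1/2)^2(1/4)^1 = 12/64 = 3/16\). The sketch below reproduces this with exact `fractions.Fraction` arithmetic; `multinomial_prob` is a hypothetical helper name, and the \(\boldsymbol{\theta}\) values are chosen only for illustration.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_prob(y, theta):
    """Hypothetical helper: exact Multinomial probability of counts y given theta."""
    N = sum(y)
    coeff = factorial(N)
    for yi in y:
        coeff //= factorial(yi)  # builds the coefficient N!/(y1! ... yK!) exactly
    return coeff * prod(t ** yi for t, yi in zip(theta, y))


# Hypothetical genotype probabilities for AA, Aa, aa (illustration only).
theta = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(multinomial_prob([1, 2, 1], theta))  # 3/16
```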
Parameters
\(N\), the total number of trials, and \(\boldsymbol{\theta} = \left\{\theta_1, \theta_2, \ldots,\theta_K\right\}\), the probabilities of each outcome. Note that \(\sum_{i=1}^K \theta_i = 1\) and there is the further restriction that \(N = \sum_{i=1}^K y_i\).
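These constraints can be encoded as a small validation routine. This is a sketch assuming plain Python sequences; `check_multinomial_args` is a hypothetical name, and the floating-point tolerance used to test \(\sum_i \theta_i = 1\) is an arbitrary choice.

```python
import math


def check_multinomial_args(y, N, theta, tol=1e-12):
    """Hypothetical helper: raise ValueError unless (y, N, theta) satisfy the
    Multinomial constraints (same length K, valid probabilities, counts sum to N)."""
    if len(y) != len(theta):
        raise ValueError("y and theta must both have length K")
    # theta must be a probability vector; allow tiny floating-point error in the sum.
    if any(t < 0 for t in theta) or not math.isclose(sum(theta), 1.0, abs_tol=tol):
        raise ValueError("theta must be nonnegative and sum to 1")
    # The counts are nonnegative and must account for all N trials.
    if any(yi < 0 for yi in y) or sum(y) != N:
        raise ValueError("counts must be nonnegative and sum to N")


check_multinomial_args([2, 3, 5], 10, [0.2, 0.3, 0.5])  # valid: no exception
```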
Support
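The \(K\)-nomial distribution is supported on the vectors \(\mathbf{y} = (y_1, \ldots, y_K) \in \mathbb{N}^K\) (with \(\mathbb{N}\) including zero) that satisfy \(\sum_{i=1}^K y_i = N\).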
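Because the counts must sum to \(N\), the support is finite: it has \(\binom{N+K-1}{K-1}\) elements, one for each way of placing \(K-1\) dividers among \(N+K-1\) slots ("stars and bars"). The generator below is a minimal sketch of that enumeration; `multinomial_support` is a hypothetical name.

```python
from itertools import combinations
from math import comb


def multinomial_support(N, K):
    """Hypothetical helper: yield every length-K tuple of nonnegative integers
    summing to N, using the stars-and-bars construction."""
    # Choose positions for K-1 bars among N+K-1 slots; the remaining slots are stars.
    for bars in combinations(range(N + K - 1), K - 1):
        edges = (-1,) + bars + (N + K - 1,)
        # The number of stars between consecutive bars is one count y_i.
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(K))


support = list(multinomial_support(4, 3))
print(len(support), comb(4 + 3 - 1, 3 - 1))  # both are 15
```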
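Probability mass function
\[f(\mathbf{y};N,\boldsymbol{\theta}) = \frac{N!}{y_1!\,y_2!\cdots y_K!}\,\theta_1^{y_1}\,\theta_2^{y_2}\cdots\theta_K^{y_K}\]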
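Evaluating \(N!/(y_1!\cdots y_K!)\,\prod_i \theta_i^{y_i}\) directly overflows for large \(N\), so a common approach is to work on the log scale with the log-gamma function, using \(\ln n! = \ln\Gamma(n+1)\). The following is a sketch of that approach; `multinomial_logpmf` is a hypothetical helper, and it treats \(0 \cdot \ln 0\) as 0 so that an outcome with \(\theta_i = 0\) and \(y_i = 0\) contributes nothing.

```python
import math


def multinomial_logpmf(y, theta):
    """Hypothetical helper: log of N!/(y_1!...y_K!) * prod_i theta_i**y_i,
    computed with lgamma so large N does not overflow."""
    N = sum(y)
    # log N! - sum log y_i!  via  log n! = lgamma(n + 1)
    log_coeff = math.lgamma(N + 1) - sum(math.lgamma(yi + 1) for yi in y)
    log_terms = 0.0
    for yi, ti in zip(y, theta):
        if yi == 0:
            continue  # 0 * log(theta_i) is taken as 0, even when theta_i = 0
        if ti == 0:
            return -math.inf  # a positive count of an impossible outcome
        log_terms += yi * math.log(ti)
    return log_coeff + log_terms


print(math.exp(multinomial_logpmf([1, 2, 1], [0.25, 0.5, 0.25])))
```

For \(K = 2\) the same function returns the log of the Binomial PMF, for example \(\binom{5}{3}(0.4)^3(0.6)^2\) for \(\mathbf{y} = (3, 2)\) and \(\boldsymbol{\theta} = (0.4, 0.6)\).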
Moments
Mean of \(y_i\): \(N\theta_i\)
Variance of \(y_i\): \(N\theta_i(1-\theta_i)\)
Covariance of \(y_i, y_j\) with \(j\ne i\): \(-N\theta_i\theta_j\)
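The three moment formulas combine into a mean vector \(N\boldsymbol{\theta}\) and a covariance matrix \(N\left(\operatorname{diag}(\boldsymbol{\theta}) - \boldsymbol{\theta}\boldsymbol{\theta}^\mathsf{T}\right)\): its diagonal entries are \(N\theta_i(1-\theta_i)\) and its off-diagonal entries are \(-N\theta_i\theta_j\). The sketch below, assuming NumPy is available, builds both and compares them with a Monte Carlo estimate from `numpy.random.Generator.multinomial`; `multinomial_moments` is a hypothetical helper, and the seed and sample size are arbitrary.

```python
import numpy as np


def multinomial_moments(N, theta):
    """Hypothetical helper: analytic mean vector and covariance matrix of
    a Multinomial(N, theta) count vector."""
    theta = np.asarray(theta, dtype=float)
    mean = N * theta
    # Diagonal: N theta_i (1 - theta_i); off-diagonal: -N theta_i theta_j.
    cov = N * (np.diag(theta) - np.outer(theta, theta))
    return mean, cov


N, theta = 20, [0.2, 0.3, 0.5]
mean, cov = multinomial_moments(N, theta)

# Monte Carlo check: each row of `samples` is one Multinomial draw.
rng = np.random.default_rng(seed=1)
samples = rng.multinomial(N, theta, size=100_000)
print(mean, samples.mean(axis=0))
print(cov)
print(np.cov(samples, rowvar=False))
```

Note that every row sums to \(N\), which is why the covariances are negative: a larger count for one outcome forces smaller counts for the others.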
Usage
The usage below assumes `theta` is a length \(K\) array.
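Package | Syntax |
---|---|
NumPy | `np.random.multinomial(N, theta)` |
SciPy | `scipy.stats.multinomial(N, theta)` |
Distributions.jl | `Multinomial(N, theta)` |
Stan sampling | `y ~ multinomial(theta)` |
Stan rng | `multinomial_rng(theta, N)` |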
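A short end-to-end sketch of the NumPy and SciPy usage follows, assuming both libraries are installed. It uses NumPy's newer `Generator.multinomial` method (the `np.random.default_rng()` counterpart of `np.random.multinomial`) and a frozen `scipy.stats.multinomial` object; the values of `N`, `theta`, and the seed are arbitrary.

```python
import numpy as np
import scipy.stats

N = 10
theta = np.array([0.2, 0.3, 0.5])  # length-K array of outcome probabilities

# NumPy: draw five Multinomial count vectors, one per row.
rng = np.random.default_rng(seed=3)
draws = rng.multinomial(N, theta, size=5)

# SciPy: a frozen distribution exposing pmf, mean, and cov.
dist = scipy.stats.multinomial(N, theta)
print(draws)
print(dist.pmf([2, 3, 5]))  # 10!/(2! 3! 5!) * 0.2**2 * 0.3**3 * 0.5**5
print(dist.mean())
print(dist.cov())
```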
Notes
For a sampling statement in Stan, the value of \(N\) is implied, since it equals the sum of the entries of \(\mathbf{y}\).