Hypergeometric distribution


Story

Consider an urn with \(a\) white balls and \(b\) black balls. Draw \(N\) balls from this urn without replacement. The number of white balls drawn, \(n\), is Hypergeometrically distributed.
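
Here is a minimal simulation of the story that uses only Python's standard library. It fills a list with a white and b black balls and uses random.sample to draw N of them without replacement. The function name draw_white_count and the values a = 7, b = 5, N = 6 are illustrative choices, not part of any library.

```python
import random


def draw_white_count(a: int, b: int, N: int, rng: random.Random) -> int:
    """Draw N balls without replacement from an urn holding a white and
    b black balls, and return the number of white balls drawn (n)."""
    urn = ["white"] * a + ["black"] * b
    # random.sample picks N distinct balls, i.e. draws without replacement.
    drawn = rng.sample(urn, N)
    return drawn.count("white")


rng = random.Random(42)
a, b, N = 7, 5, 6
samples = [draw_white_count(a, b, N, rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Every simulated value of n should fall in the support described below, and the sample mean should sit near N a/(a+b) = 3.5.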


Example

There are \(a+b\) finches on an island, and \(a\) of them are tagged (and therefore \(b\) of them are untagged). You capture \(N\) finches. The number \(n\) of tagged finches among those captured is Hypergeometrically distributed.


Parameters

There are three parameters: the number of draws \(N\), the number of white balls \(a\), and the number of black balls \(b\).


Support

The Hypergeometric distribution is supported on the set of integers between \(\mathrm{max}(0, N-b)\) and \(\mathrm{min}(N, a)\), inclusive.
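
The bounds follow from counting. At most \(N\) draws are made and only \(a\) white balls exist, so \(n \le \mathrm{min}(N, a)\). Once all \(b\) black balls are used up, the remaining \(N - b\) draws must be white, so \(n \ge \mathrm{max}(0, N-b)\). The following sketch returns the support as a Python range. The helper name hypergeom_support is hypothetical.

```python
def hypergeom_support(N: int, a: int, b: int) -> range:
    """Values of n with nonzero probability for N draws from a white and b black balls."""
    if not (a >= 0 and b >= 0 and 0 <= N <= a + b):
        raise ValueError("need a, b >= 0 and 0 <= N <= a + b")
    # At least N - b draws must be white once the b black balls run out;
    # at most N draws happen in total, and only a white balls exist.
    return range(max(0, N - b), min(N, a) + 1)


print(list(hypergeom_support(6, 7, 5)))   # [1, 2, 3, 4, 5, 6]
print(list(hypergeom_support(3, 1, 10)))  # [0, 1]
```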


Probability mass function

\[\begin{split}\begin{align} f(n; N, a, b) = \frac{\begin{pmatrix}a \\ n\end{pmatrix} \begin{pmatrix}b \\ N-n\end{pmatrix}}{\begin{pmatrix}a+b \\ N\end{pmatrix}}. \end{align}\end{split}\]
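
The PMF translates directly into code. This sketch uses exact rational arithmetic from Python's fractions module. It counts the ways to choose \(n\) of the white balls and \(N-n\) of the black balls, then divides by the number of ways to choose any \(N\) balls. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(n: int, N: int, a: int, b: int) -> Fraction:
    """Exact Hypergeometric PMF f(n; N, a, b)."""
    # Outside the support one of the binomial coefficients is zero
    # (math.comb rejects negative arguments, so handle that case here).
    if n < 0 or n > a or N - n < 0 or N - n > b:
        return Fraction(0)
    # Ways to pick n white and N - n black balls, over ways to pick any N balls.
    return Fraction(comb(a, n) * comb(b, N - n), comb(a + b, N))


N, a, b = 6, 7, 5
probs = [hypergeom_pmf(n, N, a, b) for n in range(N + 1)]
print(sum(probs))  # 1
```

The probabilities sum to exactly 1 by Vandermonde's identity, \(\sum_n \binom{a}{n}\binom{b}{N-n} = \binom{a+b}{N}\).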

Cumulative distribution function

The cumulative distribution function evaluated for integer \(n\) is

\[\begin{split}\begin{align} F(n; N, a, b) = 1 - \frac{\begin{pmatrix}N \\ n+1\end{pmatrix} \begin{pmatrix}a + b - N \\ a - n - 1\end{pmatrix}}{\begin{pmatrix}a+b \\ a\end{pmatrix}}\,{}_3F_2(1, n-a+1, n+1-N; n+2, n+b-N+2; 1), \end{align}\end{split}\]

where \({}_3F_2(a_1, a_2, a_3; b_1, b_2; z)\) denotes the generalized hypergeometric function.
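
The closed form is easy to mistype, so here is a sketch that checks it against direct summation of the PMF using exact fractions. The helper hyp3f2_at_one is hypothetical and only handles terminating series. That is enough here: whenever the prefactor is nonzero for \(n\) in the support, we have \(n < \mathrm{min}(N, a)\), so the parameter \(n+1-N\) is a non-positive integer and the series stops after finitely many terms.

```python
from fractions import Fraction
from math import comb


def binom(n: int, k: int) -> int:
    """Binomial coefficient, taken to be 0 when k lies outside [0, n]."""
    return comb(n, k) if 0 <= k <= n else 0


def pmf(n: int, N: int, a: int, b: int) -> Fraction:
    return Fraction(binom(a, n) * binom(b, N - n), binom(a + b, N))


def cdf_by_summation(n: int, N: int, a: int, b: int) -> Fraction:
    return sum((pmf(k, N, a, b) for k in range(n + 1)), Fraction(0))


def hyp3f2_at_one(a1: int, a2: int, a3: int, b1: int, b2: int) -> Fraction:
    """3F2(a1, a2, a3; b1, b2; 1), assuming some numerator parameter is a
    non-positive integer and both denominator parameters are positive."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        # Ratio of term k + 1 to term k of the series at z = 1.
        term *= Fraction((a1 + k) * (a2 + k) * (a3 + k), (b1 + k) * (b2 + k) * (k + 1))
        k += 1
    return total


def cdf_closed_form(n: int, N: int, a: int, b: int) -> Fraction:
    """F(n; N, a, b) from the formula above, for n in the support."""
    if not max(0, N - b) <= n <= min(N, a):
        raise ValueError("n must lie in the support")
    prefactor = Fraction(binom(N, n + 1) * binom(a + b - N, a - n - 1), binom(a + b, a))
    if prefactor == 0:
        # Within the support this happens only for n == min(N, a), where F = 1.
        return Fraction(1)
    return 1 - prefactor * hyp3f2_at_one(1, n - a + 1, n + 1 - N, n + 2, n + b - N + 2)


print(cdf_closed_form(2, 6, 7, 5), cdf_by_summation(2, 6, 7, 5))  # 4/33 4/33
```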


Moments

Mean: \(\displaystyle{N\,\frac{a}{a+b}}\)

Variance: \(\displaystyle{N\,\frac{ab}{(a + b)^2}\,\frac{a+b-N}{a+b-1}}\)
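
As a sanity check on these formulas, the sketch below computes the mean and variance directly from the PMF with exact fractions and compares them with the closed forms. The function names are illustrative. The closed-form variance assumes \(a + b \ge 2\), since the factor \(a+b-1\) appears in a denominator.

```python
from fractions import Fraction
from math import comb


def pmf(n: int, N: int, a: int, b: int) -> Fraction:
    """Hypergeometric PMF f(n; N, a, b), zero outside the support."""
    if n < 0 or n > a or N - n < 0 or N - n > b:
        return Fraction(0)
    return Fraction(comb(a, n) * comb(b, N - n), comb(a + b, N))


def moments_by_summation(N: int, a: int, b: int) -> tuple[Fraction, Fraction]:
    """Mean and variance computed directly from the PMF."""
    ns = range(N + 1)
    mean = sum((n * pmf(n, N, a, b) for n in ns), Fraction(0))
    var = sum(((n - mean) ** 2 * pmf(n, N, a, b) for n in ns), Fraction(0))
    return mean, var


def moments_closed_form(N: int, a: int, b: int) -> tuple[Fraction, Fraction]:
    """Mean and variance from the formulas above (requires a + b >= 2)."""
    mean = Fraction(N * a, a + b)
    var = Fraction(N * a * b, (a + b) ** 2) * Fraction(a + b - N, a + b - 1)
    return mean, var


print(moments_by_summation(6, 7, 5))  # (Fraction(7, 2), Fraction(35, 44))
print(moments_closed_form(6, 7, 5))   # (Fraction(7, 2), Fraction(35, 44))
```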


Usage

Package             Syntax
NumPy               rng.hypergeometric(a, b, N)
SciPy               scipy.stats.hypergeom(a+b, a, N)
Distributions.jl    Hypergeometric(a, b, N)
Stan                hypergeometric(N, a, b)
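
Here is a minimal Python sketch of the NumPy and SciPy rows, assuming both packages are installed. It describes the same urn (a = 7, b = 5, N = 6) with each package's argument order. The seed and sample size are arbitrary.

```python
import numpy as np
import scipy.stats

# Urn with a white balls and b black balls; N balls are drawn.
a, b, N = 7, 5, 6

# NumPy: rng.hypergeometric(ngood, nbad, nsample), i.e. (a, b, N).
rng = np.random.default_rng(3252)
samples = rng.hypergeometric(a, b, N, size=100_000)

# SciPy: hypergeom(M, n, N) with M = a + b total balls and n = a white balls.
dist = scipy.stats.hypergeom(a + b, a, N)

# The sample mean should be close to the exact mean N * a / (a + b).
print(samples.mean(), dist.mean())
# The PMF over 0..N should sum to (numerically) 1.
print(dist.pmf(np.arange(N + 1)).sum())
```

Note that the SciPy call takes the total \(a+b\) first, which is easy to mix up with NumPy's \((a, b, N)\) order.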



Notes

  • This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, the Binomial parameter \(\theta\) is \(\theta = a/(a+b)\).

  • SciPy uses a different parametrization than NumPy and Stan. Let \(M = a+b\) be the total number of balls in the urn. With the parameters in the order scipy.stats.hypergeom expects, the PMF may be written as

    \[\begin{split}\begin{align} f(n;M,a,N) = \frac{\begin{pmatrix}a \\ n\end{pmatrix} \begin{pmatrix}M-a \\ N-n\end{pmatrix}}{\begin{pmatrix}M \\ N\end{pmatrix}}. \end{align}\end{split}\]
  • NumPy and Stan use the same parametrization but order the arguments differently: NumPy takes \((a, b, N)\), while Stan takes \((N, a, b)\).
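
To illustrate the Binomial analogy in the first note, this sketch holds \(\theta = a/(a+b)\) fixed while the urn grows and compares the two PMFs exactly. Drawing without replacement matters less as the urn gets larger, so the largest pointwise gap between the PMFs should shrink. The scaling factors and helper names are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(n: int, N: int, a: int, b: int) -> Fraction:
    """N draws without replacement from a white and b black balls."""
    if n < 0 or n > a or N - n < 0 or N - n > b:
        return Fraction(0)
    return Fraction(comb(a, n) * comb(b, N - n), comb(a + b, N))


def binomial_pmf(n: int, N: int, theta: Fraction) -> Fraction:
    """N draws with replacement, each white with probability theta."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)


N = 6
diffs = []
for scale in (1, 10, 100, 1000):
    a, b = 7 * scale, 5 * scale
    theta = Fraction(a, a + b)  # the Binomial parameter from the analogy
    gap = max(abs(hypergeom_pmf(n, N, a, b) - binomial_pmf(n, N, theta)) for n in range(N + 1))
    diffs.append(float(gap))
    print(a + b, float(gap))
```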


PMF and CDF plots