Hypergeometric distribution
Story
Consider an urn with \(a\) white balls and \(b\) black balls. Draw \(N\) balls from this urn without replacement. The number of white balls drawn, \(n\), is Hypergeometrically distributed.
Example
There are \(a+b\) finches on an island, and \(a\) of them are tagged (and therefore \(b\) of them are untagged). You capture \(N\) finches. The number of captured tagged finches \(n\) is Hypergeometrically distributed.
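The finch example can be simulated directly. The following is a minimal sketch using only Python's standard library; the population sizes, the seed, and the helper name `draw_tagged` are illustrative assumptions rather than part of the example above.

```python
import random

def draw_tagged(a, b, N, rng):
    """Capture N finches without replacement; return how many are tagged."""
    # True marks a tagged finch, False an untagged one.
    population = [True] * a + [False] * b
    captured = rng.sample(population, N)  # sampling without replacement
    return sum(captured)

rng = random.Random(3252)  # arbitrary seed, used only for reproducibility
a, b, N = 50, 150, 40      # hypothetical island: 50 tagged, 150 untagged, 40 captured

samples = [draw_tagged(a, b, N, rng) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(f"average number of tagged finches captured: {sample_mean:.2f}")
```

With these assumed values the sample average should land close to \(N\,a/(a+b) = 10\).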
Parameters
There are three parameters: the number of draws \(N\), the number of white balls \(a\), and the number of black balls \(b\).
Support
The Hypergeometric distribution is supported on the set of integers between \(\mathrm{max}(0, N-b)\) and \(\mathrm{min}(N, a)\), inclusive.
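The lower bound is nonzero only when there are more draws than black balls, in which case at least \(N-b\) of the drawn balls must be white. A small sketch of the support calculation, with `hypergeom_support` as a hypothetical helper name:

```python
def hypergeom_support(N, a, b):
    """Return the range of possible numbers of white balls in N draws."""
    lo = max(0, N - b)  # draws left over once all black balls are used up
    hi = min(N, a)      # cannot draw more white balls than draws or than exist
    return range(lo, hi + 1)

# Urn with 5 white and 3 black balls (illustrative values).
print(list(hypergeom_support(6, 5, 3)))  # six draws force at least three white balls
print(list(hypergeom_support(3, 5, 3)))  # three draws can all be black
```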
Probability mass function
\[ f(n;N,a,b) = \frac{\binom{a}{n}\binom{b}{N-n}}{\binom{a+b}{N}}. \]
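The PMF follows from a counting argument: of the \(\binom{a+b}{N}\) equally likely sets of \(N\) balls, \(\binom{a}{n}\binom{b}{N-n}\) contain exactly \(n\) white balls. The sketch below evaluates that ratio exactly with the standard library; the function name `hypergeom_pmf` is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(n, N, a, b):
    """Exact probability of drawing n white balls in N draws without replacement."""
    lo, hi = max(0, N - b), min(N, a)
    if not lo <= n <= hi:
        return Fraction(0)  # outside the support
    return Fraction(comb(a, n) * comb(b, N - n), comb(a + b, N))

# Two white and two black balls, two draws (illustrative values).
probabilities = [hypergeom_pmf(n, 2, 2, 2) for n in range(3)]
print(probabilities)
```

Working with `Fraction` avoids floating-point error, so the probabilities over the support sum to exactly one.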
Cumulative distribution function
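The cumulative distribution function evaluated for integer \(n\) is
\[ F(n;N,a,b) = 1 - \frac{\binom{N}{n+1}\binom{a+b-N}{a-n-1}}{\binom{a+b}{a}}\;{}_3F_2(1, n+1-a, n+1-N; n+2, b+n+2-N; 1), \]
where \({}_3F_2(a_1, a_2, a_3;b_1, b_2; z)\) denotes the generalized hypergeometric function.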
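For integer \(n\), the CDF is also a finite sum of PMF terms over the support, which is often simpler to evaluate than the \({}_3F_2\) form. The sketch below takes that route; it is one possible approach, and `hypergeom_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def hypergeom_cdf(n, N, a, b):
    """P(number of white balls <= n), summed exactly over the support."""
    lo, hi = max(0, N - b), min(N, a)
    total = comb(a + b, N)
    # Count the favorable draws for every k in the support up to n.
    favorable = sum(comb(a, k) * comb(b, N - k) for k in range(lo, min(n, hi) + 1))
    return Fraction(favorable, total)

# Two white and two black balls, two draws (illustrative values).
print([hypergeom_cdf(n, 2, 2, 2) for n in range(3)])
```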
Moments
Mean: \(\displaystyle{N\,\frac{a}{a+b}}\)
Variance: \(\displaystyle{N\,\frac{ab}{(a + b)^2}\,\frac{a+b-N}{a+b-1}}\)
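As a sanity check, the closed-form moments can be compared with moments computed by direct enumeration of the PMF. This is a sketch under assumed parameter values (and assuming \(a+b>1\)); both function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_moments(N, a, b):
    """Closed-form mean and variance as exact fractions."""
    M = a + b
    mean = Fraction(N * a, M)
    var = Fraction(N * a * b, M**2) * Fraction(M - N, M - 1)
    return mean, var

def moments_by_enumeration(N, a, b):
    """Mean and variance computed directly from the PMF over the support."""
    lo, hi = max(0, N - b), min(N, a)
    total = comb(a + b, N)
    pmf = {n: Fraction(comb(a, n) * comb(b, N - n), total) for n in range(lo, hi + 1)}
    mean = sum(n * p for n, p in pmf.items())
    var = sum((n - mean) ** 2 * p for n, p in pmf.items())
    return mean, var

N, a, b = 40, 50, 150  # illustrative values
closed = hypergeom_moments(N, a, b)
brute = moments_by_enumeration(N, a, b)
print(closed, brute)
```

For these values both routes give a mean of 10 and a variance of 1200/199.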
Usage
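Package | Syntax |
---|---|
NumPy | `rng.hypergeometric(a, b, N)` |
SciPy | `scipy.stats.hypergeom(a+b, a, N)` |
Distributions.jl | `Hypergeometric(a, b, N)` |
Stan | `hypergeometric(N, a, b)` |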
Notes
This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, the Binomial parameter \(\theta\) is \(\theta = a/(a+b)\).
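One way to see the analogy numerically: holding \(\theta = a/(a+b)\) fixed while the urn grows, removing a ball changes the remaining proportions less and less, so the Hypergeometric PMF approaches the Binomial PMF. The sketch below illustrates this with assumed values of \(N\) and \(\theta\); it is a demonstration, not a proof.

```python
from math import comb

def hypergeom_pmf(n, N, a, b):
    """Hypergeometric PMF; math.comb returns 0 when n exceeds a."""
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)

def binom_pmf(n, N, theta):
    """Binomial PMF for N draws with replacement."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)

N, theta = 10, 0.25  # illustrative values
max_diffs = []
for scale in (1, 10, 100):
    a, b = 5 * scale, 15 * scale  # keeps a / (a + b) = 0.25
    diff = max(
        abs(hypergeom_pmf(n, N, a, b) - binom_pmf(n, N, theta))
        for n in range(N + 1)
    )
    max_diffs.append(diff)
    print(f"a + b = {a + b:5d}: largest PMF difference = {diff:.5f}")
```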
SciPy uses a different parametrization than NumPy and Stan. Let \(M = a+b\) be the total number of balls in the urn. Then, in the parameter order that `scipy.stats.hypergeom` expects, the PMF may be written as
\[ f(n;M,a,N) = \frac{\binom{a}{n}\binom{M-a}{N-n}}{\binom{M}{N}}. \]
Although NumPy and Stan use the same parametrization, their arguments are ordered differently: NumPy takes \((a, b, N)\), whereas Stan takes \((N, a, b)\).
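A short sketch of the two Python parametrizations side by side, assuming NumPy and SciPy are installed. The parameter values and the seed are illustrative assumptions.

```python
from math import comb

import numpy as np
import scipy.stats

a, b, N = 50, 150, 40  # white balls, black balls, number of draws (illustrative)
M = a + b              # total number of balls, as SciPy expects

# SciPy: arguments are (total, number of white balls, number of draws).
dist = scipy.stats.hypergeom(M, a, N)
n = 10
pmf_scipy = dist.pmf(n)

# The same probability from the binomial-coefficient formula.
pmf_direct = comb(a, n) * comb(M - a, N - n) / comb(M, N)

# NumPy: arguments are (white balls, black balls, number of draws).
rng = np.random.default_rng(3252)  # arbitrary seed
samples = rng.hypergeometric(a, b, N, size=10_000)

print(pmf_scipy, pmf_direct, samples.mean())
```

Passing `a + b` rather than `b` as SciPy's first argument is the step that is easiest to get wrong when translating from the \((N, a, b)\) notation used here.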