Negative Binomial distribution
Story
We perform a series of Bernoulli trials, each with probability \(\beta/(1+\beta)\) of success. The number of failures, \(y\), before we get \(\alpha\) successes is Negative Binomially distributed.
An equivalent story is this: Draw a parameter \(\lambda\) out of a Gamma distribution with shape parameter \(\alpha\) and rate parameter \(\beta\). Then draw a number \(y\) out of a Poisson distribution with parameter \(\lambda\). Then \(y\) is Negative Binomially distributed with parameters \(\alpha\) and \(\beta\). For this reason, the Negative Binomial distribution is sometimes called the Gamma-Poisson distribution.
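The equivalence of the two stories can be checked by simulation. The following is a minimal sketch, assuming NumPy is available and that \(\beta\) acts as the rate (inverse scale) of the Gamma distribution; the parameter values, seed, and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 3.0, 0.5
n_samples = 200_000

rng = np.random.default_rng(seed=3252)

# Gamma-Poisson story: draw lambda from a Gamma distribution, then y ~ Poisson(lambda).
# NumPy's gamma sampler takes a scale parameter, so scale = 1 / beta (assumes beta is a rate).
lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=n_samples)
y_mixture = rng.poisson(lam)

# Bernoulli-trial story: failures before alpha successes, with success
# probability beta / (1 + beta). NumPy accepts a non-integer number of successes.
y_direct = rng.negative_binomial(alpha, beta / (1.0 + beta), size=n_samples)

print("mixture mean, variance:", y_mixture.mean(), y_mixture.var())
print("direct mean, variance: ", y_direct.mean(), y_direct.var())
print("theory mean, variance: ", alpha / beta, alpha * (1 + beta) / beta**2)
```

Up to sampling noise, the two sets of samples should have the same mean and variance.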
Example
Bursty gene expression can give mRNA count distributions that are Negative Binomially distributed. Here, “success” is that a burst in gene expression stops. In this case, the parameter \(1/\beta\) is the mean number of transcripts in a burst of expression. The parameter \(\alpha\) is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then \(\alpha > 1\). Then, the number of “failures” is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.
Parameters
There are two parameters: \(\alpha\), the desired number of successes, and \(\beta\), which is the rate (inverse scale) parameter of the Gamma distribution that gives rise to the Negative Binomial. The probability of success of each Bernoulli trial is given by \(\beta/(1+\beta)\).
Support
The Negative Binomial distribution is supported on the set of nonnegative integers.
Probability mass function
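Generally speaking, \(\alpha\) need not be an integer, so we may write the PMF as
\[\begin{align} f(y;\alpha,\beta) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha) \, y!}\,\left(\frac{\beta}{1+\beta}\right)^\alpha\left(\frac{1}{1+\beta}\right)^y. \end{align}\]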
See the notes below for other parametrizations.
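A minimal sketch of evaluating the PMF for non-integer \(\alpha\), assuming SciPy is available. It takes the PMF to be \(\Gamma(y+\alpha)/(\Gamma(\alpha)\,y!)\,(\beta/(1+\beta))^\alpha\,(1/(1+\beta))^y\), works on the log scale to avoid overflow in the Gamma functions, and compares against `scipy.stats.nbinom`. The helper name `negbinom_logpmf` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom


def negbinom_logpmf(y, alpha, beta):
    """Log PMF in the (alpha, beta) parametrization; alpha may be non-integer."""
    y = np.asarray(y, dtype=float)
    return (
        gammaln(y + alpha) - gammaln(alpha) - gammaln(y + 1.0)  # Gamma(y+alpha) / (Gamma(alpha) y!)
        + alpha * np.log(beta / (1.0 + beta))                   # (beta / (1+beta))^alpha
        - y * np.log1p(beta)                                    # (1 / (1+beta))^y
    )


# Hypothetical values for illustration only.
alpha, beta = 2.5, 0.8
y = np.arange(0, 50)

pmf_manual = np.exp(negbinom_logpmf(y, alpha, beta))
pmf_scipy = nbinom.pmf(y, alpha, beta / (1.0 + beta))

print(np.allclose(pmf_manual, pmf_scipy))
```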
Cumulative distribution function
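The CDF evaluated at nonnegative integers \(n\) is
\[\begin{align} F(n;\alpha,\beta) = I_{\beta/(1+\beta)}(\alpha, n+1), \end{align}\]
where \(I_x(a, b)\) is the regularized incomplete beta function, given by
\[\begin{align} I_x(a,b) = \frac{1}{B(a,b)}\int_0^x t^{a-1}(1-t)^{b-1}\,\mathrm{d}t, \end{align}\]
with \(B(a,b)\) denoting the beta function.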
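As a hedged numerical check, the CDF can be computed with SciPy's regularized incomplete beta function, `scipy.special.betainc(a, b, x)`, and compared against a running sum of the PMF. The sketch assumes the CDF takes the form \(I_{\beta/(1+\beta)}(\alpha, n+1)\); the helper name `negbinom_cdf` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import nbinom


def negbinom_cdf(n, alpha, beta):
    """CDF at nonnegative integers n, assuming F(n) = I_{beta/(1+beta)}(alpha, n+1)."""
    n = np.asarray(n, dtype=float)
    return betainc(alpha, n + 1.0, beta / (1.0 + beta))


# Hypothetical values for illustration only.
alpha, beta = 2.5, 0.8
n = np.arange(0, 30)

cdf_beta = negbinom_cdf(n, alpha, beta)
# Running sum of the PMF over 0, 1, ..., n gives the CDF directly.
cdf_sum = np.cumsum(nbinom.pmf(n, alpha, beta / (1.0 + beta)))

print(np.allclose(cdf_beta, cdf_sum))
```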
Moments
Mean: \(\displaystyle{\frac{\alpha}{\beta}}\)
Variance: \(\displaystyle{\frac{\alpha(1+\beta)}{\beta^2}}\)
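These moments can be recovered from the Gamma-Poisson story. The short derivation below is a sketch that assumes \(\lambda\) is Gamma distributed with shape \(\alpha\) and rate \(\beta\) (so that \(\mathrm{E}[\lambda] = \alpha/\beta\) and \(\mathrm{Var}[\lambda] = \alpha/\beta^2\)) and uses the laws of total expectation and total variance:
\[\begin{align} \mathrm{E}[y] &= \mathrm{E}\left[\mathrm{E}[y\mid\lambda]\right] = \mathrm{E}[\lambda] = \frac{\alpha}{\beta},\\ \mathrm{Var}[y] &= \mathrm{E}\left[\mathrm{Var}[y\mid\lambda]\right] + \mathrm{Var}\left[\mathrm{E}[y\mid\lambda]\right] = \mathrm{E}[\lambda] + \mathrm{Var}[\lambda] = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2} = \frac{\alpha(1+\beta)}{\beta^2}. \end{align}\]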
Usage
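Package: Syntax
NumPy: rng.negative_binomial(alpha, beta/(1+beta))
NumPy with (µ, φ) parametrization: rng.negative_binomial(phi, phi/(mu+phi))
SciPy: scipy.stats.nbinom(alpha, beta/(1+beta))
SciPy with (µ, φ) parametrization: scipy.stats.nbinom(phi, phi/(mu+phi))
Distributions.jl: NegativeBinomial(alpha, beta/(1+beta))
Distributions.jl with (µ, φ) parametrization: NegativeBinomial(phi, phi/(mu+phi))
Stan: neg_binomial(alpha, beta)
Stan with (µ, φ) parametrization: neg_binomial_2(mu, phi)
Here, rng is a NumPy random generator, e.g., rng = np.random.default_rng().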
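A short usage sketch for the Python packages, assuming NumPy and SciPy are installed. Both take a number of successes and a success probability, so the \((\alpha, \beta)\) and \((\mu, \phi)\) parameters are converted before the call. The parameter values and seed are hypothetical.

```python
import numpy as np
import scipy.stats as st

alpha, beta = 5.0, 2.0           # hypothetical (alpha, beta) values
mu, phi = alpha / beta, alpha    # the equivalent (mu, phi) values

rng = np.random.default_rng(seed=0)

# NumPy sampling in the (alpha, beta) and (mu, phi) parametrizations.
samples_ab = rng.negative_binomial(alpha, beta / (1 + beta), size=10)
samples_mp = rng.negative_binomial(phi, phi / (mu + phi), size=10)

# SciPy frozen distributions for the same two parametrizations.
dist_ab = st.nbinom(alpha, beta / (1 + beta))
dist_mp = st.nbinom(phi, phi / (mu + phi))

# Both describe the same distribution, so the PMFs should agree.
y = np.arange(0, 20)
print(np.allclose(dist_ab.pmf(y), dist_mp.pmf(y)))
print(dist_ab.mean(), dist_ab.var())
```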
Notes
The Negative Binomial distribution may be parametrized such that the probability mass function is
\[\begin{align} f(y;\mu,\phi) = \frac{\Gamma(y+\phi)}{\Gamma(\phi) \, y!}\,\left(\frac{\phi}{\mu +\phi}\right)^\phi\left(\frac{\mu}{\mu+\phi}\right)^y. \end{align}\]
These parameters are related to the parametrization above by \(\phi = \alpha\) and \(\mu = \alpha/\beta\). In the limit of \(\phi\to\infty\), which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter \(\mu\). This also gives meaning to the parameters \(\mu\) and \(\phi\); \(\mu\) is the mean of the Negative Binomial, and \(\phi\) controls extra width of the distribution beyond Poisson. The smaller \(\phi\) is, the broader the distribution.
In this parametrization, the pertinent moments are
Mean: \(\displaystyle{\mu}\)
Variance: \(\displaystyle{\mu\left(1 + \frac{\mu}{\phi}\right)}\).
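The Poisson limit and the role of \(\phi\) can be illustrated numerically. The following sketch, assuming SciPy, holds \(\mu\) fixed and increases \(\phi\), printing the variance and the largest pointwise gap between the Negative Binomial and Poisson PMFs. The helper name `nbinom_mu_phi` and the chosen values are hypothetical.

```python
import numpy as np
import scipy.stats as st


def nbinom_mu_phi(mu, phi):
    """Frozen SciPy distribution for the (mu, phi) parametrization."""
    return st.nbinom(phi, phi / (mu + phi))


mu = 10.0                          # hypothetical mean
phis = [1.0, 10.0, 100.0, 1e4]     # increasing phi approaches Poisson
y = np.arange(0, 60)
poisson_pmf = st.poisson.pmf(y, mu)

max_diffs = []
for phi in phis:
    dist = nbinom_mu_phi(mu, phi)
    # Largest pointwise gap between the two PMFs on the grid y.
    max_diffs.append(np.max(np.abs(dist.pmf(y) - poisson_pmf)))
    print(f"phi = {phi:g}: variance = {dist.var():.3f}, "
          f"max |NB - Poisson| = {max_diffs[-1]:.2e}")
```

The variance should approach \(\mu\) and the gap should shrink as \(\phi\) grows.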
In Stan, the Negative Binomial distribution using the \((\mu,\phi)\) parametrization is called neg_binomial_2.
SciPy and NumPy use yet another parametrization. The PMF for SciPy is
\[\begin{align} f(y;n, p) = \frac{\Gamma(y+n)}{\Gamma(n) \, y!}\,p^n \left(1-p\right)^y. \end{align}\]
The parameter \(p\) is the probability of success of a Bernoulli trial (as defined in the story above), and \(1-p\) is the probability of failure. The parameters are related to the others we have defined by \(n=\alpha=\phi\) and \(p=\beta/(1+\beta) = \phi/(\mu+\phi)\). In this parametrization, the pertinent moments are
Mean: \(\displaystyle{n\,\frac{1-p}{p}}\)
Variance: \(\displaystyle{n\,\frac{1-p}{p^2}}\).
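The relations among the three parametrizations can be collected into small conversion helpers. This is a sketch in plain Python; the function names are hypothetical, and the formulas are the relations \(n=\alpha=\phi\), \(p=\beta/(1+\beta)=\phi/(\mu+\phi)\), and \(\mu=\alpha/\beta\) stated in these notes.

```python
import math


def alpha_beta_to_n_p(alpha, beta):
    """(alpha, beta) -> (n, p) as used by SciPy and NumPy."""
    return alpha, beta / (1.0 + beta)


def mu_phi_to_n_p(mu, phi):
    """(mu, phi) -> (n, p)."""
    return phi, phi / (mu + phi)


def n_p_to_alpha_beta(n, p):
    """(n, p) -> (alpha, beta); inverts p = beta / (1 + beta)."""
    return n, p / (1.0 - p)


def n_p_to_mu_phi(n, p):
    """(n, p) -> (mu, phi); mu is the mean n (1 - p) / p."""
    return n * (1.0 - p) / p, n


# Hypothetical values for illustration only.
alpha, beta = 4.0, 0.5
n, p = alpha_beta_to_n_p(alpha, beta)
mu, phi = n_p_to_mu_phi(n, p)

# The variance formulas of the three parametrizations should agree.
var_ab = alpha * (1.0 + beta) / beta**2
var_mp = mu * (1.0 + mu / phi)
var_np = n * (1.0 - p) / p**2
print(math.isclose(var_ab, var_mp), math.isclose(var_ab, var_np))
```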
PMF and CDF plots
Note: Setting both parameters of a Negative Binomial distribution from quantiles is a challenging problem for a few reasons. First, there is no guarantee that a parameter set exists that gives two specified value-quantile pairs. Second, in other cases, there is a degeneracy of parameters that give the same quantiles. As an example, if we wished for 4 to be the 2.5th percentile and 17 to be the 97.5th percentile, we could achieve this with \(\alpha = 100\) and \(\beta = 10\), with \(\alpha = 350\) and \(\beta = 35\), with \(\alpha = 10^9\) and \(\beta = 10^8\), and countless other combinations. (This is because the large \(\alpha\) limit at fixed mean \(\alpha/\beta\) is Poisson.) So, instead of manipulating two parameters to hit two quantiles, we can fix one parameter and set the other to give a single desired percentile. In the \(\alpha\)-\(\beta\) formulation, we fix \(\alpha\), and in the \(\mu\)-\(\phi\) formulation, we fix \(\mu\).
In the α-β formulation: [interactive PMF and CDF plots]
In the µ-φ formulation: [interactive PMF and CDF plots]
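One way to set a single parameter from a desired percentile, with \(\alpha\) held fixed, is a one-dimensional root search on \(\beta\). The sketch below is an assumption about how this could be done with SciPy, not a description of how the plots on this page do it. It relies on the CDF at a fixed value increasing with \(\beta\), searches on \(\log_{10}\beta\), and uses a hypothetical helper name, search bounds, and target values.

```python
from scipy.optimize import brentq
from scipy.stats import nbinom


def beta_for_quantile(alpha, y, q, log10_beta_bounds=(-8.0, 8.0)):
    """Find beta such that P(Y <= y) = q for fixed alpha.

    The search runs over log10(beta) within the given (assumed) bounds.
    """
    def objective(log10_beta):
        beta = 10.0 ** log10_beta
        return nbinom.cdf(y, alpha, beta / (1.0 + beta)) - q

    return 10.0 ** brentq(objective, *log10_beta_bounds)


# Hypothetical target: with alpha fixed, make P(Y <= 17) = 0.975.
alpha = 5.0
beta = beta_for_quantile(alpha, y=17, q=0.975)

print("beta:", beta)
print("P(Y <= 17):", nbinom.cdf(17, alpha, beta / (1.0 + beta)))
```

Because the distribution is discrete, this matches the CDF value at the chosen integer exactly; many nearby values of \(\beta\) would also report the same integer as that percentile.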