# Negative Binomial distribution

## Story

We perform a series of Bernoulli trials, each with probability $$\beta/(1+\beta)$$ of success. The number of failures, $$y$$, before we get $$\alpha$$ successes is Negative Binomially distributed.

An equivalent story is this: Draw a parameter $$\lambda$$ out of a Gamma distribution with shape parameter $$\alpha$$ and rate parameter $$\beta$$. Then draw a number $$y$$ out of a Poisson distribution with parameter $$\lambda$$. Then $$y$$ is Negative Binomially distributed with parameters $$\alpha$$ and $$\beta$$. For this reason, the Negative Binomial distribution is sometimes called the Gamma-Poisson distribution.
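The equivalence of the two stories can be checked by simulation. The following is a minimal sketch, assuming that $$\beta$$ is the rate of the Gamma distribution (so NumPy's `scale` argument is `1/beta`); the parameter values, seed, and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Illustrative parameter values (assumptions, not from the text)
alpha, beta = 3.0, 0.5
n_samples = 200_000

# Gamma-Poisson story: lambda ~ Gamma(shape alpha, rate beta), then y ~ Poisson(lambda).
# NumPy parametrizes the Gamma by scale, which is 1 / rate.
lam = rng.gamma(alpha, 1 / beta, size=n_samples)
y_mixture = rng.poisson(lam)

# Direct draws of the number of failures before alpha successes
y_direct = rng.negative_binomial(alpha, beta / (1 + beta), size=n_samples)

# Both sets of samples should have similar means and variances
print("mean:    ", y_mixture.mean(), y_direct.mean())
print("variance:", y_mixture.var(), y_direct.var())
```

With enough samples, the two sample means and the two sample variances should agree to within sampling error.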

## Example

Bursty gene expression can give mRNA counts that are Negative Binomially distributed. Here, “success” is that a burst in gene expression stops. In this case, the parameter $$1/\beta$$ is the mean number of transcripts in a burst of expression. The parameter $$\alpha$$ is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then $$\alpha > 1$$. Then, the number of “failures” is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.

## Parameters

There are two parameters: $$\alpha$$, the desired number of successes, and $$\beta$$, which is the rate (inverse scale) parameter of the Gamma distribution that gives rise to the Negative Binomial. The probability of success of each Bernoulli trial is given by $$\beta/(1+\beta)$$.

## Support

The Negative Binomial distribution is supported on the set of nonnegative integers.

## Probability mass function

\begin{align} f(y;\alpha,\beta) = \begin{pmatrix} y+\alpha-1 \\ \alpha-1 \end{pmatrix} \left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y. \end{align}

Generally speaking, $$\alpha$$ need not be an integer, so we may write the PMF as

\begin{align} f(y;\alpha,\beta) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha) \, y!}\,\left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y. \end{align}

See the notes below for other parametrizations.
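As a quick check on the two forms of the PMF, the sketch below evaluates both using only the Python standard library. The function names `nb_pmf` and `nb_pmf_integer_alpha` are hypothetical helpers written for this illustration, and the parameter values are arbitrary.

```python
import math

def nb_pmf(y, alpha, beta):
    """PMF via the Gamma-function form; valid for any real alpha > 0."""
    log_pmf = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(beta / (1 + beta))
        - y * math.log(1 + beta)
    )
    return math.exp(log_pmf)

def nb_pmf_integer_alpha(y, alpha, beta):
    """PMF via the binomial-coefficient form; requires integer alpha >= 1."""
    coeff = math.comb(y + alpha - 1, alpha - 1)
    return coeff * (beta / (1 + beta)) ** alpha * (1 / (1 + beta)) ** y

# Illustrative values (assumptions): integer alpha so both forms apply
alpha, beta = 4, 0.75
for y in range(5):
    print(y, nb_pmf(y, alpha, beta), nb_pmf_integer_alpha(y, alpha, beta))

# The PMF should sum to one, including for non-integer alpha
total_integer = sum(nb_pmf(y, alpha, beta) for y in range(2000))
total_real = sum(nb_pmf(y, 2.5, beta) for y in range(2000))
print(total_integer, total_real)
```

Working with logarithms and `math.lgamma` avoids overflow in the Gamma functions when $$y$$ or $$\alpha$$ is large.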

## Cumulative distribution function

The CDF evaluated at nonnegative integers $$n$$ is

\begin{align} F(n;\alpha,\beta) = I_{\beta/(1+\beta)}(\alpha, n + 1), \end{align}

where $$I_x(a, b)$$ is the regularized incomplete beta function, given by

\begin{align} I_x(a, b) = \frac{1}{B(a, b)}\,\int_0^x \mathrm{d}y\,y^{a-1}(1-y)^{b-1}. \end{align}
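The expression for the CDF can be checked numerically against a running sum of the PMF. This is a minimal sketch using `scipy.special.betainc(a, b, x)`, which computes $$I_x(a, b)$$; the parameter values are arbitrary choices for illustration.

```python
import numpy as np
import scipy.special
import scipy.stats

# Illustrative parameter values (assumptions, not from the text)
alpha, beta = 2.5, 0.4
p = beta / (1 + beta)

n = np.arange(0, 30)

# CDF from the regularized incomplete beta function, I_p(alpha, n + 1)
cdf_beta = scipy.special.betainc(alpha, n + 1, p)

# CDF from a cumulative sum of the PMF
cdf_sum = np.cumsum(scipy.stats.nbinom.pmf(n, alpha, p))

print(np.max(np.abs(cdf_beta - cdf_sum)))
```

The printed maximum difference should be at the level of floating point error.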

## Moments

Mean: $$\displaystyle{\frac{\alpha}{\beta}}$$

Variance: $$\displaystyle{\frac{\alpha(1+\beta)}{\beta^2}}$$
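These moments follow from the Gamma-Poisson story via the laws of total expectation and total variance. The short derivation below assumes that $$\beta$$ is the rate parameter of the Gamma distribution, so that $$\lambda$$ has mean $$\alpha/\beta$$ and variance $$\alpha/\beta^2$$, and uses the fact that a Poisson distribution with parameter $$\lambda$$ has mean and variance both equal to $$\lambda$$.

\begin{align} \mathrm{E}[y] &= \mathrm{E}\left[\mathrm{E}[y\mid\lambda]\right] = \mathrm{E}[\lambda] = \frac{\alpha}{\beta}, \\ \mathrm{Var}[y] &= \mathrm{E}\left[\mathrm{Var}[y\mid\lambda]\right] + \mathrm{Var}\left[\mathrm{E}[y\mid\lambda]\right] = \mathrm{E}[\lambda] + \mathrm{Var}[\lambda] = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2} = \frac{\alpha(1+\beta)}{\beta^2}. \end{align}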

## Usage

| Package | Syntax |
| --- | --- |
| NumPy | `rng.negative_binomial(alpha, beta/(1+beta))` |
| NumPy with (µ, φ) parametrization | `rng.negative_binomial(phi, phi/(mu+phi))` |
| SciPy | `scipy.stats.nbinom(alpha, beta/(1+beta))` |
| SciPy with (µ, φ) parametrization | `scipy.stats.nbinom(phi, phi/(mu+phi))` |
| Distributions.jl | `NegativeBinomial(alpha, beta/(1+beta))` |
| Distributions.jl with (µ, φ) parametrization | `NegativeBinomial(phi, phi/(mu+phi))` |
| Stan | `neg_binomial(alpha, beta)` |
| Stan with (µ, φ) parametrization | `neg_binomial_2(mu, phi)` |

## Notes

• The Negative Binomial distribution may be parametrized such that the probability mass function is

\begin{align} f(y;\mu,\phi) = \frac{\Gamma(y+\phi)}{\Gamma(\phi) \, y!}\,\left(\frac{\phi}{\mu +\phi}\right)^\phi\left(\frac{\mu}{\mu+\phi}\right)^y. \end{align}

These parameters are related to the parametrization above by $$\phi = \alpha$$ and $$\mu = \alpha/\beta$$. In the limit of $$\phi\to\infty$$, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter $$\mu$$. This also gives meaning to the parameters $$\mu$$ and $$\phi$$; $$\mu$$ is the mean of the Negative Binomial, and $$\phi$$ controls extra width of the distribution beyond Poisson. The smaller $$\phi$$ is, the broader the distribution.

In this parametrization, the pertinent moments are

Mean: $$\displaystyle{\mu}$$

Variance: $$\displaystyle{\mu\left(1 + \frac{\mu}{\phi}\right)}$$.

In Stan, the Negative Binomial distribution using the $$(\mu,\phi)$$ parametrization is called neg_binomial_2.

• SciPy and NumPy use yet another parametrization. The PMF for SciPy is

\begin{align} f(y;n, p) = \frac{\Gamma(y+n)}{\Gamma(n) \, y!}\,p^n \left(1-p\right)^y. \end{align}

The parameter $$p$$ is the probability of success of a Bernoulli trial (as defined in the story above), and $$1-p$$ is the probability of failure. The parameters are related to the others we have defined by $$n=\alpha=\phi$$ and $$p=\beta/(1+\beta) = \phi/(\mu+\phi)$$. In this parametrization, the pertinent moments are

Mean: $$\displaystyle{n\,\frac{1-p}{p}}$$

Variance: $$\displaystyle{n\,\frac{1-p}{p^2}}$$.
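The relationships among the three parametrizations in these notes can be collected into small conversion helpers. The function names below are hypothetical, written only for this illustration, and the parameter values are arbitrary. The sketch also checks that the variance formulas of the three parametrizations agree.

```python
import math

def alpha_beta_to_mu_phi(alpha, beta):
    """(alpha, beta) -> (mu, phi), using phi = alpha and mu = alpha / beta."""
    return alpha / beta, alpha

def mu_phi_to_alpha_beta(mu, phi):
    """(mu, phi) -> (alpha, beta), the inverse of the map above."""
    return phi, phi / mu

def alpha_beta_to_n_p(alpha, beta):
    """(alpha, beta) -> (n, p) as used by NumPy and SciPy."""
    return alpha, beta / (1 + beta)

# Illustrative parameter values (assumptions, not from the text)
alpha, beta = 3.0, 0.5
mu, phi = alpha_beta_to_mu_phi(alpha, beta)
n, p = alpha_beta_to_n_p(alpha, beta)

# The variance computed in each parametrization
var_alpha_beta = alpha * (1 + beta) / beta**2
var_mu_phi = mu * (1 + mu / phi)
var_n_p = n * (1 - p) / p**2

print(mu, phi, n, p)
print(var_alpha_beta, var_mu_phi, var_n_p)
```

Keeping the conversions in one place helps avoid mixing up which argument is a probability and which is a rate when moving between packages.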

## PMF and CDF plots

Note: Setting both parameters of a Negative Binomial distribution by quantiles is a challenging problem for a few reasons. First, there is no guarantee that a parameter set exists that gives two specified value-quantile pairs. Second, in other cases, there is a degeneracy: many parameter sets give the same quantiles. As an example, if we wished for 4 to be the 2.5th percentile and 17 to be the 97.5th percentile, we could achieve this with $$\alpha = 100$$ and $$\beta = 10$$, with $$\alpha = 350$$ and $$\beta = 35$$, with $$\alpha = 10^9$$ and $$\beta = 10^8$$, and countless other combinations. (This is because the large $$\alpha$$ limit is Poisson.) So, instead of manipulating two parameters to hit two quantiles, we can lock one parameter and set the other parameter to give a single desired percentile. In the $$\alpha$$-$$\beta$$ formulation, we fix $$\alpha$$, and in the $$\mu\text{-}\phi$$ formulation, we fix $$\mu$$.
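Both points in the note can be explored numerically. The sketch below, which uses only the Python standard library, first computes the 2.5th and 97.5th percentiles for the three parameter sets mentioned above, and then locks $$\alpha$$ and solves for the $$\beta$$ that places a single percentile at a chosen value. All function names are hypothetical helpers, the bisection is one possible way to do the solve (not necessarily what any particular tool uses), and the choice $$\alpha = 5$$ in the second part is arbitrary.

```python
import math

def nb_log_pmf(y, alpha, beta):
    """Log PMF in the alpha-beta parametrization (log1p keeps large alpha, beta stable)."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        - alpha * math.log1p(1 / beta) - y * math.log1p(beta)
    )

def nb_cdf(n, alpha, beta):
    """P(y <= n), by direct summation of the PMF."""
    return sum(math.exp(nb_log_pmf(y, alpha, beta)) for y in range(n + 1))

def nb_percentile(q, alpha, beta):
    """Smallest integer n with P(y <= n) >= q, for 0 < q < 1."""
    n = 0
    cdf = math.exp(nb_log_pmf(0, alpha, beta))
    while cdf < q:
        n += 1
        cdf += math.exp(nb_log_pmf(n, alpha, beta))
    return n

def beta_for_percentile(n, q, alpha, lo=1e-6, hi=1e6, n_iter=200):
    """Fix alpha and bisect on a log scale for the beta with P(y <= n) = q."""
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if nb_cdf(n, alpha, mid) < q:
            lo = mid  # CDF too small means the mean is too large: increase beta
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Degeneracy: parameter sets with the same alpha / beta and large alpha
for alpha, beta in [(100, 10), (350, 35), (1e9, 1e8)]:
    print(alpha, beta, nb_percentile(0.025, alpha, beta), nb_percentile(0.975, alpha, beta))

# Lock alpha and set beta so that P(y <= 17) = 0.975
alpha = 5.0
beta = beta_for_percentile(17, 0.975, alpha)
print(beta, nb_cdf(17, alpha, beta))
```

The bisection relies on the fact that, for fixed $$\alpha$$, the CDF at a given $$n$$ increases monotonically with $$\beta$$, since a larger $$\beta$$ means a larger probability of success and therefore fewer failures.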

[PMF and CDF plots in the α-β formulation]

[PMF and CDF plots in the µ-φ formulation]