Pareto distribution
Story
There is no real story to the Pareto distribution, except that it is a distribution where the tail of the PDF or PMF follows a power law (\(f(y) \sim y^{-\alpha-1}\)). Such distributions often arise in physical scenarios.
Example
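The Gutenberg-Richter Law implies that the energies released by earthquakes in a given region follow a power law and are therefore approximately Pareto distributed (the magnitudes themselves, being logarithmic in energy, are approximately exponentially distributed). Other random variables that are often described by power laws include the sizes of human settlements (many small towns, a few huge cities) and incomes (many poor, few extremely rich).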
Parameters
The Pareto distribution has two parameters, \(\alpha\) and \(y_\mathrm{min}\). The parameter \(\alpha\) sets the power in the power law and \(y_\mathrm{min}\) is a lower cutoff to ensure that the distribution is normalizable. Both \(\alpha\) and \(y_\mathrm{min}\) must be positive.
Support
The Pareto distribution has support on real numbers greater than or equal to \(y_\mathrm{min}\).
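Probability density function
\[\begin{split}\begin{align} f(y;\alpha, y_\mathrm{min}) = \left\{\begin{array}{lll} \frac{\alpha\,y_\mathrm{min}^\alpha}{y^{\alpha+1}} & & y \ge y_\mathrm{min} \\ 0 & & y < y_\mathrm{min} \end{array} \right. \end{align}\end{split}\]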
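Cumulative distribution function
\[\begin{split}\begin{align} F(y;\alpha, y_\mathrm{min}) = \left\{\begin{array}{lll} 1 - \left(\frac{y_\mathrm{min}}{y}\right)^\alpha & & y \ge y_\mathrm{min} \\ 0 & & y < y_\mathrm{min} \end{array} \right. \end{align}\end{split}\]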
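The density and distribution function are simple enough to write out by hand. The following is a minimal sketch, not a reference implementation: the helper names `pareto_pdf` and `pareto_cdf` are hypothetical, and the comparison assumes that `scipy.stats.pareto` with shape `alpha` and `scale=y_min` (and the default `loc=0`) is the same parametrization used here.

```python
import numpy as np
import scipy.stats


def pareto_pdf(y, alpha, y_min):
    """PDF: alpha * y_min**alpha / y**(alpha + 1) for y >= y_min, zero below."""
    y = np.asarray(y, dtype=float)
    return np.where(y >= y_min, alpha * y_min**alpha / y ** (alpha + 1), 0.0)


def pareto_cdf(y, alpha, y_min):
    """CDF: 1 - (y_min / y)**alpha for y >= y_min, zero below."""
    y = np.asarray(y, dtype=float)
    return np.where(y >= y_min, 1.0 - (y_min / y) ** alpha, 0.0)


# Illustrative parameter values and evaluation points (all positive)
alpha, y_min = 2.0, 1.0
y = np.array([0.5, 1.0, 2.0, 10.0])

# Compare the hand-written formulas with SciPy's implementation
dist = scipy.stats.pareto(alpha, scale=y_min)
print(np.allclose(pareto_pdf(y, alpha, y_min), dist.pdf(y)))
print(np.allclose(pareto_cdf(y, alpha, y_min), dist.cdf(y)))
```

Both helpers return zero below \(y_\mathrm{min}\), matching the support of the distribution.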
Moments
Mean: The mean is infinite for \(\alpha \le 1\) and \(\displaystyle{\frac{\alpha y_\mathrm{min}}{\alpha - 1}}\) for \(\alpha > 1\).
Variance: The variance is infinite for \(\alpha \le 2\) and \(\displaystyle{\frac{\alpha y_\mathrm{min}^2}{(\alpha - 1)^2(\alpha - 2)}}\) for \(\alpha > 2\).
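As a quick sanity check on the moment formulas, the sketch below evaluates them for one illustrative parameter choice and compares with the values SciPy reports. The function names `pareto_mean` and `pareto_var` are hypothetical helpers, and the SciPy parametrization (shape `alpha`, `scale=y_min`) is assumed to match the one used here.

```python
import numpy as np
import scipy.stats


def pareto_mean(alpha, y_min):
    """Mean of the Pareto distribution; infinite for alpha <= 1."""
    if alpha <= 1:
        return np.inf
    return alpha * y_min / (alpha - 1)


def pareto_var(alpha, y_min):
    """Variance of the Pareto distribution; infinite for alpha <= 2."""
    if alpha <= 2:
        return np.inf
    return alpha * y_min**2 / ((alpha - 1) ** 2 * (alpha - 2))


# Illustrative values with alpha > 2 so that both moments are finite
alpha, y_min = 3.0, 2.0
dist = scipy.stats.pareto(alpha, scale=y_min)

print(pareto_mean(alpha, y_min), dist.mean())
print(pareto_var(alpha, y_min), dist.var())
```

Because of the heavy tail, sample means and variances converge slowly (or not at all when the corresponding moment is infinite), so comparing against closed-form values is more reliable than comparing against sample statistics.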
Usage
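Package | Syntax |
---|---|
NumPy | y_min * (1 + rng.pareto(alpha)) |
SciPy | scipy.stats.pareto(alpha, scale=y_min) |
Distributions.jl | Pareto(alpha, y_min) |
Stan | pareto(y_min, alpha) |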
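A short sampling sketch in Python may help here. It assumes `rng` is a NumPy `Generator` created with `np.random.default_rng` (the seed and parameter values are arbitrary), and it relies on the fact that NumPy's `pareto` sampler draws from a Lomax distribution whose support starts at zero, so the draws are shifted by one and scaled by `y_min` to obtain Pareto samples.

```python
import numpy as np
import scipy.stats

# Illustrative parameter values and an arbitrary seed
alpha, y_min = 2.0, 3.0
rng = np.random.default_rng(seed=3252)

# NumPy: Lomax draws (support starting at 0), shifted by 1 and scaled by y_min
y_np = y_min * (1.0 + rng.pareto(alpha, size=100_000))

# SciPy: shape parameter alpha, with y_min passed as the scale
y_sp = scipy.stats.pareto.rvs(alpha, scale=y_min, size=100_000, random_state=rng)

# No sample should fall below y_min
print(y_np.min() >= y_min, y_sp.min() >= y_min)

# The theoretical median is y_min * 2**(1 / alpha); compare with the samples
print(y_min * 2 ** (1 / alpha), np.median(y_np), np.median(y_sp))
```

The median is used for the comparison because, unlike the mean, it is finite for every \(\alpha > 0\).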
Notes
A Pareto distribution is sometimes referred to as a power law distribution. Generically, a distribution is said to be a power law distribution if its tail decays like \(y^{-\beta}\) for some positive \(\beta\).
The Type II Pareto distribution is often used. It is a Pareto distribution, except with a redefinition of \(y \to y - \mu + y_\mathrm{min}\). This shifts \(y\) such that its support starts at \(y=\mu\). In the case where \(\mu = 0\), the Type II distribution is called a Lomax distribution. NumPy’s Pareto sampler draws from a Lomax distribution with \(y_\mathrm{min}\) set to one. Thus, to sample out of a Pareto distribution, the transformations described in the usage table above are necessary. To use a Type II Pareto distribution in Stan, \(y_\mathrm{min}\) is renamed \(\lambda\), and the syntax is pareto_type_2(mu, lambda, alpha).
The Pareto distribution is often best visualized by plotting the complementary cumulative distribution function (CCDF), denoted \(\bar{F}(y)\), which is related to the CDF \(F(y)\) by \(\bar{F}(y) = 1 - F(y)\). The CCDF for a Pareto distribution is
\[\begin{split}\begin{align} \bar{F}(y) = \left\{\begin{array}{lll} \left(\frac{y_\mathrm{min}}{y}\right)^\alpha & & y \ge y_\mathrm{min} \\ 1 & & y < y_\mathrm{min} \end{array} \right. \end{align}\end{split}\]
Thus, the power law is clear. A plot of the CCDF on a log-log plot yields a line with slope equal to \(-\alpha\), as shown below for \(y_\mathrm{min} = 1\) and \(\alpha = 2\).
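The slope can also be checked numerically without drawing the plot. The sketch below evaluates the CCDF on a logarithmically spaced grid using SciPy's survival function `sf` (assumed here to use shape `alpha` and `scale=y_min`) and fits a straight line to the log-transformed values; the grid limits and number of points are arbitrary choices.

```python
import numpy as np
import scipy.stats

# Same parameter values as quoted in the text
alpha, y_min = 2.0, 1.0

# Logarithmically spaced points from y_min = 1 up to 1000
y = np.logspace(0, 3, 50)

# The survival function is the CCDF, 1 - F(y)
ccdf = scipy.stats.pareto.sf(y, alpha, scale=y_min)

# Fit a line to log10(CCDF) versus log10(y); the slope should be close to -alpha
slope, intercept = np.polyfit(np.log10(y), np.log10(ccdf), deg=1)
print(slope, intercept)
```

Passing `y` and `ccdf` to a plotting library with both axes on a logarithmic scale would display the same straight line.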