In many modeling applications, we wish to sample out of a distribution. That means that we randomly draw numbers out of the sample space of the distribution such that the probability of drawing a number less than \(x\) is given by \(F(x)\). Sampling out of a distribution is often easier than computing the distribution over a range of values because many of those values are close to zero.
We may also want to have access to other properties about the distributions, like their PDF/PMF, CDF, quantiles, etc. We will focus on Python-based packages for working with distributions, and each vignette about a distribution includes syntax on how to use NumPy and SciPy tools. I also show how to use the distributions in the probabilistic programming language Stan. In the future, I hope to add distributions/samplers available in R and Julia.
Usage of NumPy
To use the the
numpy.random module to sample out of a distribution, you first need to instantiate a random number generator (RNG) with a specified bit generator. The bit generator generates random bits, and then the RNG converts these to number following a specific probability distribution you want to sample out of. The default generator using the PCG64 bit generator, which is a good choice to use. You can instantiate a random number generator with the default bit generator as follows.
import numpy.random rg = numpy.random.default_rng()
Alternatively, you can seed the random number generator to get a reproducible stream of random numbers.
rg = numpy.random.default_rng(seed=923841)
In the syntax in this app, I assume that a random number generator has already been instantiated and is called
rg. You can then use
rg to draw random number out of a distribution, such as the Normal distribution.
# Draw a single random number from standard Normal rg.normal(loc=0, scale=1)
The syntax for drawing multiple samples involves use of the
size keyword argument.
# Draw 20 random numbers from standard Normal rg.normal(loc=0, scale=1, size=20)
In this app, we always use rg for an instantiated generator, as is done in the NumPy documentation.
For some calculations you may want to just-in-time compile your code that does sampling using Numba. Numba currently only supports the Mersenne Twister bit generator. To use this, simply substitute
rg in calls to NumPy random number generators. As an example, to get a JITted function to sample out of a Normal distribution, you would do the following.
import numpy.random import numba @numba.njit def draw_normal(loc=0.0, scale=1.0, size=1): """Return array of Normally distributed random numbers.""" return numpy.random.normal(loc=loc, scale=scale, size=size)
Usage of SciPy
scipy.stats module is more feature-rich than the
numpy.random package in that it allows more functionality beyond sampling out of a distribution. It also features more distributions. The module also has functions to perform statistical tests, transformations, and other statistical computation. Here, we will focus only on its distributions.
As described in the documentation, every distribution is an instance of a subclass of an rv_continuous or rv_discrete base class. These classes have useful methods, like
rvs(), which draws samples out of the distribution
pmf() for discrete and
cdf() for continuous, which return the value of the probability mass function or probability density function, and others. The instance is instantiated by supplying the appropriate parameters, which vary depending on the distribution. For example, the Normal distribution is implemented as scipy.stats.normal.
import scipy.stats # Instantiate subclass for standard Normal norm = scipy.stats.norm(loc=0, scale=1) # Draw 100 random numbers out of standard Normal norm.rvs(size=100)
If I wanted to compute the probability density function for specific values of \(x\) and plot it, I could do the following.
import numpy as np import scipy.stats import bokeh.io import bokeh.plotting # Instantiate subclass for standard Normal norm = scipy.stats.norm(loc=0, scale=1) # Make PDF values x = np.linspace(-4, 4, 200) y_pdf = norm.pdf(x) # Make the plot p = bokeh.plotting.figure(height=200, width=300, x_axis_label='x', y_axis_label='f(x)', tools="save") p.line(x, y_pdf, line_width=2) p.x_range.range_padding = 0 bokeh.io.show(p)
In the vignettes for each distribution, I show the syntax of how to instantiate the class using
Usage of Stan
We follow Stan’s conventions for probability functions. Within a Stan program specifying a model, we would write
y ~ normal(mu, sigma);
to specify that
y is Normally distributed with parameters
sigma. If we want to draw a random number from a distribution using Stan, we add a
_rng suffix to the probability function name. To draw a random number from a Normal distribution and store it as
x, we do
x = normal_rng(mu, sigma);
Other suffixes are available, and the call signature can change depending on the suffix. For example, to compute the value of the logarithm of the PDF of the Normal distribution at point
x, we use
normal_lpdf(x | mu, sigma).
In the syntax listed in the vignettes, I show the syntax used in a sampling statement. That is a statement like