# Software packages

In many modeling applications, we wish to **sample** out of a distribution. That means that we randomly draw numbers out of the sample space of the distribution such that the probability of drawing a number less than \(x\) is given by \(F(x)\). Sampling out of a distribution is often easier than computing the distribution over a range of values because many of those values are close to zero.

We may also want to have access to other properties about the distributions, like their PDF/PMF, CDF, quantiles, etc. We will focus on Python-based packages for working with distributions, and each vignette about a distribution includes syntax on how to use NumPy and SciPy tools. I also show how to use the distributions in the probabilistic programming language Stan. In the future, I hope to add distributions/samplers available in R and Julia.

## Usage of NumPy

To use the the `numpy.random`

module to sample out of a distribution, you first need to instantiate a random number generator (RNG) with a specified bit generator. The bit generator generates random bits, and then the RNG converts these to number following a specific probability distribution you want to sample out of. The default generator using the PCG64 bit generator, which is a good choice to use. You can instantiate a random number generator with the default bit generator as follows.

```
import numpy.random
rg = numpy.random.default_rng()
```

Alternatively, you can seed the random number generator to get a reproducible stream of random numbers.

```
rg = numpy.random.default_rng(seed=923841)
```

In the syntax in this app, I assume that a random number generator has already been instantiated and is called `rg`

. You can then use `rg`

to draw random number out of a distribution, such as the Normal distribution.

```
# Draw a single random number from standard Normal
rg.normal(loc=0, scale=1)
```

The syntax for drawing multiple samples involves use of the `size`

keyword argument.

```
# Draw 20 random numbers from standard Normal
rg.normal(loc=0, scale=1, size=20)
```

In this app, we always use rg for an instantiated generator, as is done in the NumPy documentation.

### Using Numba

For some calculations you may want to just-in-time compile your code that does sampling using Numba. Numba currently only supports the Mersenne Twister bit generator. To use this, simply substitute `numpy.random`

for `rg`

in calls to NumPy random number generators. As an example, to get a JITted function to sample out of a Normal distribution, you would do the following.

```
import numpy.random
import numba
@numba.njit
def draw_normal(loc=0.0, scale=1.0, size=1):
"""Return array of Normally distributed random numbers."""
return numpy.random.normal(loc=loc, scale=scale, size=size)
```

## Usage of SciPy

The `scipy.stats`

module is more feature-rich than the `numpy.random`

package in that it allows more functionality beyond sampling out of a distribution. It also features more distributions. The module also has functions to perform statistical tests, transformations, and other statistical computation. Here, we will focus only on its distributions.

As described in the documentation, every distribution is an instance of a subclass of an rv_continuous or rv_discrete base class. These classes have useful methods, like `rvs()`

, which draws samples out of the distribution `pmf()`

for discrete and `cdf()`

for continuous, which return the value of the probability mass function or probability density function, and others. The instance is instantiated by supplying the appropriate parameters, which vary depending on the distribution. For example, the Normal distribution is implemented as scipy.stats.normal.

```
import scipy.stats
# Instantiate subclass for standard Normal
norm = scipy.stats.norm(loc=0, scale=1)
# Draw 100 random numbers out of standard Normal
norm.rvs(size=100)
```

If I wanted to compute the probability density function for specific values of \(x\) and plot it, I could do the following.

```
import numpy as np
import scipy.stats
import bokeh.io
import bokeh.plotting
# Instantiate subclass for standard Normal
norm = scipy.stats.norm(loc=0, scale=1)
# Make PDF values
x = np.linspace(-4, 4, 200)
y_pdf = norm.pdf(x)
# Make the plot
p = bokeh.plotting.figure(height=200, width=300, x_axis_label='x', y_axis_label='f(x)', tools="save")
p.line(x, y_pdf, line_width=2)
p.x_range.range_padding = 0
bokeh.io.show(p)
```

In the vignettes for each distribution, I show the syntax of how to instantiate the class using `scipy.stats`

, e.g., `scipy.stats.norm(mu, sigma)`

.

## Usage of Stan

We follow Stan’s conventions for probability functions. Within a Stan program specifying a model, we would write

```
y ~ normal(mu, sigma);
```

to specify that `y`

is Normally distributed with parameters `mu`

and `sigma`

. If we want to draw a random number from a distribution using Stan, we add a `_rng`

suffix to the probability function name. To draw a random number from a Normal distribution and store it as `x`

, we do

```
x = normal_rng(mu, sigma);
```

Other suffixes are available, and the call signature can change depending on the suffix. For example, to compute the value of the logarithm of the PDF of the Normal distribution at point `x`

, we use `normal_lpdf(x | mu, sigma)`

.

In the syntax listed in the vignettes, I show the syntax used in a sampling statement. That is a statement like `normal(mu, sigma)`

.