# Categorical distribution

## Story

A probability is assigned to each of a set of discrete outcomes.

## Example

A hen will peck at grain A with probability $$\theta_\mathrm{A}$$, grain B with probability $$\theta_\mathrm{B}$$, and grain C with probability $$\theta_\mathrm{C}$$.

## Parameters

The distribution is parametrized by the probabilities assigned to each outcome. We define $$\theta_y$$ to be the probability assigned to outcome $$y$$. The $$\theta_y$$'s are the parameters; each is nonnegative, and together they are constrained by

\begin{align} \sum_y \theta_y = 1. \end{align}
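As a minimal sketch of the constraint above, the following check tests whether a candidate parameter vector is usable. The function name `is_valid_theta` and the tolerance `atol` are illustrative assumptions, not part of any package listed on this page.

```python
import numpy as np


def is_valid_theta(theta, atol=1e-8):
    """Return True if theta is a valid Categorical parameter vector.

    Hypothetical helper: checks that the entries are nonnegative and
    sum to one within a floating-point tolerance (atol is an assumption).
    """
    theta = np.asarray(theta, dtype=float)
    return bool(
        theta.ndim == 1
        and theta.size > 0
        and np.all(theta >= 0)
        and np.isclose(theta.sum(), 1.0, atol=atol)
    )


# Probabilities for grains A, B, and C (made-up values for illustration)
theta = np.array([0.5, 0.3, 0.2])
print(is_valid_theta(theta))
```

A tolerance is used because probabilities stored as floating-point numbers rarely sum to exactly one.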

## Support

If we index the categories with sequential integers from 1 to N, the support is the set of integers from 1 to N, inclusive.
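One way to picture this indexing is a pair of lookup tables between category labels and integers. The sketch below uses the grain labels from the example; the variable names `index_of` and `label_of` are hypothetical. Note that the NumPy syntax in the Usage section draws from 0 to N-1 rather than 1 to N, so an offset of one may be needed when moving between the two conventions.

```python
# Category labels from the hen example
categories = ["A", "B", "C"]

# 1-based index for each label, matching the support described above
index_of = {label: i + 1 for i, label in enumerate(categories)}

# Reverse lookup: recover the label from its index
label_of = {i: label for label, i in index_of.items()}

print(index_of)
print(label_of)
```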

## Probability mass function

\begin{align} f(y;\{\theta_y\}) = \theta_y \end{align}
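The PMF is simply a lookup of the probability for the given category. A minimal sketch, assuming categories are indexed 1 to N and that values outside the support get probability zero; the function name `categorical_pmf` is hypothetical:

```python
import numpy as np


def categorical_pmf(y, theta):
    """Evaluate the Categorical PMF at integer index (or indices) y.

    Assumes categories are indexed 1..N, where N = len(theta).
    Values of y outside the support are assigned probability zero.
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=int)
    n = theta.size

    in_support = (y >= 1) & (y <= n)

    # Clip so that out-of-support values can still be used as indices;
    # their results are masked out by np.where below.
    idx = np.clip(y, 1, n) - 1

    return np.where(in_support, theta[idx], 0.0)


theta = np.array([0.5, 0.3, 0.2])
print(categorical_pmf([1, 2, 3], theta))
```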

## Moments

Moments are not defined for a Categorical distribution because the value of $$y$$ is not necessarily numeric.

## Usage

| Package | Syntax |
| --- | --- |
| NumPy | `rg.choice(len(theta), p=theta)` |
| SciPy | `scipy.stats.rv_discrete(values=(range(len(theta)), theta)).rvs()` |
| Stan | `categorical(theta)` |

## Notes

• The scipy.stats module has no ready-made Categorical distribution; it must be constructed manually using scipy.stats.rv_discrete(), with the categories encoded by integer indices. For interactive plotting purposes, a custom PMF and CDF need to be specified.

• To sample from a Categorical distribution, use numpy.random.choice(), specifying the values of $$\theta$$ using the p kwarg.
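The sampling note can be sketched as follows, using NumPy's `Generator.choice()` method. The seed, the sample size, and the values of `theta` are arbitrary choices for illustration. With enough draws, the empirical frequencies should be close to `theta`.

```python
import numpy as np

# Random number generator; the seed is arbitrary and only for reproducibility
rg = np.random.default_rng(seed=3252)

# Probabilities for three categories (made-up values)
theta = np.array([0.5, 0.3, 0.2])

# Draw category indices 0, 1, ..., len(theta) - 1 with probabilities theta
samples = rg.choice(len(theta), size=10_000, p=theta)

# Empirical frequency of each category
freqs = np.bincount(samples, minlength=len(theta)) / samples.size
print(freqs)
```

The draws are zero-based indices, so they can be used directly to index an array of category labels.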