Some Discrete Distributions-II (P-9)

In this post, we are going to talk about some more discrete distributions and their uses. We will discuss the following distributions:

Categorical distribution
Multinomial distribution
Hypergeometric distribution

Categorical distribution

This is the generalization of the Bernoulli distribution to $k$ mutually exclusive outcomes, exactly one of which occurs on a given trial. Equivalently, each outcome has an indicator taking values in $\{0,1\}$, with exactly one indicator equal to 1 (success) at a time. Such variables are called categorical variables, hence the name categorical distribution. The outcomes have probabilities $p_1,p_2,...,p_k$ such that $p_1+p_2+...+p_k=1$ . It is sometimes called the multinoulli distribution, to emphasize that it generalizes the Bernoulli distribution in the same way that the multinomial distribution generalizes the binomial. In machine learning, it appears in one-hot encoding and in classification problems. Given the probabilities and categories above, the pmf of this distribution is $P(X=x)=\prod_{i=1}^k p_i^{[x=i]}$ , where $x \in \{1,2,...,k\}$ is the observed category and $[x=i]=1$ if $x=i$ and $0$ otherwise. This bracket is called the Iverson bracket.
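To make the Iverson bracket form of the pmf concrete, here is a minimal Python sketch. The function name categorical_pmf and the example probabilities are illustrative assumptions, not something from a particular library.

```python
def categorical_pmf(x, probs):
    """Return P(X = x) for categories labelled 1..k, using the product form."""
    result = 1.0
    for i, p in enumerate(probs, start=1):
        indicator = 1 if x == i else 0  # Iverson bracket [x = i]
        result *= p ** indicator        # p^1 = p for the observed category, p^0 = 1 otherwise
    return result


# Hypothetical example with k = 3 categories
probs = [0.2, 0.5, 0.3]
pmf_values = [categorical_pmf(x, probs) for x in (1, 2, 3)]
print(pmf_values)
```

Every factor except the one for the observed category equals 1, so the product simply picks out $p_x$.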

Multinomial distribution

This is the generalization of the binomial distribution: we now have $k$ mutually exclusive outcomes instead of just two, and we calculate the probability that each of these $k$ outcomes occurs a given number of times. Let $n$ be the number of trials and $k$ the number of outcomes. Let $p_1,p_2,...,p_k$ be the probabilities of outcomes $1,2,...,k$ respectively, such that $p_1+p_2+...+p_k=1$ , and let $n_1,n_2,...,n_k$ be the number of times outcomes $1,2,...,k$ occur respectively, such that $n_1+n_2+...+n_k=n$ . Then the probability of the first outcome happening $n_1$ times, the second outcome happening $n_2$ times and so on, is given by $\frac{n!}{n_1!n_2!...n_k!}p_1^{n_1}p_2^{n_2}...p_k^{n_k}$ . This collection of probabilities is called the multinomial distribution, and using the multinomial expansion of $(p_1+p_2+...+p_k)^n=1$ , we can show that this collection is a pmf (probability mass function). When the number of trials is $n=1$ , we get the categorical distribution as a special case of the multinomial distribution. Note that the multinomial distribution is an example of a multivariate distribution; in simple words, its outcome is a vector of counts rather than a single number. We will discuss multivariate distributions in detail later, which is why I am not covering its mean vector and covariance matrix here. Sampling from a multinomial distribution corresponds to sampling with replacement, because the outcome probabilities $p_1,p_2,...,p_k$ stay the same on each of the $n$ trials.
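As a rough sketch of the multinomial formula, the snippet below computes the probability of a given vector of counts using only the Python standard library, and checks numerically that the probabilities sum to 1. The function name multinomial_pmf and the example values are assumptions made for illustration.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Return n!/(n_1!...n_k!) * p_1^n_1 ... p_k^n_k for the given counts."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # stays an exact integer at every step
    return coeff * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical example: a fair die rolled 6 times, each face showing exactly once
fair_die = [1 / 6] * 6
each_face_once = multinomial_pmf([1] * 6, fair_die)

# Sanity check: the probabilities over all count vectors with n = 4, k = 3 sum to 1
probs = [0.2, 0.5, 0.3]
total = sum(
    multinomial_pmf(counts, probs)
    for counts in product(range(5), repeat=3)
    if sum(counts) == 4
)
print(each_face_once, total)
```

With $n=1$ the count vector is one-hot, and the same function returns the categorical probability of the selected outcome.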

Hypergeometric distribution

Suppose we have $n$ balls, out of which $n_1$ are red and $n_2=n-n_1$ are black. We choose $r$ balls at random, with every ball equally likely to be chosen, and ask for the probability that the selection contains exactly $k$ red balls. We can choose $k$ red balls from $n_1$ balls in $\binom{n_1}{k}$ ways and $r-k$ black balls from $n-n_1$ balls in $\binom{n-n_1}{r-k}$ ways. Any choice of red balls can be combined with any choice of black balls, so the total number of ways to choose $k$ red balls and $r-k$ black balls is $\binom{n_1}{k} \binom{n-n_1}{r-k}$ . Since there are $\binom{n}{r}$ equally likely ways to choose $r$ balls from $n$ , the probability of such an event is $q_k=\frac{\binom{n_1}{k} \binom{n-n_1}{r-k}}{\binom{n}{r}}$ . The collection of these probabilities defines the hypergeometric distribution. Using Vandermonde’s identity, i.e. $\binom{n_1+n_2}{r}=\sum_{k=0}^{r} \binom{n_1}{k} \binom{n_2}{r-k}$ , one can show that this collection of probabilities is, in fact, a pmf. Sampling from a hypergeometric distribution is sampling without replacement, as the balls taken out are not put back.

There is a famous anecdote associated with this distribution, in which a lady tastes cups of tea and says whether the tea or the milk was added first. It is said that Fisher conducted the experiment with 8 cups, of which 4 were tea first and 4 were milk first, arranged in random order. She identified all of them correctly, which has probability $\binom{4}{4} \binom{4}{4}/\binom{8}{4} = 1/70 \approx 0.014$ under random guessing, which suggests she was most probably not guessing at random.
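The formula for $q_k$ translates almost directly into code. Below is a minimal sketch using math.comb from the Python standard library; the function name hypergeom_pmf is an assumption chosen for illustration. It reproduces the tea-tasting probability and checks numerically that the $q_k$ sum to 1.

```python
from math import comb


def hypergeom_pmf(k, n, n1, r):
    """Probability of exactly k red balls when drawing r balls from n, of which n1 are red."""
    return comb(n1, k) * comb(n - n1, r - k) / comb(n, r)


# Tea-tasting setup: 8 cups, 4 tea first, pick 4, all 4 picks correct
all_correct = hypergeom_pmf(k=4, n=8, n1=4, r=4)

# The probabilities over k = 0..r sum to 1 (Vandermonde's identity)
total = sum(hypergeom_pmf(k, n=8, n1=4, r=4) for k in range(5))
print(all_correct, total)
```

Here all_correct equals $1/70$, matching the value quoted for the experiment.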

We can easily sample from these distributions in R or Python using their statistical libraries, for example NumPy's random module in Python, or the rmultinom and rhyper functions in R.
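As one possible illustration in Python, the sketch below draws from all three distributions with NumPy's random Generator. The probabilities, counts and seed are arbitrary assumptions chosen for the example, and NumPy is assumed to be installed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make the run repeatable

probs = [0.2, 0.5, 0.3]  # hypothetical category probabilities

# Categorical: one draw of a category index (0-based here, so 0, 1 or 2)
category = rng.choice(len(probs), p=probs)

# Multinomial: how often each category occurs in n = 10 trials
counts = rng.multinomial(10, probs)

# Hypergeometric: number of red balls among 4 drawn without replacement
# from an urn with 4 red (ngood) and 4 black (nbad) balls
reds = rng.hypergeometric(ngood=4, nbad=4, nsample=4)

print(category, counts, reds)
```

The multinomial draw always returns counts that add up to the number of trials, and the hypergeometric draw can never exceed the number of red balls in the urn.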

References:

George Casella and Roger L. Berger, Statistical Inference, Second Edition

William Feller, An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, John Wiley & Sons
