Learn, Think & Do

Enjoy Randomness !

In this post, we are going to talk about a few more discrete distributions and their uses, namely the categorical, multinomial and hypergeometric distributions.

Categorical distribution

This is the generalization of the Bernoulli distribution to k mutually exclusive outcomes, exactly one of which occurs on each trial. Equivalently, the outcome can be written as k indicator variables taking values in \{0,1\}, exactly one of which equals 1 (success) at a time. Such variables are called categorical variables, hence the name categorical distribution. The outcomes have probabilities p_1,p_2,...,p_k such that p_1+p_2+...+p_k=1. It is sometimes called the multinoulli distribution, to emphasize that it generalizes the Bernoulli distribution in the same way that the multinomial distribution generalizes the binomial. In machine learning parlance, it shows up in one-hot encoding and in classification problems.

With the categories labelled i=1,2,...,k and probabilities as above, the pmf of this distribution is \prod_{i=1}^k p_i^{[x=i]}, where [x=i]=1 if x=i and 0 otherwise. This bracket is called the Iverson bracket, and the product simply picks out the probability of the observed category, i.e. P(X=x)=p_x.
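A minimal sketch in plain Python, assuming categories are labelled 1,...,k; the helper names categorical_pmf and one_hot are hypothetical. It evaluates the pmf literally as the product of p_i^{[x=i]}, builds the corresponding one-hot vector, and compares sample frequencies from the standard library's random.choices with the probabilities p_i (the values in probs are purely illustrative).

```python
import random
from math import prod

def categorical_pmf(x, probs):
    """P(X = x) for categories 1..k, computed as prod_i p_i^[x = i]."""
    # (1 if x == i else 0) plays the role of the Iverson bracket [x = i]
    return prod(p ** (1 if x == i else 0) for i, p in enumerate(probs, start=1))

def one_hot(x, k):
    """One-hot vector for category x in {1, ..., k}: a single 1 at position x."""
    return [1 if x == i else 0 for i in range(1, k + 1)]

probs = [0.2, 0.5, 0.3]           # p_1, p_2, p_3 (illustrative values)
print(categorical_pmf(2, probs))  # 0.5
print(one_hot(2, len(probs)))     # [0, 1, 0]

# Draw 10,000 samples and compare empirical frequencies with the p_i
rng = random.Random(0)
samples = rng.choices(range(1, len(probs) + 1), weights=probs, k=10_000)
freq = [samples.count(i) / len(samples) for i in range(1, len(probs) + 1)]
print(freq)
```

Only one factor of the product differs from 1, which is why the Iverson-bracket form reduces to p_x.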

Multinomial distribution

This is the generalization of the binomial distribution: we now have k mutually exclusive outcomes instead of just two, and we calculate the probability of each outcome occurring a given number of times. Let n be the number of independent trials, k the number of outcomes, p_1,p_2,...,p_k the probabilities of outcomes 1,2,...,k on a single trial, with p_1+p_2+...+p_k=1, and n_1,n_2,...,n_k the number of times outcomes 1,2,...,k occur, with n_1+n_2+...+n_k=n. Then the probability that the first outcome occurs n_1 times, the second n_2 times, and so on, is \frac{n!}{n_1!n_2!...n_k!}p_1^{n_1}p_2^{n_2}...p_k^{n_k}.

The collection of these probabilities is called the multinomial distribution. By the multinomial theorem, summing these terms over all (n_1,...,n_k) with n_1+...+n_k=n gives (p_1+p_2+...+p_k)^n=1^n=1, so this collection is indeed a pmf (probability mass function). With a single trial, i.e. n=1, we get the categorical distribution as a special case, and with k=2 we recover the binomial distribution.

Note that the multinomial distribution is a multivariate distribution: in simple words, its value is a vector (n_1,n_2,...,n_k) rather than a single number. We will discuss multivariate distributions in detail later, which is why I am not covering its mean vector and covariance matrix here. The multinomial distribution corresponds to sampling with replacement: each of the n draws is independent and has the same probabilities p_1,...,p_k, just as when a ball drawn from an urn is put back before the next draw.
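The sketch below, using hypothetical helpers multinomial_pmf and sample_multinomial and illustrative probabilities, checks the two claims above numerically: the multinomial probabilities over all count vectors with n_1+...+n_k=n sum to 1, and n=1 reduces to the categorical pmf. Sampling is done, as an assumed but standard construction, by counting the outcomes of n independent categorical draws, which is exactly sampling with replacement.

```python
import random
from itertools import product
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n!/(n_1! ... n_k!) * p_1^{n_1} ... p_k^{n_k}, where n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))

def sample_multinomial(n, probs, rng):
    """Counts of each outcome in n independent categorical draws (with replacement)."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts

probs = [0.2, 0.5, 0.3]
n = 4
# Every count vector (n_1, n_2, n_3) with n_1 + n_2 + n_3 = n
support = [c for c in product(range(n + 1), repeat=len(probs)) if sum(c) == n]
total = sum(multinomial_pmf(c, probs) for c in support)
print(round(total, 12))                   # 1.0, as the multinomial theorem predicts
print(multinomial_pmf((0, 1, 0), probs))  # n = 1 reduces to the categorical pmf: 0.5
print(sample_multinomial(n, probs, random.Random(1)))
```

Enumerating the support with itertools.product is only practical for small n and k; it is there to make the pmf check explicit, not for efficiency.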

Hypergeometric distribution

Suppose we have n balls, of which n_1 are red and n_2=n-n_1 are black. We choose r balls at random, every subset of r balls being equally likely, and ask for the probability that exactly k of the chosen balls are red. We can choose k red balls out of n_1 in \binom{n_1}{k} ways and r-k black balls out of n-n_1 in \binom{n-n_1}{r-k} ways. Any choice of red balls can be combined with any choice of black balls, so the total number of ways of choosing k red and r-k black balls is \binom{n_1}{k} \binom{n-n_1}{r-k}. Since there are \binom{n}{r} equally likely ways of choosing r balls in total, the probability of this event is q_k=\frac{\binom{n_1}{k} \binom{n-n_1}{r-k}}{\binom{n}{r}}.

The collection of these probabilities defines the hypergeometric distribution. Using Vandermonde's identity, \binom{n_1+n_2}{r}=\sum_{k=0}^{r} \binom{n_1}{k} \binom{n_2}{r-k} (with the convention \binom{a}{b}=0 when b>a), one can show that this collection of probabilities is, in fact, a pmf. The hypergeometric distribution describes sampling without replacement: a ball that has been drawn is not put back, so the composition of the urn changes from draw to draw.

There is a famous anecdote associated with it, the lady tasting tea, in which a lady claimed she could taste a cup of tea and tell whether the tea or the milk had been added first. It is said that Fisher tested her with 8 cups, 4 with tea first and 4 with milk first, presented in random order, and she knew there were 4 of each. She identified all of them correctly. If she were guessing at random, i.e. picking 4 of the 8 cups as milk first with every choice equally likely, this would happen with probability \binom{4}{4} \binom{4}{0}/\binom{8}{4}=1/70 \approx 0.014, which suggests she was most probably not guessing.
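A small Python sketch following the notation above, with a hypothetical helper hypergeom_pmf(k, n, n1, r). It checks Vandermonde's identity for one arbitrary case, computes the tea-tasting probability, and simulates random guessing with random.sample, which draws without replacement. Modelling "random guessing" as choosing 4 of the 8 cups uniformly at random is an assumption of this sketch.

```python
import random
from math import comb

def hypergeom_pmf(k, n, n1, r):
    """q_k = C(n1, k) C(n - n1, r - k) / C(n, r): probability of k red balls
    when r balls are drawn without replacement from n balls, n1 of them red."""
    if k < 0 or k > r:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, matching the convention C(a, b) = 0
    return comb(n1, k) * comb(n - n1, r - k) / comb(n, r)

# Vandermonde's identity for one choice of n_1 = 5, n_2 = 7, r = 6
print(comb(12, 6) == sum(comb(5, k) * comb(7, 6 - k) for k in range(7)))  # True

# Lady tasting tea: n = 8 cups, n_1 = 4 milk-first, she picks r = 4 cups
p_all = hypergeom_pmf(4, n=8, n1=4, r=4)
print(p_all, 1 / 70)  # both are 1/70, about 0.0143

# Simulate random guessing: choose 4 of the 8 cups uniformly, without replacement
cups = ["milk"] * 4 + ["tea"] * 4
rng = random.Random(2)
trials = 100_000
hits = sum(rng.sample(cups, 4).count("milk") == 4 for _ in range(trials))
print(hits / trials)
```

The simulated frequency should land close to 1/70; swapping random.sample for random.choices (with replacement) would no longer match the hypergeometric model.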

Sampling from these distributions is straightforward in R or Python with their standard statistical libraries: for example, NumPy's numpy.random.Generator provides choice, multinomial and hypergeometric, and R's stats package provides rmultinom and rhyper.
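As a hedged illustration of the library route, assuming NumPy 1.17 or later (which introduced numpy.random.Generator), the calls below draw categorical, multinomial and hypergeometric samples. The seed and parameter values are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
probs = [0.2, 0.5, 0.3]

# Categorical: 5 draws from categories 0, 1, 2 (NumPy labels from 0, not 1)
cats = rng.choice(len(probs), size=5, p=probs)

# Multinomial: outcome counts in n = 10 trials, repeated 3 times (one row each)
counts = rng.multinomial(10, probs, size=3)

# Hypergeometric: red balls among r = 4 drawn from n_1 = 4 red and 4 black
reds = rng.hypergeometric(ngood=4, nbad=4, nsample=4, size=5)

print(cats, counts, reds, sep="\n")
```

In R, rmultinom and rhyper play the same roles; note that rmultinom returns one column per draw rather than one row.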


