Some Discrete Distributions-1 (P-6)

In this post I am going to talk about some discrete distributions. In a previous article I defined what a distribution is, so here I will discuss a few of them. They are as follows:

Binomial distribution
Geometric distribution
Negative binomial distribution
Poisson distribution

Binomial distribution

This distribution describes the random variable (rv) $X$ that counts the number of successes (heads) in $N$ independent tosses, each of which succeeds with probability $p$, where $0<p<1$. The probability of exactly $k$ successes is $P(X=k)=\binom{N}{k}p^k (1-p)^{N-k}$ for $k=0,1,\dots,N$. These are simply Bernoulli trials performed $N$ times with the inherent assumption of independence. It is easy to verify that this defines a probability mass function (pmf): by the binomial theorem, $\sum_{k=0}^N \binom{N}{k}p^k (1-p)^{N-k}=(p+1-p)^N=1$. Similarly we can find the expectation and variance of this rv. Let me give short details.

For the expectation we use the identity $k\binom{N}{k}=N\binom{N-1}{k-1}$ and drop the $k=0$ term, which is zero:

$E[X]=\sum_{k=0}^N kP(X=k)=\sum_{k=1}^N k\binom{N}{k}p^k (1-p)^{N-k}=Np\sum_{k=1}^N \binom{N-1}{k-1}p^{k-1} (1-p)^{N-k}=Np(p+1-p)^{N-1}=Np.$

For the variance we use $Var(X)=E[X^2]-(E[X])^2$, and we already know the value of $E[X]$. Let's calculate $E[X^2]$ with the same identity, writing $k=(k-1)+1$:

$E[X^2]=\sum_{k=0}^N k^2P(X=k)=\sum_{k=1}^N k^2\binom{N}{k}p^k (1-p)^{N-k}=Np \sum_{k=1}^N k \binom{N-1}{k-1}p^{k-1} (1-p)^{N-k}=Np\left(\sum_{k=1}^N (k-1) \binom{N-1}{k-1}p^{k-1} (1-p)^{N-k}+\sum_{k=1}^N \binom{N-1}{k-1}p^{k-1} (1-p)^{N-k}\right)=Np\left((N-1)p+1\right).$

Here the first sum is the expectation for $N-1$ trials, which is $(N-1)p$, and the second sum is the total probability for $N-1$ trials, which is 1. Now $Var(X)=N(N-1)p^2+Np-N^2p^2=Np(1-p)$.
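The results above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it sums the pmf directly for illustrative values $N=10$ and $p=0.3$ (both chosen arbitrarily here) and compares the results with $Np$ and $Np(1-p)$. The function names are made up for this example.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial rv with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_moments(n, p):
    """Return (total probability, mean, variance) by direct summation over the pmf."""
    ks = range(n + 1)
    total = sum(binomial_pmf(k, n, p) for k in ks)
    mean = sum(k * binomial_pmf(k, n, p) for k in ks)
    second_moment = sum(k * k * binomial_pmf(k, n, p) for k in ks)
    return total, mean, second_moment - mean**2

# Illustrative values only; any 0 < p < 1 and small n would do.
n, p = 10, 0.3
total, mean, var = binomial_moments(n, p)

print("sum of pmf:", total)
print("mean:", mean, "vs Np =", n * p)
print("variance:", var, "vs Np(1-p) =", n * p * (1 - p))
print("formulas agree:", isclose(mean, n * p) and isclose(var, n * p * (1 - p)))
```

Summing the pmf directly is only practical for small $N$, but it makes a convenient sanity check on the closed forms.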

Geometric distribution

For the geometric distribution the random variable $X$ is the number of tosses required to observe the first success, counting the successful toss itself, where the probability of success is $p$, $0<p<1$. The first success occurs on toss $n$ with probability $P(X=n)=p(1-p)^{n-1}$ for $n=1,2,\dots$, since the first $n-1$ tosses must all be failures. Using the identities $\sum_{n=0}^{\infty}r^n=\frac{1}{1-r}$, $\sum_{n=1}^{\infty}nr^{n-1}=\frac{1}{(1-r)^2}$ and $\sum_{n=2}^{\infty}n(n-1)r^{n-2}=\frac{2}{(1-r)^3}$, all valid for $|r|<1$, we can find $E[X]=\frac{1}{p}$ and $Var(X)=\frac{1-p}{p^2}$. Suppose a couple decides to have children until a daughter is born; what is the expected number of children of this couple? This problem can be solved using the geometric distribution, assuming each child is a daughter with probability $\frac{1}{2}$ independently of the others. So the expected number of children is $1/(1/2)=2$.
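As a quick check of these formulas, here is a small sketch that approximates the infinite sums by truncating them at a large cutoff. The cutoff `n_max=500` is an assumption made for this example; it is more than enough for $p=\frac{1}{2}$ because the terms decay geometrically. The function names are illustrative.

```python
def geometric_pmf(n, p):
    """P(X = n): the first success occurs on toss n (n = 1, 2, ...)."""
    return p * (1 - p)**(n - 1)

def geometric_moments(p, n_max=500):
    """Approximate (total probability, mean, variance) by truncating the series at n_max."""
    ns = range(1, n_max + 1)
    total = sum(geometric_pmf(n, p) for n in ns)
    mean = sum(n * geometric_pmf(n, p) for n in ns)
    second_moment = sum(n * n * geometric_pmf(n, p) for n in ns)
    return total, mean, second_moment - mean**2

# The daughter example: each child is a daughter with probability 1/2.
p = 0.5
total, mean, var = geometric_moments(p)

print("sum of pmf:", total)
print("expected number of children:", mean, "vs 1/p =", 1 / p)
print("variance:", var, "vs (1-p)/p^2 =", (1 - p) / p**2)
```

For values of $p$ close to 0 the tail decays slowly, so a larger cutoff would be needed for the truncated sums to be accurate.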

Negative binomial distribution

This distribution is basically a generalized version of the geometric distribution. In this case we focus on $r$ successes instead of just one. Here the random variable $X$ is the number of trials needed to obtain the $r$ th success, and it is given by $P(X=n)=\binom{n-1}{r-1}p^r(1-p)^{n-r}$, where $p$ is the probability of success and $n \geq r$. The expectation and variance are given by $E[X]=r/p$ and $Var(X)=r(1-p)/p^2$. You might be wondering why it is called negative. This has to do with rephrasing the problem in terms of 'failures' and using binomial coefficients with a negative upper index. Well, I know you are confused, so let me explain. Let $Y=X-r$ denote the number of failures before the $r$ th success, so that $E[Y]=r(1-p)/p$ and $Var(Y)=r(1-p)/p^2$. In this case we have $P(Y=y)=\binom{y+r-1}{y}p^r(1-p)^y$ for $y=0,1,2,\dots$, where the combinatorial part can also be written as $\binom{y+r-1}{y}=(-1)^y\binom{-r}{y}$, with $\binom{-r}{y}=\frac{(-r)(-r-1)(-r-2)...(-r-y+1)}{(y)(y-1)...(1)}$. So this distribution can be written as $P(Y=y)=\binom{-r}{y}p^r(-(1-p))^y$, which has the same form as a binomial term but with a negative upper index; that is why it is called the negative binomial distribution. For more information about binomial coefficients with negative arguments you can search online resources.
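To make the two viewpoints concrete, the sketch below checks the identity $\binom{y+r-1}{y}=(-1)^y\binom{-r}{y}$ for small $y$ and compares the mean number of trials with the mean number of failures. The values $r=3$, $p=0.4$ and the truncation point `n_max=400` are assumptions picked for illustration, and `generalized_binom` is a helper written for this example, not a standard library function.

```python
from math import comb, factorial

def generalized_binom(a, y):
    """Binomial coefficient 'a choose y' for any integer a (possibly negative), y >= 0."""
    numerator = 1
    for i in range(y):
        numerator *= a - i  # a (a-1) (a-2) ... (a-y+1)
    return numerator // factorial(y)  # exact: y consecutive integers are divisible by y!

def trials_pmf(n, r, p):
    """P(X = n): the r-th success occurs on trial n (n >= r)."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

def failures_pmf(y, r, p):
    """P(Y = y): exactly y failures occur before the r-th success."""
    return comb(y + r - 1, y) * p**r * (1 - p)**y

r, p = 3, 0.4  # illustrative values
n_max = 400    # truncation point for the infinite sums (an assumption)

identity_holds = all(
    comb(y + r - 1, y) == (-1)**y * generalized_binom(-r, y) for y in range(20)
)
mean_trials = sum(n * trials_pmf(n, r, p) for n in range(r, n_max + 1))
mean_failures = sum(y * failures_pmf(y, r, p) for y in range(n_max + 1))

print("identity holds for y < 20:", identity_holds)
print("mean trials:", mean_trials, "vs r/p =", r / p)
print("mean failures:", mean_failures, "vs r(1-p)/p =", r * (1 - p) / p)
```

The two means differ by exactly $r$, reflecting $X=Y+r$: every run of trials contains the $r$ successes plus the failures.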

Poisson distribution

This distribution is widely used for modeling the number of events that occur within some interval, most often a time interval. For example, suppose we want the probability that a certain number of customers come into a shop in an hour, and we know the hourly average from historical data. If that average is $\lambda$, the Poisson random variable $X$ is defined by $P(X=k)=e^{-\lambda}\frac{\lambda^k}{k!}$ for $k=0,1,2,\dots$. Since $\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}=e^{\lambda}$, these probabilities sum to 1, so this is indeed a pmf. The expectation and variance of this distribution are the same, namely $\lambda$. The Poisson distribution is used as an approximation to the binomial distribution in cases where $p$ is small and $n$ is large. We take $\lambda = np$ to match the expectation, and for small $p$ the binomial variance $np(1-p)$ is approximately $np=\lambda$ as well. One can prove that, for every fixed number of successes $k$, the binomial probability converges to the Poisson probability as $n$ goes to infinity with $np=\lambda$ held fixed. I won't go into that detail; instead I will give an example. Consider a text which contains, on average, one error every 300 words. A page contains about 400 words. What is the probability that there will be no more than 2 errors in the next five pages? We can model this problem using the binomial distribution with $p=\frac{1}{300}$ and $n=400\times 5=2000$, where the number of errors in five pages is given by the rv $X$. We want to calculate $P(X\leq 2)=\sum_{x=0}^2 \binom{2000}{x}(\frac{1}{300})^x (\frac{299}{300})^{2000-x}=0.0378$. If we instead use the Poisson distribution with $\lambda=2000\times\frac{1}{300}=\frac{20}{3}$, we get $e^{-20/3}(1+20/3+(20/3)^2(1/2))=0.0380$. As you can see, the approximation is pretty close, and it saves a lot of the cumbersome calculation involved in the binomial case, which mattered especially before modern computational tools were available.
You can take this as a rule of thumb in pragmatic modeling: simple models are preferred over more complex ones unless the more complex model offers a large advantage.
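The error-counting example can be reproduced in a few lines. This is a minimal sketch using only the standard library, with the values $n=2000$ and $p=\frac{1}{300}$ taken from the example; the function names are illustrative. The two printed values should round to the 0.0378 and 0.0380 quoted in the example.

```python
from math import comb, exp, factorial

def binomial_cdf(k_max, n, p):
    """P(X <= k_max) for a binomial rv with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for a Poisson rv with mean lam."""
    return exp(-lam) * sum(lam**k / factorial(k) for k in range(k_max + 1))

# Values from the example: one error per 300 words, five pages of 400 words.
n = 400 * 5
p = 1 / 300
lam = n * p  # 20/3

exact = binomial_cdf(2, n, p)
approx = poisson_cdf(2, lam)

print(f"binomial P(X <= 2) = {exact:.4f}")
print(f"Poisson  P(X <= 2) = {approx:.4f}")
print(f"absolute difference = {abs(exact - approx):.6f}")
```

With modern tools the exact binomial sum is just as easy to compute, so the Poisson form is mainly useful for hand calculation and for the simpler model it gives.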