Learn, Think & Do

Enjoy Randomness !

Some Limit theorems in Probability (P-7)

In this post, I am going to talk about some of the limit theorems that are the building blocks of statistical applications but often work behind the curtain, so that users rely on them without being aware of it. I am not going to go into the proofs in full detail; rather, I will try to provide some insight into them.

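Sketch of the article: some inequalities, types of convergence, iid random samples, the law of large numbers, and the central limit theorem.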

Some inequalities

Let X be a non-negative random variable and \alpha >0; then P(X \geq \alpha) \leq \frac{1}{\alpha}E[X]. To see this when X is continuous with pdf f_X, note that E[X]=\int_{0}^{\infty} x f_X(x) dx, because X is non-negative. Dropping the non-negative contribution from [0,\alpha) and then using x \geq \alpha on the remaining range, we get

\int_{0}^{\infty} x f_X(x) dx \geq \int_{\alpha}^{\infty} xf_X(x)dx

\geq \int_{\alpha}^{\infty} \alpha f_X(x)dx  =\alpha\int_{\alpha}^{\infty} f_X(x)dx=\alpha P(X \geq \alpha).

Dividing by \alpha gives what we wanted to prove. This basic bound is itself often called Markov’s inequality; from it we can derive the moment version of Markov’s inequality and Chebyshev’s inequality.

Markov’s inequality: for \alpha >0 and k>0, P(|X| \geq \alpha) \leq \frac{1}{\alpha^k}E[|X|^k],

the proof is essentially the same as above: apply the basic inequality to the non-negative random variable |X|^k with threshold \alpha^k, since |X| \geq \alpha exactly when |X|^k \geq \alpha^k. Let me say a brief word about the brilliant Russian mathematician Andrey Markov. Interestingly, his brother was also a mathematician, and so was his son. He worked mostly in probability theory.

Chebyshev’s inequality: for \alpha >0, P(|X-E[X]| \geq \alpha) \leq \frac{Var(X)}{\alpha^2},

which follows from the preceding inequality by replacing X with X-E[X] and taking k=2, since E[|X-E[X]|^2]=Var(X). Like A. Markov, Pafnuty Chebyshev was a Russian mathematician; in fact, he was Markov’s teacher, and he is regarded as a founding father of Russian mathematics. He worked in number theory and probability theory.
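As a quick sanity check of the two inequalities, here is a minimal sketch that compares empirical tail probabilities with the corresponding bounds. The choice of an exponential distribution with rate 1, the sample size, the seed, and the helper name tail_bounds are all my assumptions for illustration; none of them come from the derivation above.

```python
import random

def tail_bounds(samples, alpha):
    """Compare empirical tail probabilities with the Markov and Chebyshev bounds.

    Assumes the samples are non-negative (needed for the basic Markov bound).
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    # Empirical P(X >= alpha) versus the bound E[X] / alpha.
    p_markov = sum(x >= alpha for x in samples) / n
    markov_bound = mean / alpha
    # Empirical P(|X - E[X]| >= alpha) versus the bound Var(X) / alpha^2.
    p_cheb = sum(abs(x - mean) >= alpha for x in samples) / n
    cheb_bound = var / alpha ** 2
    return p_markov, markov_bound, p_cheb, cheb_bound

# Illustrative data: exponential(1) draws, which are non-negative.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(100_000)]

for alpha in (1.0, 2.0, 4.0):
    p_m, b_m, p_c, b_c = tail_bounds(samples, alpha)
    print(f"alpha={alpha}: P(X>=alpha)={p_m:.4f} <= {b_m:.4f}, "
          f"P(|X-EX|>=alpha)={p_c:.4f} <= {b_c:.4f}")
```

The bounds hold for every \alpha, but they are usually far from tight: the inequalities use only the mean or the variance, not the shape of the distribution.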

Brief discussion on types of convergence

We now turn to the convergence of a sequence of random variables. We will discuss three types: convergence in probability, almost sure convergence, and convergence in distribution.

A sequence of random variables, X_1,X_2,..., converges in probability to a random variable X if,

for every \epsilon >0, \lim_{n \to \infty}P(|X_n-X|\geq \epsilon)=0 or, equivalently, \lim_{n \to \infty} P(|X_n-X|<\epsilon)=1.

Convergence in probability simply means that the probability that X_n differs from X by \epsilon or more becomes arbitrarily small as n increases.

A sequence of random variables, X_1,X_2,..., converges almost surely to a random variable X if,

for every \epsilon >0, P(\lim_{n \to \infty}|X_n-X|< \epsilon)=1.

Almost sure convergence means that the set where convergence fails has measure zero, i.e. that set has probability zero: P(\{\omega \in \Omega : X_n(\omega) \nrightarrow X(\omega)\})=0. To the novice reader, convergence in probability and almost sure convergence might look the same, but it matters whether the limit is taken inside or outside the probability; the two cannot be interchanged. Readers who want more details can look into Probability and Measure by Patrick Billingsley and A Probability Path by Sidney Resnick.

A sequence of random variables, X_1,X_2,..., converges in distribution to a random variable X if

\lim_{n \to \infty}F_{X_n}(x)=F_X(x) at every point x where F_X is continuous, where F_{X_n} and F_X are the distribution functions of X_n and X respectively.

Before giving some examples, let me state that almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution. So almost sure convergence is the strongest of these three types, while convergence in distribution is the weakest. In fact, convergence in distribution is a rather different kind of convergence: here we care about the convergence of the distribution functions, not of the random variables themselves. It is nevertheless important, because it is often more desirable to learn about distributions from the observed samples, and that is the crux of the elegant Central limit theorem.

Example: Consider the sample space [0,1] with the uniform distribution, and define a sequence of random variables X_1,X_2,... on it by X_1(x)=I_{[0,1]}(x),X_2(x)=I_{[0,1/2]}(x),X_3(x)=I_{[1/2,1]}(x),

X_4(x)=I_{[0,1/3]}(x),X_5(x)=I_{[1/3,2/3]}(x),X_6(x)=I_{[2/3,1]}(x), and so on.

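You can observe that every x falls in at least one interval of each round, so X_n(x)=1 for infinitely many n. Hence X_n(x)\nrightarrow 0 for every x, and the convergence is not almost sure. The sequence does converge in probability to zero, however, because for 0<\epsilon \leq 1 the probability P(|X_n| \geq \epsilon) is the length of the n-th interval, which shrinks to zero. Now if we instead define the sequence as X_n(s)=s+s^n and X(s)=s on the same space, then X_n(s) \rightarrow X(s) for every s in [0,1) because s^n \rightarrow 0, while X_n(1)=2 for all n. The convergence fails only at the single point s=1, which has probability zero, so X_n converges to X almost surely.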
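The first example (indicators of intervals sweeping across [0,1]) can be explored numerically. The sketch below is only an illustration under my own assumptions: the helper names interval and X, the test point x=0.3, and the cutoff N=5050 (the end of the round with 100 intervals) are all hypothetical choices.

```python
def interval(n):
    """Return (left, right) of the interval whose indicator is X_n.

    Round k consists of the k intervals [j/k, (j+1)/k], j = 0..k-1,
    and occupies indices k(k-1)/2 + 1 through k(k+1)/2.
    """
    k = 1
    while k * (k + 1) // 2 < n:
        k += 1
    j = n - k * (k - 1) // 2 - 1
    return j / k, (j + 1) / k

def X(n, x):
    """Value of the n-th indicator random variable at the point x."""
    left, right = interval(n)
    return 1 if left <= x <= right else 0

x = 0.3      # an arbitrary fixed point of the sample space
N = 5050     # last index of round k = 100
hits = [n for n in range(1, N + 1) if X(n, x) == 1]

# Under the uniform distribution, P(X_n = 1) is the interval length.
left, right = interval(N)
print("P(X_N = 1) =", right - left)
print("number of n <= N with X_n(x) = 1:", len(hits))
print("largest such n:", hits[-1])
```

The interval length keeps shrinking (convergence in probability), yet the fixed point x is still being hit in the very last round, which is why the sequence X_n(x) itself never settles at 0.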

What do we mean by an iid random sample?

The random variables X_1,X_2,...,X_n are called independent and identically distributed (iid) random variables with pdf or pmf f(x) if X_1,X_2,...,X_n are mutually independent random variables and the marginal pdf or pmf of each X_i is the same function f(x).

Law of large numbers

There are two forms of the Law of large numbers, weak and strong. The Strong law implies the Weak law, because the convergence in the Weak law is convergence in probability, while in the Strong law it is almost sure convergence. The proof of the Weak law follows almost immediately from Chebyshev’s inequality, while the proof of the Strong law is not that simple. So what are these laws? In simple words, they say that the sample mean converges to the population mean as the sample size grows, and the type of convergence determines whether we have the weak or the strong law. Mathematically, let X_1,X_2,...,X_n,... be an iid random sample with E[X_i]=\mu and Var(X_i)=\sigma^2, both finite. Let s_n=\sum_{i=1}^{n}X_i. Then for every \epsilon >0,

Weak law of large numbers (WLLN): \lim_{n \to \infty} P(|\frac{s_n}{n}-\mu|<\epsilon)=1

Strong law of large numbers (SLLN): P(\lim_{n \to \infty}|\frac{s_n}{n}-\mu|< \epsilon)=1
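To make the remark about Chebyshev’s inequality concrete, here is a short sketch of how the Weak law can be obtained from it, under the same assumptions as above (iid X_i with finite mean \mu and finite variance \sigma^2). It is a standard argument written out for convenience, not a full treatment.

```latex
% Mean and variance of the sample mean (independence is used for the variance):
E\left[\frac{s_n}{n}\right] = \mu,
\qquad
Var\left(\frac{s_n}{n}\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}.

% Chebyshev's inequality applied to s_n/n with \alpha = \epsilon:
P\left(\left|\frac{s_n}{n}-\mu\right| \geq \epsilon\right)
\leq \frac{Var(s_n/n)}{\epsilon^2}
= \frac{\sigma^2}{n\epsilon^2}
\longrightarrow 0 \quad \text{as } n \to \infty.

% Taking complements gives the WLLN:
\lim_{n \to \infty} P\left(\left|\frac{s_n}{n}-\mu\right| < \epsilon\right) = 1.
```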

The beauty of these theorems is that they give an interpretation of probability in frequentist terms. Consider Bernoulli trials with success probability p. The sum of n independent Bernoulli trials is the number of successes in n trials, say m, which has a binomial distribution. By the WLLN, for large n we get \frac{m}{n} \approx p, meaning m \approx np, which is the expectation of the binomial distribution.
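The frequentist reading of the law can be seen in a small simulation. This is a minimal sketch; the success probability p=0.3, the checkpoints, the seed, and the helper name running_proportion are assumptions chosen only for illustration.

```python
import random

def running_proportion(p, n, checkpoints, seed=0):
    """Simulate n Bernoulli(p) trials and record m/n at the given checkpoints."""
    rng = random.Random(seed)
    successes = 0
    proportions = {}
    for i in range(1, n + 1):
        # rng.random() < p is True with probability p, i.e. one Bernoulli trial.
        successes += rng.random() < p
        if i in checkpoints:
            proportions[i] = successes / i
    return proportions

p = 0.3
checkpoints = {10, 100, 1_000, 10_000, 100_000}
props = running_proportion(p, 100_000, checkpoints)

for n in sorted(props):
    print(f"n={n:>6}: m/n={props[n]:.4f}, |m/n - p|={abs(props[n] - p):.4f}")
```

The proportion of successes drifts toward p as n grows, although not necessarily monotonically for any single run.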

Central limit theorem

This theorem, abbreviated as CLT, deals with convergence in distribution for iid random samples. It is used in many hypothesis testing problems where we do not know the distribution but have a large sample. It says that the distribution of the standardized sample mean converges to the standard normal distribution as the sample size grows. Suppose the X_i’s are as in the preceding section, with \sigma >0, and let G_n(x) denote the cumulative distribution function (cdf) of \sqrt{n}(\frac{s_n}{n}-\mu)/ \sigma. Then \lim_{n \to \infty}G_n(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}dy, that is, \sqrt{n}(\frac{s_n}{n}-\mu)/ \sigma converges in distribution to the standard normal distribution.

Consider histograms of sample means of uniform random samples, computed for sample sizes 1, 2, 5, 10, 20, and 30: for the larger sample sizes the normal distribution is clearly a good fit, which illustrates the CLT.
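The same experiment can be sketched in code without plotting, by comparing the empirical cdf of the standardized sample mean with the standard normal cdf at a few points. The number of repetitions, the seed, the grid of comparison points, and the helper names are my assumptions; only the uniform distribution and the sample sizes come from the text.

```python
import math
import random

def standardized_means(n, reps, rng):
    """Draw `reps` standardized sample means of n Uniform(0,1) variables."""
    mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0,1)
    values = []
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        values.append(math.sqrt(n) * (mean - mu) / sigma)
    return values

def normal_cdf(x):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(0)
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]  # points at which the cdfs are compared
gaps = {}
for n in (1, 2, 5, 10, 20, 30):
    z = standardized_means(n, 20_000, rng)
    # Largest absolute difference between empirical and normal cdf on the grid.
    gaps[n] = max(abs(sum(v <= x for v in z) / len(z) - normal_cdf(x)) for x in grid)
    print(f"n={n:>2}: max cdf gap on grid = {gaps[n]:.4f}")
```

For n=1 the standardized variable is still uniform, so the gap is visible; for the larger sample sizes what remains is mostly simulation noise.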

References:

Probability and Measure by Patrick Billingsley

Statistical Inference by George Casella and Roger Berger
