In this post, I am going to talk about some of the limit theorems that are the building blocks of statistical applications but often work behind the curtain, with users unaware that they are being used. I am not going to go into the proofs; rather, I will give you some insights about them.
Sketch of the article:
- Some inequalities
- Brief discussion on types of convergence
- What do we mean by iid random sample?
- Law of large numbers
- Central limit theorem
- References
Some inequalities
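Let $X$ be a non-negative random variable and $a > 0$, then we can say that $P(X \ge a) \le \frac{E(X)}{a}$. This happens because
$E(X) = \int_0^\infty x f(x)\,dx \ge \int_a^\infty x f(x)\,dx$, where $f$ is the pdf of $X$. Further,
$\int_a^\infty x f(x)\,dx \ge a \int_a^\infty f(x)\,dx = a\,P(X \ge a)$.
This is what we wanted to prove. Now based on this inequality, we can derive Markov’s inequality and Chebyshev’s inequality.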
Markov’s inequality: $P(|X| \ge a) \le \frac{E|X|^k}{a^k}$ for $a > 0$ and $k > 0$,
the proof for this is essentially the same as in the above case; the only thing that changes is that we need to evaluate $E|X|^k$. Let me say something for a brief moment about the brilliant Russian mathematician Andrey Markov. Interestingly, his brother was also a mathematician, and so was his son. He worked mostly in probability theory.
Chebyshev’s inequality: $P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$ for $a > 0$, where $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$,
here you can arrive at the result from the preceding inequality by replacing $X$ with $X - \mu$ and using the second moment. Like A. Markov, Pafnuty Chebyshev was also a Russian mathematician; in fact, he was Markov’s teacher, and he is regarded as the founding father of Russian mathematics. He worked in number theory and probability theory.
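A quick numerical check can make the two bounds concrete. The sketch below is only an illustration: the exponential distribution, the sample size, the thresholds, and the helper names `markov_check` and `chebyshev_check` are all assumptions chosen for the example, and it uses the first-moment form of the bound for a non-negative variable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumption: X ~ Exponential with mean 1 (non-negative, finite variance).
x = rng.exponential(scale=1.0, size=200_000)
mean, var = x.mean(), x.var()

def markov_check(a):
    """Return (empirical P(X >= a), bound E(X)/a) computed from the sample."""
    return np.mean(x >= a), mean / a

def chebyshev_check(a):
    """Return (empirical P(|X - mean| >= a), bound Var(X)/a^2)."""
    return np.mean(np.abs(x - mean) >= a), var / a**2

for a in (1.0, 2.0, 4.0):
    p_m, b_m = markov_check(a)
    p_c, b_c = chebyshev_check(a)
    print(f"a={a}: P(X>=a)={p_m:.4f} <= {b_m:.4f};  "
          f"P(|X-mu|>=a)={p_c:.4f} <= {b_c:.4f}")
```

In runs like this the bounds hold but are usually far from tight, which is the price of assuming almost nothing about the distribution.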
Brief discussion on types of convergence
We are going to talk about the convergence of a sequence of random variables. We will discuss three types: convergence in probability, almost sure convergence, and convergence in distribution.
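A sequence of random variables, $X_1, X_2, \ldots$, converges in probability to a random variable $X$ if, for every $\epsilon > 0$,
$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$
or, equivalently,
$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1$.
Convergence in probability simply means that the probability of $X_n$ differing from $X$ by at least $\epsilon$ tends to zero as $n$ increases.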
A sequence of random variables, $X_1, X_2, \ldots$, converges almost surely to a random variable $X$ if, for every $\epsilon > 0$,
$P(\lim_{n \to \infty} |X_n - X| < \epsilon) = 1$.
Almost sure convergence means that the set where convergence fails has measure zero, i.e. the probability of that set is zero, that is, $P(\{s : X_n(s) \not\to X(s)\}) = 0$. To the novice reader, convergence in probability and almost sure convergence might look the same, but whether the limit sits inside or outside the probability matters: you cannot interchange the two. Readers who want more details can look into Probability and Measure by Patrick Billingsley and A Probability Path by Sidney Resnick.
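A sequence of random variables, $X_1, X_2, \ldots$, converges in distribution to a random variable $X$ if
$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at every point $x$ where $F_X$ is continuous, where $F_{X_n}$ and $F_X$ are the distribution functions of $X_n$ and $X$ respectively.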
Before giving you some examples, let me state that almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution, so almost sure convergence is the strongest of these three types while convergence in distribution is the weakest. In fact, convergence in distribution is a completely different type of convergence: here we care about the convergence of the distribution functions rather than of the random variables themselves. It is also important because sometimes it is much more desirable to learn about distributions from the observed samples, and that is the crux of the elegant Central limit theorem.
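Example: Consider a sequence of random variables defined on $[0, 1]$ with the uniform distribution such that
$X_1(s) = I_{[0, 1]}(s)$, $X_2(s) = I_{[0, 1/2]}(s)$, $X_3(s) = I_{[1/2, 1]}(s)$, $X_4(s) = I_{[0, 1/3]}(s)$, and so on, where $I_A$ denotes the indicator function of the set $A$.
You can observe that for any value of $s$, $X_n(s)$ takes both the values 0 and 1 infinitely often, so $X_n(s)$ does not converge and this is not almost sure convergence. The sequence does converge in probability to zero, because $P(X_n \ne 0)$ is the length of the interval, which shrinks to zero. Now if we define the sequence as $X_n(s) = s + s^n$ and $X(s) = s$, then $X_n(s) \to X(s)$ for all values except for $s = 1$ (where $X_n(1) = 2$ for every $n$), so it is almost sure convergence, as the convergence fails at only one point, which has zero probability.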
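The difference between the two examples can be seen by fixing a point $s$ and following the sequence at that point. The sketch below assumes the first example is the sliding-indicator sequence on $[0, 1]$ (indicators of $[0,1]$, $[0,1/2]$, $[1/2,1]$, $[0,1/3]$, and so on) and the second is $X_n(s) = s + s^n$; the function names and the choice $s = 0.3$ are only for illustration.

```python
def typewriter_interval(n):
    """Return (left, right) of the interval whose indicator is X_n, n >= 1."""
    k = 1
    while n > k:      # block k holds k intervals, each of length 1/k
        n -= k
        k += 1
    return (n - 1) / k, n / k

def x_typewriter(n, s):
    """Value of the sliding-indicator sequence at the sample point s."""
    left, right = typewriter_interval(n)
    return 1 if left <= s <= right else 0

def x_power(n, s):
    """Value of X_n(s) = s + s^n at the sample point s."""
    return s + s**n

s = 0.3
N = 5050  # the first 100 complete blocks: 1 + 2 + ... + 100
hits = [n for n in range(1, N + 1) if x_typewriter(n, s) == 1]
left, right = typewriter_interval(N)
print("indices n <= N with X_n(s) = 1:", len(hits), "last one:", hits[-1])
print("P(X_N = 1) = interval length =", right - left)

for point in (0.2, 0.9, 1.0):
    print(point, [round(x_power(n, point), 4) for n in (1, 10, 100, 1000)])
```

At the fixed point the indicator sequence keeps returning to 1 in every block, even though the interval length (the probability of a 1) goes to zero, while $s + s^n$ settles down to $s$ for every $s < 1$.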
What do we mean by iid random sample?
The random variables $X_1, \ldots, X_n$ are called independent and identically distributed (iid) random variables with pdf or pmf $f(x)$ if $X_1, \ldots, X_n$ are mutually independent random variables and the marginal pdf or pmf of each $X_i$ is the same function $f(x)$.
Law of large numbers
There are two forms of the law of large numbers, weak and strong. The Strong law of large numbers implies the Weak law of large numbers because in the Weak law the convergence is in probability, while in the Strong law the convergence is almost sure. The proof of the Weak law follows almost immediately from Chebyshev’s inequality, while the proof of the Strong law is not that simple. So what are these laws? In simple words, they say that the sample mean converges to the population mean as the sample size grows, and the type of convergence determines whether we have the weak or the strong law. Mathematically, let $X_1, X_2, \ldots$ be an iid random sample with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, both finite. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then for every $\epsilon > 0$,
Weak law of large numbers (WLLN): $\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1$
Strong law of large numbers (SLLN): $P(\lim_{n \to \infty} |\bar{X}_n - \mu| < \epsilon) = 1$
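The remark that the Weak law follows from Chebyshev’s inequality can be written out in two lines. This is a sketch under the finite-variance assumption, writing $\bar{X}_n$ for the mean of the first $n$ observations, $\mu$ and $\sigma^2$ for the common mean and variance, and using $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$, which holds by independence:

```latex
\begin{align*}
P\left(|\bar{X}_n - \mu| \ge \epsilon\right)
  &\le \frac{\operatorname{Var}(\bar{X}_n)}{\epsilon^2}
   = \frac{\sigma^2}{n\,\epsilon^2} \longrightarrow 0
   \quad \text{as } n \to \infty, \\
\text{so}\quad
P\left(|\bar{X}_n - \mu| < \epsilon\right)
  &\ge 1 - \frac{\sigma^2}{n\,\epsilon^2} \longrightarrow 1 .
\end{align*}
```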
The beauty of these theorems is that they give an interpretation of probability in frequentist terms. Consider a Bernoulli trial with $p$ as the probability of success. The sum of $n$ independent Bernoulli trials gives us the number of successes in $n$ trials, say $S_n$. By the WLLN, $\frac{S_n}{n}$ converges in probability to $p$, meaning that for large $n$ the relative frequency of success is close to $p$ and $S_n \approx np$, which is the expectation of the binomial distribution.
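The frequentist reading can be checked with a small simulation. This is a minimal sketch: the success probability `p = 0.3`, the seed, and the trial counts are arbitrary choices for illustration, and `success_fraction` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
p = 0.3  # assumed success probability, chosen only for illustration

def success_fraction(n):
    """Simulate n independent Bernoulli(p) trials and return S_n / n."""
    trials = rng.random(n) < p   # True marks a success
    return trials.mean()

for n in (10, 100, 10_000, 1_000_000):
    print(f"n={n:>9}: S_n/n = {success_fraction(n):.4f}  (p = {p})")
```

The printed fractions typically wander for small $n$ and sit close to $p$ for large $n$, which is the behaviour the law of large numbers describes.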
Central limit theorem
This theorem, abbreviated as CLT, deals with convergence in distribution for iid random samples. It is used in many hypothesis testing problems where we do not know the underlying distribution but have a large sample. It says that the distribution of the suitably standardized sample mean converges to the normal distribution as the sample size grows. Now suppose the $X_i$’s are iid as in the preceding section, with mean $\mu$ and finite variance $\sigma^2 > 0$, and let $G_n(x)$ denote the cumulative distribution function (cdf) of $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$. Then
$\lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$, that is, $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ converges in distribution to the standard normal distribution.
Consider the following histograms of sample means computed from random samples of the uniform distribution for different sample sizes, where we can see that for large samples the normal distribution is a good fit, illustrating the CLT.
[Figure: histograms of sample means from a uniform distribution for different sample sizes]
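A comparison of this kind can be reproduced numerically. The sketch below is not the code behind the figure: the sample sizes 2 and 30, the number of replications, and the helper names are assumptions made for illustration. It standardizes sample means from Uniform(0, 1) and compares their empirical cdf with the standard normal cdf at a few points.

```python
import math
import numpy as np

rng = np.random.default_rng(2)        # fixed seed so the run is repeatable
mu, sigma = 0.5, math.sqrt(1 / 12)    # mean and sd of Uniform(0, 1)

def standardized_means(n, reps=20_000):
    """Draw `reps` samples of size n; return sqrt(n) * (mean - mu) / sigma."""
    samples = rng.random((reps, n))
    return math.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

def normal_cdf(x):
    """Standard normal cdf written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for n in (2, 30):   # sample sizes are assumptions, not the post's values
    z = standardized_means(n)
    for x in (-1.0, 0.0, 1.0):
        print(f"n={n:>2}, x={x:+.1f}: empirical cdf = {np.mean(z <= x):.3f}, "
              f"normal cdf = {normal_cdf(x):.3f}")
```

Passing the output of `standardized_means` to any histogram routine gives a picture of the same kind as the one described above.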

References:
Probability and Measure by Patrick Billingsley
Statistical Inference by George Casella and Roger Berger