Learn, Think & Do

Enjoy Randomness !

The probability distribution function (P-4)

In this post we discuss the probability distribution function for discrete as well as continuous random variables, both of which are defined below. The difference between the two cases is small: integration in the continuous case becomes summation in the discrete case, and the other way round. Along the way, this post highlights some discrete and continuous random variables with their respective probability mass functions (pmfs) or probability density functions (pdfs) as well as their cumulative distribution functions (cdfs). The main non-mathematical motivation for studying these things is statistical modeling: they are the building blocks of statistical applications in data science and machine learning as well as in engineering and finance.

The following are the questions we are going to discuss:

  1. What are discrete and continuous random variables?
  2. What are probability distribution and density / mass functions?
  3. A few discrete and continuous random variables with their respective distribution functions.

Discrete and Continuous random variables

A discrete random variable (rv) X is a function from a sample space \Omega (equipped with a collection of events \mathfrak{F}) into \mathbb{R} which takes only countably many values. In simple language, discrete here means that the values can be listed one by one as x_1, x_2, \ldots, and the distribution function defined below has a jump at every value that X takes with positive probability. A random variable is said to be continuous if it can take any real value within an interval in \mathbb{R} or on the whole of \mathbb{R} (with certain conditions which are given later). For example, if an rv X takes only the values 0 or 1 then it is discrete, but if it can take any value in the interval [0,1] then it is continuous. Generally, X, Y, Z etc. denote rvs and x, y, z etc. denote their values.

Probability Mass and Distribution function for discrete random variables

For a discrete rv X, the probability mass at x is defined as P(X=x)=P(\{\omega \in \Omega: X(\omega)=x\}), that is, the probability of the set of all outcomes to which X assigns the value x. Viewed as a function of x, we write it as p_X(x)=P(X=x) and call it the probability mass function (pmf). For any discrete rv X taking the values x_i we have \sum_{i=1}^{\infty}P(X=x_i)=1, because this sum is nothing but the probability of the whole sample space. The pmf can be thought of as the mass of a particle placed at that point.

Another useful concept is the probability distribution function (also called the cumulative distribution function) of a random variable X, which is defined as F_X(x)=P(X \leq x)=P(\{\omega \in \Omega : X(\omega)\leq x\}). For a discrete rv this gives F_X(x)=\sum_{x_i \leq x}P(X=x_i).

The simplest example is the rv X which takes the value 1 when a head comes up and the value 0 when a tail shows up in a single coin toss experiment with probability of head being p. So we have P(X=0)=1-p and P(X=1)=p, which can be written compactly as p_X(\theta)=p^{\theta}(1-p)^{1-\theta} where \theta=0,1. This is called the Bernoulli distribution with parameter p, denoted as Ber(p). It is used, for example, in binary classification problems.
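To make the Bernoulli pmf and cdf concrete, here is a minimal Python sketch. The function names bernoulli_pmf and bernoulli_cdf and the value p = 0.3 are my own choices for illustration, not part of any library.

```python
def bernoulli_pmf(theta: int, p: float) -> float:
    """pmf of Ber(p): p**theta * (1 - p)**(1 - theta) for theta in {0, 1}."""
    if theta not in (0, 1):
        return 0.0  # no probability mass away from 0 and 1
    return p**theta * (1 - p) ** (1 - theta)


def bernoulli_cdf(x: float, p: float) -> float:
    """cdf of Ber(p): sum of the pmf over the values x_i <= x."""
    return sum(bernoulli_pmf(v, p) for v in (0, 1) if v <= x)


p = 0.3  # assumed probability of head, chosen only for illustration
print("P(X=0) =", bernoulli_pmf(0, p))
print("P(X=1) =", bernoulli_pmf(1, p))
print("F_X(0.5) =", bernoulli_cdf(0.5, p))
```

The cdf is a step function here: it is 0 below 0, jumps by 1-p at 0 and jumps by p at 1.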

Probability Distribution and density function for continuous random variables

Often in mathematics, certain specific properties are discovered first and a general definition is then built on them. The same is true of continuous random variables, so the definition of a continuous rv given above is not complete. Let's make it precise (omitting technicalities which may not be of interest to most people; readers interested in the more technical aspects can search for absolutely continuous functions). We say X is a continuous rv if there exists a non-negative integrable function f_X defined on \mathbb{R} such that for any interesting subset B\subset \mathbb{R} (for example, intervals and their unions or intersections), we have P(X \in B)=\int_{x\in B}f_X(x) dx. The function f_X is called the probability density function (pdf) of X. Taking B=\mathbb{R} shows that \int_{-\infty}^{\infty}f_X(x) dx=1, which is similar to the discrete case where the sum of the pmf is unity. The cumulative distribution function (cdf) F_X is given by F_X(x)=P(X \leq x)=\int_{-\infty}^{x}f_X(t) dt. At every point x where f_X is continuous, F_X is differentiable and \frac{d}{dx}F_X(x)=f_X(x), which holds in most cases of interest in applications. We will mention more properties of the pdf and cdf of an rv in future posts.

Consider a random variable X which takes values in (a,b) with a<b, and which has pdf f_X(x)=\frac{1}{b-a} for all a \leq x \leq b and f_X(x)=0 for all x<a and x>b. Integrating gives F_X(x)=0 for x<a, F_X(x)=\frac{x-a}{b-a} for a \leq x \leq b, and F_X(x)=1 for x>b. This defines a continuous rv, and its distribution is called the uniform distribution on the interval (a,b).
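As a rough numerical check of the uniform example, the sketch below integrates the pdf with a simple midpoint rule and compares the result with the closed-form cdf. The helper midpoint_integral, the function names and the endpoints a = 2, b = 5 are assumptions made only for this illustration.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """pdf of the uniform distribution: 1/(b-a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """cdf of the uniform distribution: 0 below a, (x-a)/(b-a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def midpoint_integral(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


a, b = 2.0, 5.0  # assumed endpoints, chosen only for illustration

# The total area under the pdf should be close to 1.
total = midpoint_integral(lambda x: uniform_pdf(x, a, b), a - 1.0, b + 1.0)

# Integrating the pdf up to x should reproduce the cdf at x.
area_to_x = midpoint_integral(lambda x: uniform_pdf(x, a, b), a - 1.0, 3.5)

print("total area:", total)
print("area up to 3.5:", area_to_x, "cdf at 3.5:", uniform_cdf(3.5, a, b))
```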

Some more random variables

Consider an rv X which gives the number of heads (successes) in n independent coin tosses where the probability of success in each toss is p. Exactly k successes can occur in \binom{n}{k} different orders, each particular sequence with k successes and n-k failures has probability p^k(1-p)^{n-k}, and multiplying the two gives the probability of k successes, that is, p_X(k)=P(X=k)=\binom{n}{k}p^k(1-p)^{n-k} for k=0,1,\ldots,n. By the binomial theorem we know that \sum_{i=0}^{n}\binom{n}{i}p^i(1-p)^{n-i}=(p+(1-p))^n=1, hence it is a pmf. This pmf defines what is called the Binomial distribution.
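The binomial pmf is easy to compute directly, and summing it over k = 0, ..., n gives a quick numerical confirmation that it is a pmf. This is a small sketch using math.comb from the Python standard library (Python 3.8 or later); the name binomial_pmf and the values n = 10, p = 0.3 are assumptions for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for k = 0, 1, ..., n."""
    if k < 0 or k > n:
        return 0.0  # no probability mass outside 0..n
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # assumed number of tosses and success probability

# Summing the pmf over all possible values should give (p + (1 - p))**n = 1.
pmf_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print("sum of pmf:", pmf_total)
```

With n = 1 this reduces to the Bernoulli pmf: binomial_pmf(1, 1, p) is p and binomial_pmf(0, 1, p) is 1 - p.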

Consider another rv Y which has pdf f_Y(y)=\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}. It is a good exercise to evaluate the integral of f_Y over (-\infty, \infty) and see that it is 1. This pdf defines a distribution which is seen throughout statistics: the standard normal distribution. We will discuss this and some other distributions in more detail, including their expectation, variance and uses, in forthcoming posts.
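The integral of the standard normal pdf can also be checked numerically before attempting the exercise by hand. The sketch below is only an approximation: it truncates the real line to [-10, 10] (an assumption, justified because the tails beyond that are negligible) and uses a midpoint rule. For the cdf it uses the known identity F_Y(y)=\frac{1}{2}(1+\mathrm{erf}(y/\sqrt{2})) with math.erf from the standard library. The function names are my own.

```python
import math


def std_normal_pdf(y: float) -> float:
    """pdf of the standard normal: exp(-y**2 / 2) / sqrt(2 * pi)."""
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(y: float) -> float:
    """cdf of the standard normal via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def midpoint_integral(f, lo: float, hi: float, steps: int = 20_000) -> float:
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


# Truncating to [-10, 10] is an assumption; the mass outside is negligible.
total = midpoint_integral(std_normal_pdf, -10.0, 10.0)
area_to_one = midpoint_integral(std_normal_pdf, -10.0, 1.0)

print("approximate total area:", total)
print("area up to 1:", area_to_one, "cdf at 1:", std_normal_cdf(1.0))
```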
