Basics of probability (P-2)

In the last post about probability, we talked about what probability is, its history, and its scope in today’s world. In this post we look at some basic properties of probability to shed some light on its firm mathematical foundation, without engaging deeply in the mathematical technicalities. The idea is to reach the concept of expected value, or expectation, quickly so that I can introduce you to some finance concepts that depend on it. It also opens the way to conditional probability, which in my opinion is the crux of this subject. Some knowledge of sets, such as the intersection, union, and difference of two sets, is essential. By and large, I am going to address the following questions in this post:

What is a sample space and its events?
What are axioms of probability?
What is a random variable? Expectation? Variance?

Sample Space

In simple language, a sample space is the set of all the possible ‘outcomes’ of an ‘experiment’. These two terms are left undefined, so I will explain them through examples. Consider tossing a coin: this is our experiment, and its possible outcomes are a head or a tail, so $S=\{H,T\}$. In mathematics we are free to assume any kind of experiment, even one that might not make sense in real life. For example, one can consider tossing a three-sided coin with two heads and one tail; this is also a valid experiment with valid outcomes. One can also consider a three-sided die with faces 1, 2 and 3. Now suppose we roll this die twice. What are the possible outcomes? A careful examination reveals that this is the same as rolling two such dice, so the sample space $S$ in this case consists of all the ordered pairs, that is, $S=\{(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)\}$. Note that the size, or cardinality, of this sample space is $9$. In general, for an $m$-sided die thrown $k$ times, the size of the sample space is $m^k$. So for the usual six-sided die rolled once the size of the sample space is $6$, and when it is rolled twice there are $6^2=36$ pairs.
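The counting rule $m^k$ can be checked by listing the outcomes explicitly. The following is a minimal Python sketch; the function name `sample_space` is my own choice for illustration and is not part of any library.

```python
from itertools import product

def sample_space(sides, throws):
    """All ordered outcomes of rolling a `sides`-sided die `throws` times."""
    faces = range(1, sides + 1)
    return list(product(faces, repeat=throws))

three_sided_twice = sample_space(3, 2)
print(three_sided_twice)        # the nine ordered pairs from (1, 1) to (3, 3)
print(len(three_sided_twice))   # 9, which is 3**2
print(len(sample_space(6, 1)))  # 6
print(len(sample_space(6, 2)))  # 36, which is 6**2
```

Each outcome is an ordered tuple, so (1, 2) and (2, 1) are counted separately, exactly as in the list of pairs above.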

An event $A$ is a subset of the sample space $S$. In more sophisticated mathematical language, the events are the subsets of $S$ that belong to something known as a sigma algebra of subsets of $S$. We will mostly not be concerned with sigma algebras, since we will mostly be dealing with finite sample spaces. But for the sake of completeness, let’s define what a sigma algebra is.

A sigma algebra $\mathfrak{B}$ is a collection of subsets of $S$ which satisfies the following properties:

(The empty set is in the collection) $\varnothing \in \mathfrak{B}$.
(Closed under complementation) If $A \in \mathfrak{B}$, then $A' \in \mathfrak{B}$.
(Closed under countable unions) If $A_1,A_2,\ldots \in \mathfrak{B}$, then $\cup_{i=1}^{\infty}A_i \in \mathfrak{B}$.
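For a small finite sample space the three properties can be checked mechanically. The sketch below does this for the collection of all subsets of $S=\{1,2,3\}$; the helper names `power_set` and `is_sigma_algebra` are hypothetical. It relies on one assumption that only holds for finite collections: a countable union of sets taken from a finite collection is really a finite union, so checking unions of pairs is enough.

```python
from itertools import chain, combinations

def power_set(sample_space):
    """The collection of all subsets of a finite sample space."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(collection, sample_space):
    """Check the three properties for a finite collection of subsets."""
    whole = frozenset(sample_space)
    has_empty = frozenset() in collection
    closed_complement = all(whole - a in collection for a in collection)
    # Finite case only: closure under pairwise unions gives all unions.
    closed_union = all(a | b in collection for a in collection for b in collection)
    return has_empty and closed_complement and closed_union

S = {1, 2, 3}
B = power_set(S)
print(len(B))                  # 8 subsets, that is 2**3
print(is_sigma_algebra(B, S))  # True

# A collection that is missing the complement of {1}.
not_closed = {frozenset(), frozenset({1})}
print(is_sigma_algebra(not_closed, S))  # False
```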

Don’t be put off by the definition of a sigma algebra. These things are necessary when you study higher-level mathematics, more specifically measure theory and related fields; otherwise you don’t need them. For a finite sample space we take the collection of all subsets of $S$ (everything we can build from the outcomes by unions and intersections) as the sigma algebra, and one can easily verify that it is indeed a sigma algebra.

For example, if we roll a (usual) die twice, we can define an event $A$ as the set of all outcomes in which the sum of the two numbers is even. Let’s find the size of $A$. The sum is even exactly when both numbers in the pair are even or both are odd. How many such pairs are there? $A=\{(1,1),(1,3),(1,5),(2,2),(2,4),(2,6),(3,1),(3,3),(3,5),(4,2),(4,4),(4,6),(5,1),(5,3),(5,5),(6,2),(6,4),(6,6)\}$, so the size of $A$ is $|A|=18$.

Now how can we assign a probability to this event $A$? If no other information is given, we can assume that each outcome is equally likely, that is, each has the same probability. As the size of the sample space is $36$, every pair has probability $\frac{1}{36}$. Also, each pair is different from every other, so as single-outcome events they are mutually exclusive: two events $B$ and $C$ are mutually exclusive when $B \cap C = \varnothing$, meaning they have nothing in common. For more than two sets, say $A_1,A_2,\ldots,A_m,\ldots$, we say that the collection is pairwise disjoint if $A_i \cap A_j =\varnothing$ for every $i \neq j$. For pairwise disjoint events we can add the individual probabilities to get the probability of their union. Using this, the probability of $A$ is $P(A)=\frac{1}{36}+\cdots+\frac{1}{36}$, with $18$ terms, since the size of $A$ is $18$ and the probability of each outcome is $\frac{1}{36}$. So $P(A)=\frac{18}{36}=\frac{1}{2}$. Let’s make this process more general by turning it into axioms in the next section.
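The count $|A|=18$ and the value $P(A)=\frac{1}{2}$ can be reproduced by listing the outcomes and adding up their probabilities. This is a small sketch that assumes, as in the text, that all $36$ outcomes are equally likely; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a six-sided die.
S = list(product(range(1, 7), repeat=2))

# Event A: the sum of the two numbers is even.
A = [(a, b) for (a, b) in S if (a + b) % 2 == 0]

# Equally likely outcomes: each one has probability 1/36.
p_outcome = Fraction(1, len(S))

# The single outcomes are pairwise disjoint, so their probabilities add.
P_A = sum(p_outcome for _ in A)

print(len(S))  # 36
print(len(A))  # 18
print(P_A)     # 1/2
```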

Axioms of probability

If you are new to axioms shaxioms, let me briefly describe their purpose in mathematics. They are to mathematics roughly what the alphabet and grammar are to a language. They involve certain undefined things: for example, there is no exact definition of what a set is; rather, we have some agreed axioms that sets must satisfy. Together, sets and their axioms form the building blocks of all of mathematics, and they have a very interesting history. Anyway, axioms make our study of mathematical objects and the relations among them precise and well defined. So let me present here the axioms of probability, following the approach of the mathematician A.N. Kolmogorov.

Given a sample space $S$ and an associated sigma algebra $\mathfrak{B}$, a probability function is a function $P$ with domain $\mathfrak{B}$ that satisfies

$P(A) \geq 0$ for all $A \in \mathfrak{B}$.
$P(S)=1$.
If $A_1,A_2,\ldots \in \mathfrak{B}$ are pairwise disjoint, then $P(\cup_{i=1}^{\infty}A_i)=\sum_{i=1}^{\infty}P(A_i)$.

Any function $P$ that satisfies the three axioms of probability mentioned above is called a probability function. In mathematical language, a set $S$ together with a sigma algebra $\mathfrak{B}$ and a probability function $P$ is called a probability space, denoted by $(S,\mathfrak{B},P)$. Unless we are dealing with theorems, we can usually get away with calling $S$ a sample space, with the sigma algebra and probability function understood. In the coin toss experiment we can assign probabilities to the events in many ways. For example, if we assume that the two outcomes (head or tail) are equally likely, then the axioms give $P(\{H\})+P(\{T\})=1$, so each has probability one half; such a coin is called a fair coin. But we can also give these events different probabilities; as long as they satisfy the axioms of probability there is no issue. So suppose we define $P(\{H\})=0.7$ and $P(\{T\})=0.3$. This is also valid, but such a coin wouldn’t be fair: it would be rigged towards heads.
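For a finite sample space, one way to build a probability function is to give each outcome a non-negative number, make the numbers add up to $1$, and define the probability of an event as the sum over its outcomes. The third axiom then holds automatically for disjoint events. The sketch below checks the fair and the rigged coin this way; the helper names `is_probability_function` and `prob` are hypothetical, and the `invalid` assignment is an extra example of my own.

```python
from fractions import Fraction

def is_probability_function(p):
    """p maps each outcome of a finite sample space to its probability."""
    non_negative = all(value >= 0 for value in p.values())
    total_is_one = sum(p.values()) == 1
    return non_negative and total_is_one

def prob(event, p):
    """Probability of an event (a set of outcomes) as a sum over its outcomes."""
    return sum(p[outcome] for outcome in event)

fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
rigged = {"H": Fraction(7, 10), "T": Fraction(3, 10)}
invalid = {"H": Fraction(7, 10), "T": Fraction(1, 2)}  # adds up to 6/5

print(is_probability_function(fair))     # True
print(is_probability_function(rigged))   # True
print(is_probability_function(invalid))  # False
print(prob({"H", "T"}, rigged))          # 1, the probability of the whole sample space
print(prob(set(), rigged))               # 0, the probability of the empty event
```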

Some properties of probability function

If $P$ is a probability function and $A$ and $B$ are any events of the sample space $S$, then

$P(\varnothing)=0$;
$P(A)\leq 1$;
$P(A')=1-P(A)$, where $A'$ is the complement of $A$;
$P(A \cup B)=P(A)+P(B)-P(A\cap B)$;
If $A \subseteq B$, then $P(A)\leq P(B)$.
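These properties can be checked numerically on the two-dice sample space with equally likely outcomes. The events `B` (the first roll is a six) and `C` (a double six) are my own choices for illustration. A check on one example is of course not a proof; the properties themselves follow from the axioms.

```python
from fractions import Fraction
from itertools import product

S = frozenset(product(range(1, 7), repeat=2))

def P(event):
    """Equally likely outcomes: probability is the share of outcomes in the event."""
    return Fraction(len(event), len(S))

A = frozenset(o for o in S if (o[0] + o[1]) % 2 == 0)  # the sum is even
B = frozenset(o for o in S if o[0] == 6)               # the first roll is a six
C = frozenset({(6, 6)})                                # double six, a subset of B

print(P(frozenset()))                           # 0
print(P(A) <= 1)                                # True
print(P(S - A) == 1 - P(A))                     # True, complement rule
print(P(A | B), P(A) + P(B) - P(A & B))         # 7/12 7/12
print(C <= B, P(C) <= P(B))                     # True True
```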

Random Variable

A real-valued function defined on a sample space $S$ is called a random variable. Although it is called a variable, it is really a function; ‘random’ refers to the randomness in which outcome of the sample space occurs when we perform the experiment. For example, in the coin toss experiment we can define a random variable whose value is $1$ when we get a head and $0$ otherwise. So a random variable associates a numerical value with each outcome of the experiment. Random variables are further divided into two categories: discrete random variables and continuous random variables. We call a random variable discrete if its range is countable, that is, finite or countably infinite (like the natural numbers, its values can be listed as $x_1,x_2, \ldots, x_m, \ldots$); otherwise it is called a continuous random variable. When $X$ is a random variable we can define other random variables using $X$, such as $Y=aX+b$, $Z=X^2$, etc.
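Since a random variable is just a function on the sample space, it can be written as an ordinary function in code. The sketch below does this for the coin toss example, with $a=3$ and $b=2$ picked arbitrarily for $Y=aX+b$. The second part, the sum of two fair dice, is an extra example of my own showing how a random variable on equally likely outcomes gets a probability for each of its values.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Coin toss: X sends each outcome to a number (1 for a head, 0 otherwise).
coin_space = ["H", "T"]

def X(outcome):
    return 1 if outcome == "H" else 0

# New random variables built from X (a = 3 and b = 2 are arbitrary choices).
def Y(outcome):
    return 3 * X(outcome) + 2

def Z(outcome):
    return X(outcome) ** 2

print([(o, X(o), Y(o), Z(o)) for o in coin_space])  # [('H', 1, 5, 1), ('T', 0, 2, 0)]

# Two fair dice: the sum is a discrete random variable with a finite range.
dice_space = list(product(range(1, 7), repeat=2))

def total(outcome):
    return outcome[0] + outcome[1]

counts = Counter(total(o) for o in dice_space)
pmf = {value: Fraction(n, len(dice_space)) for value, n in sorted(counts.items())}
print(pmf[7])             # 1/6
print(sum(pmf.values()))  # 1
```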

Expectation

In this post we won’t discuss continuous random variables; let’s leave them for a future post. Let’s talk about the expected value, or expectation, of a (discrete) random variable. Mathematically, the expectation of a random variable $X$ is defined as $E[X]:= \sum_{i=1}^{\infty}x_ip(x_i)$, where the $x_i$ are all the values that $X$ can take and $p(x_i)$ is the corresponding probability. It is simply the weighted mean of the values the random variable takes, with the probabilities as weights. It is also called the mean, or simply the average, and is denoted by $\mu$. For example, suppose we define a random variable $Y$ based on a betting game of tossing a fair coin, where you win $\$2$ when a head turns up and lose $\$1$ when a tail comes up. So $Y$ takes the value $+2$ when you win and $-1$ when you lose. Now the question is: should you take the bet? Questions of this kind are answered by computing $E[Y]$: if $E[Y]>0$ you should take the bet, otherwise not (remember that positive here means winning). Let’s calculate: $E[Y]= \frac{1}{2} (2)+\frac{1}{2}(-1)=\frac{1}{2}>0$, so you should take the bet. What this means is that on average you would win $\$0.5$ per toss.
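The expectation of a discrete random variable with finitely many values is a short sum, so the betting example is easy to reproduce. In this sketch a random variable is represented by a dictionary from values to probabilities; that representation and the name `expectation` are my own choices. The fair die at the end is an extra example not taken from the text.

```python
from fractions import Fraction

def expectation(pmf):
    """Weighted mean: sum of value times probability over all values."""
    return sum(value * prob for value, prob in pmf.items())

# Betting game: win 2 on a head, lose 1 on a tail, fair coin.
bet = {2: Fraction(1, 2), -1: Fraction(1, 2)}
E_Y = expectation(bet)
print(E_Y)      # 1/2
print(E_Y > 0)  # True, so the bet is favourable on average

# One roll of a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(expectation(die))  # 7/2
```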

We can also define the expectation of a random variable $Y$ that is itself defined using another random variable $X$, for example $Y=X^2$. In this case $E[Y]=E[X^2]=\sum_{i=1}^{\infty}x_i^2p(x_i)$. In general, for any function $g(X)$ we can define $E[g(X)]=\sum_{i=1}^{\infty}g(x_i)p(x_i)$. Using the definition of expectation one can easily show that $E[aX+b]=aE[X]+b$, where $a,b \in \mathbb{R}$ are constants and $X$ is a random variable (hint: use the fact that $\sum p(x_i)=1$).
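The formula for $E[g(X)]$ and the rule $E[aX+b]=aE[X]+b$ can be checked on a concrete case. The sketch below uses one roll of a fair die for $X$ and the arbitrary constants $a=3$, $b=-2$; both are assumptions for illustration. Agreement on one example does not replace the short proof suggested by the hint.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] as the sum of g(x_i) * p(x_i); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())

# X is one roll of a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

E_X = expectation(die)
E_X2 = expectation(die, lambda x: x ** 2)
print(E_X)   # 7/2
print(E_X2)  # 91/6

# Linearity with a = 3 and b = -2.
a, b = 3, -2
lhs = expectation(die, lambda x: a * x + b)
rhs = a * E_X + b
print(lhs, rhs)  # 17/2 17/2
```

Note that $E[X^2]=\frac{91}{6}$ is not the same as $(E[X])^2=\frac{49}{4}$, so $g$ cannot in general be pulled outside the expectation.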

Variance

There is another very useful concept associated with expectation, known as ‘variance’. It measures how far the values of a random variable are spread around its mean, or expectation. Mathematically, the variance of $X$ is defined as $Var(X):= E[(X-\mu)^2]$, where $\mu =E[X]$. Let’s calculate the variance of the random variable $Y$ from the preceding betting example. We know that $\mu=E[Y]=0.5$, therefore $Var(Y)=E[(Y-0.5)^2]=E[Y^2-Y+0.25]=E[Y^2]-E[Y]+0.25=2.5-0.5+0.25=2.25$, where $E[Y^2]=(2)^2(0.5)+(-1)^2(0.5)=2+0.5=2.5$. The square root of the variance of a random variable $X$ is known as its standard deviation, and is denoted by $\sigma = \sqrt{Var(X)}$.
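The variance of the betting-game random variable from the Expectation section can be computed directly from the definition. The sketch below also compares it with $E[Y^2]-\mu^2$, which is what the expansion of the square reduces to in general. The dictionary representation and the function names are my own choices.

```python
from fractions import Fraction
from math import sqrt

def expectation(pmf, g=lambda x: x):
    """E[g(X)] as the sum of g(x_i) * p(x_i); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2] with mu = E[X]."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: (x - mu) ** 2)

# Betting game: win 2 on a head, lose 1 on a tail, fair coin.
bet = {2: Fraction(1, 2), -1: Fraction(1, 2)}

mu = expectation(bet)
var = variance(bet)
shortcut = expectation(bet, lambda x: x ** 2) - mu ** 2
sigma = sqrt(var)

print(mu)        # 1/2
print(var)       # 9/4, that is 2.25
print(shortcut)  # 9/4
print(sigma)     # 1.5
```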

References

[2] is a book for beginners with plenty of examples, [1] is slightly more mathematical but a good introductory text in statistics, and [3] is a good introduction to probability for mathematically oriented learners.
