Learn, Think & Do

Enjoy Randomness !

The Conditional Probability (P-3)

What is conditional probability?

It is the probability of an event computed with the knowledge that some other event of the sample space has occurred. The conditioning event effectively becomes the new sample space, hence the name conditional probability. Let A and B be two events of a sample space S such that P(B)>0. The conditional probability of A given B is denoted P(A|B) and is defined as P(A|B) = \frac{P(A \cap B)}{P(B)}. We also write A \cap B as AB. This makes intuitive sense: we want the probability of A with B as the new sample space, so we take the part of A that lies inside B and rescale it by P(B).

For example, suppose a fair die is rolled twice. Let A be the event that the sum of the two rolls is even, and let B be the event that at least one of the two rolls shows an odd number. It is easy to calculate that P(A) = 0.5 and P(B) = 0.75. P(AB) is the probability that both events happen together. If the sum is even and at least one roll is odd, then both rolls must be odd, so P(AB) = 0.25. The probability of A given B is therefore P(A|B) = P(AB)/P(B) = 0.25/0.75 = 1/3. Similarly, the probability of B given A is P(B|A) = P(AB)/P(A) = 0.25/0.50 = 1/2. This also shows that P(A|B) and P(B|A) need not be equal.
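The two-dice numbers can be checked by listing all 36 equally likely outcomes and counting. The following is a minimal sketch; the helper names `prob` and `cond_prob` are made up for illustration and are not part of any library.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered outcomes of two rolls of a fair die.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) when all outcomes are equally likely."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda s: a(s) and b(s)) / p_b

def A(s):
    # The sum of the two rolls is even.
    return (s[0] + s[1]) % 2 == 0

def B(s):
    # At least one of the two rolls is odd.
    return s[0] % 2 == 1 or s[1] % 2 == 1

print(prob(A), prob(B))                  # 1/2 3/4
print(cond_prob(A, B), cond_prob(B, A))  # 1/3 1/2
```

Using `Fraction` keeps the arithmetic exact, so the results come out as 1/3 and 1/2 rather than rounded decimals.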

Further, suppose P(A), P(B) > 0. Then P(AB) = P(A|B)P(B) = P(B|A)P(A), and therefore P(A|B) = P(B|A)P(A)/P(B) and P(B|A) = P(A|B)P(B)/P(A). This is known as Bayes’ theorem (or Bayes’ formula), named after Thomas Bayes. Probability theory is one of those branches of mathematics in which even experts make blunders, because the subject is often counterintuitive. This is especially apparent in conditional probability problems, as the following examples show.

Some Examples

Let’s first consider tossing a fair coin twice. Suppose we want the conditional probability of getting heads on both tosses given that at least one head appears. Most people reason like this: we know we have one head for sure, so the other toss is either a head or a tail, and the probability is 1/2. This is not the right answer to this question. It is the answer to a related problem, namely the conditional probability of getting two heads given that the first toss is a head; for that problem the reasoning above is correct. Here our sample space S is \{(HH),(HT),(TH),(TT)\} (we also assume that each outcome is equally likely). Let A denote the event of getting heads on both tosses and B the event of getting at least one head. We want P(A|B), which is equal to \frac{P(AB)}{P(B)}. Since A is contained in B, P(AB) = P(A) = 1/4, and P(B) = 3/4 because at least one head appears in three of the four outcomes. So P(A|B) = 1/3, not 1/2.
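The difference between the two conditioning events can be seen by counting outcomes directly. This is only an illustrative sketch; the names `conditional`, `both_heads`, `at_least_one_head` and `first_is_head` are invented here.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def conditional(event, given):
    """P(event | given): restrict to outcomes where `given` holds, then count `event`."""
    kept = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

def both_heads(o):
    return o == ("H", "H")

def at_least_one_head(o):
    return "H" in o

def first_is_head(o):
    return o[0] == "H"

print(conditional(both_heads, at_least_one_head))  # 1/3
print(conditional(both_heads, first_is_head))      # 1/2
```

Conditioning on "at least one head" keeps three outcomes (HH, HT, TH), while conditioning on "first toss is a head" keeps only two (HH, HT), which is where the 1/3 versus 1/2 comes from.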

Now consider the notorious Monty Hall problem. Suppose you are on a game show where you have to choose one of three doors. One door hides the prize, a new car; another hides nothing; and, just to make it interesting, the third hides something like a dozen eggs (ordinary eggs, with no magical powers). The doors are numbered 1, 2 and 3. Of course you want to win the brand new car. The host asks you to choose a door, and you choose, say, door number 2. The host then opens door number 3, behind which there are eggs, and offers you the choice to switch doors. What should you do, switch or not switch? Yes, this is a probability problem. It is so counterintuitive that when it appeared in Parade magazine in 1990 in the column ‘Ask Marilyn’, with a solution by Marilyn vos Savant, about 10,000 readers, including 1,000 with PhDs, wrote to the magazine claiming that Marilyn was wrong. Her answer was that one should always switch.

Before coming to her reasoning, let’s see how most people approach the problem. You chose door number 2, which had a 1/3 chance of hiding the car. After door number 3 is shown not to hide the car, you argue that only two doors remain, so the probability is now 1/2, better than the previous 1/3, and switching makes no difference. This reasoning answers a different problem, one in which the host opens one of the other doors at random and it merely happens not to hide the car. In the standard version the host knows where the car is, always opens a door that you did not choose and that does not hide the car, and picks at random when he has two such doors. Under these assumptions the reveal does not change the chance that your own door hides the car: it is still 1/3. So if you have chosen door 2, you lose by switching only when door 2 hides the car, which has probability 1/3, and you win by switching with probability 2/3. The same holds whichever door you choose first: the strategy of switching wins with probability 2/3.

It is very common in probability problems to mistake one problem for another. For the complete and detailed solution see this blog.
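The switching strategy can be checked by exact enumeration. The sketch below rests on assumptions that the text leaves implicit and that are standard for this puzzle: the car is equally likely to be behind each door, the host knows where it is, he never opens your door or the car's door, and he picks at random when two doors are available to him. The function names are made up for illustration.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def scenarios(pick=2):
    """List (car, opened, probability) for a fixed first pick.

    Assumed host behaviour: never opens the picked door or the car's door,
    and chooses uniformly at random when two doors are allowed.
    """
    result = []
    for car in DOORS:  # the car is behind each door with probability 1/3
        options = [d for d in DOORS if d != pick and d != car]
        for opened in options:
            result.append((car, opened, Fraction(1, 3) / len(options)))
    return result

def win_probabilities(pick=2):
    """Return (P(win if you stay), P(win if you switch))."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car, opened, p in scenarios(pick):
        other = next(d for d in DOORS if d not in (pick, opened))
        if car == pick:
            stay += p
        if car == other:
            switch += p
    return stay, switch

def car_behind_pick_given_opened(pick=2, opened_door=3):
    """P(car is behind the picked door | host opened `opened_door`)."""
    matching = [(car, p) for car, opened, p in scenarios(pick) if opened == opened_door]
    total = sum(p for _, p in matching)
    return sum(p for car, p in matching if car == pick) / total

print(win_probabilities(2))                # (Fraction(1, 3), Fraction(2, 3))
print(car_behind_pick_given_opened(2, 3))  # 1/3
```

Under these assumptions staying wins with probability 1/3 and switching with probability 2/3, for any first pick. If the host behaved differently (for example, opened a door without knowing where the car is), the numbers would change, so the assumptions matter.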

Law of Total Probability

Consider the sample space S and let A_1,A_2,...,A_m be events in S that form a partition of S, meaning that \cup A_i=S and A_i\cap A_j=\emptyset for every i\neq j. Assume also that P(A_i)>0 for each i, so that the conditional probabilities below are defined. Let A be another event in S. Then we can write P(A)=P(A\cap A_1)+P(A\cap A_2)+...+P(A\cap A_m)=P(A|A_1)P(A_1)+...+P(A|A_m)P(A_m). This is called the law of total probability. It is stated here for a finite partition, but the same holds for a countable partition of the space. Using this, Bayes’ formula can be written as P(A|B) = \frac{P(B|A)P(A)}{P(B|A^c)P(A^c)+P(B|A)P(A)}, where A^c is the complement of the event A.
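For completeness, the step from the law of total probability to this form of Bayes’ formula can be spelled out. The derivation applies the law to the two-set partition \{A, A^c\} and assumes 0 < P(A) < 1 and P(B) > 0:

```latex
% Law of total probability for the partition {A, A^c},
% followed by the definition of conditional probability.
\begin{align*}
P(B) &= P(B \cap A) + P(B \cap A^c) \\
     &= P(B|A)P(A) + P(B|A^c)P(A^c), \\
P(A|B) &= \frac{P(A \cap B)}{P(B)}
        = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)}.
\end{align*}
```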

Let’s look at a problem on disease testing using conditional probability. (All the numbers here are assumptions made for the sake of explanation; they are not based on real testing.) Suppose a certain test for the corona virus is known to be 95% accurate. That is, the test correctly identifies an infected person in 95% of cases, and it correctly clears a healthy person in 95% of cases. It gives a false positive (a person who tests positive but does not have the virus) or a false negative (a person who has the virus yet tests negative) in only 5% of cases. Assume the population of the town is 2 million and that 20,000 people have the virus. We are not concerned here with how the virus spreads, who is at high or low risk, and so on. Now suppose a person tests positive. What is the probability that the person actually has the virus?

Let H denote the event that the person is healthy and C the event that the person has the corona virus, and let T>0 and T<0 stand for a positive and a negative test result respectively. We want the probability that a person has the corona virus given that the person tested positive, that is, P(C | T>0). We have P(H) = \frac{1980000}{2000000}=0.99 and P(C)=\frac{20000}{2000000}=0.01. We are also given P(T>0 | C)=0.95 and P(T>0 | H)=0.05, and similarly P(T<0 | C)=0.05 and P(T<0 | H)=0.95. By Bayes’ formula, P(C | T>0)=\frac{P(T>0|C)P(C)}{P(T>0)}. By the law of total probability, P(T>0) = P(T>0|C)P(C)+P(T>0|H)P(H)=0.95 \times 0.01+0.05 \times 0.99=0.059. Therefore P(C|T>0)=\frac{0.95 \times 0.01}{0.059}\approx 0.161, so a person who tests positive, with no other information taken into account, has only about a 16% chance of actually having the corona virus. For persons at high risk, the calculation should be based on the population of their own group. For example, suppose there are about 50,000 health workers and 5,000 of them have the virus. Then P(C)=0.1 and P(C|T>0)=\frac{0.95 \times 0.1}{0.95 \times 0.1+0.05 \times 0.9}=\frac{0.095}{0.14}, which is about 68%.
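The same calculation is easy to repeat for other groups with a small function. This is a sketch using the assumed numbers from the example; the function name `posterior_positive` and its parameter names are invented for illustration.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """P(C | T>0) by Bayes' formula with the law of total probability.

    prevalence          = P(C), fraction of the group that has the virus
    sensitivity         = P(T>0 | C)
    false_positive_rate = P(T>0 | H)
    """
    # Law of total probability: P(T>0) = P(T>0|C)P(C) + P(T>0|H)P(H)
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Whole town: 20,000 infected out of 2,000,000.
town = posterior_positive(20_000 / 2_000_000, 0.95, 0.05)

# Health workers: 5,000 infected out of 50,000.
health_workers = posterior_positive(5_000 / 50_000, 0.95, 0.05)

print(f"{town:.3f} {health_workers:.3f}")  # 0.161 0.679
```

The test itself is identical in both cases; only the prior P(C) changes, and that alone moves the answer from about 16% to about 68%.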

References

Leonard Mlodinow, The Drunkard’s Walk, Penguin (It explains many probability problems and concepts in an easy and fun manner.)

Sheldon Ross, A First Course in Probability, Ninth edition, Pearson (An introductory text in probability with lots of examples, good for beginners.)
