Berkson's Paradox

The word ‘paradox’ has become prominent in present-day life. A paradox is a statement that appears to be self-contradictory or senseless, yet may contain a hidden truth. Numerous paradoxes are discussed in the 21st century, and they push us to expand our understanding beyond what we feel comfortable with.

Of all the paradoxes in mathematics and statistics, we chose Berkson’s paradox because it resembles situations in our own lives. We make many decisions without being aware of the assumptions behind them, and Berkson’s paradox explains how some of those assumptions arise.

 

Berkson’s Paradox 

 

The paradox was described by Joseph Berkson in 1946. Berkson, a physician and statistician by profession, identified a selection bias that affects case-control studies used to determine causal risk factors for a disease. According to Berkson’s paradox, two traits can appear correlated in the sample that people happen to observe even though they are not correlated in the general population. In statistical terms, the correlation seen in a selected sample can differ from the true one: independent variables can appear correlated, and even the sign of a real correlation can appear reversed.

In his first paper on the subject, Berkson demonstrated a bias through which unrelated diseases can appear spuriously linked. His study concerned the association between diabetes and cholecystitis among hospital inpatients. Even if the two diseases are independent in the general population, data collected from inpatients can suggest a relation between them, which leads to misleading conclusions. According to Berkson, the real reason is that people with many diseases are more likely to be hospitalised than people with fewer diseases. Data collected only in hospital therefore over-represent people who suffer from several diseases, which makes diabetes and cholecystitis look associated. Berkson later used this analysis to dispute the causal relationship between smoking and lung cancer. Because that argument was heavily criticised by the medical community, the paradox lost momentum, even though it was correct for diabetes and cholecystitis.

However, the paradox regained momentum after 40 years, when a series of papers by David Sackett provided real-life examples in support of it. These papers encouraged further counter-intuitive studies in the same domain, and various studies focused on health and medicine reported associations between one disease and another and between diseases and risk factors. More recent studies, not limited to health, also give examples involving the characteristics of individuals. These studies show that such associations arise mainly because observations are not drawn equally from all cases, or because some cases are not observed at all. People therefore reach wrong conclusions because their observations are unbalanced: they do not observe all cases equally, and so they make misleading interpretations.

[Image; source: Dailymail]

Let us understand this with an example. Have you ever heard a woman complain that all the good-looking guys are jerks and all the nice guys are ugly? Chances are that she has fallen victim to a statistical fallacy called Berkson's paradox. When it comes to selecting a partner, we all have criteria that mean the most to us. Some people care more about looks, some about money, and some about personality. It is very rare for a person to decide on the basis of a single criterion. For this example, niceness and looks are assumed to be independent variables in men.

Picture a graph with niceness on one axis and handsomeness on the other. Every woman would like to date a man in the upper right corner of this graph, i.e. a man who is both attractive and decent (10/10). However, if a guy is sometimes a jerk, she may still date him provided he is really good-looking. Similarly, if a guy is extremely decent, she may still date him even if he falls short on looks. Therefore, the guys she is willing to date satisfy roughly:

Niceness + Handsomeness > some constant value

Because of this common trade-off in her dating strategy, many of the best-looking guys she dates are not all that nice. Similarly, many of the guys with the best personalities that she dates are not as handsome. By limiting herself to this set of guys, she sees a negative correlation between looks and personality, in spite of these two factors being independent in the population! This is Berkson's fallacy, and the induced correlation comes from selection bias. Her dating criterion makes her reject the men who are only average in both personality and looks, so they never enter her sample.
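The effect can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: niceness and handsomeness are drawn independently and uniformly on a 0 to 10 scale, the threshold of 12 is an arbitrary choice, and the names `pearson` and `simulate_dating` are hypothetical helpers that do not come from any of the cited sources.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_dating(n_men=10_000, threshold=12.0, seed=0):
    """Return (correlation in the population, correlation among dated men)."""
    rng = random.Random(seed)
    # Assumption: independent scores, uniform on a 0-10 scale.
    niceness = [rng.uniform(0, 10) for _ in range(n_men)]
    handsomeness = [rng.uniform(0, 10) for _ in range(n_men)]
    # Selection rule from the text: Niceness + Handsomeness > constant.
    dated = [(n, h) for n, h in zip(niceness, handsomeness) if n + h > threshold]
    dated_nice = [n for n, _ in dated]
    dated_handsome = [h for _, h in dated]
    return pearson(niceness, handsomeness), pearson(dated_nice, dated_handsome)


population_r, dated_r = simulate_dating()
print(f"correlation in the whole population: {population_r:+.3f}")
print(f"correlation among the men she dates: {dated_r:+.3f}")
```

Under these assumptions the first correlation should be close to zero, while the second should be clearly negative, even though nothing in the data generation links the two traits.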

It is important to understand that selection bias can make us see misleading relationships between factors. This raises questions about our experiences and data collection strategies: do they adequately sample the population, and do they allow sound conclusions to be drawn from our observations?

Mathematical Explanation

Let A and B be two independent events, so that P(A|B) = P(A), with 0 < P(A) < 1 and 0 < P(B) < 1. If we consider only the outcomes where at least one of them occurs, then

P(A|B, A U B) < P(A|A U B), where

i. Events A and B may or may not occur, but the case where neither A nor B occurs is ignored.

ii. Events A and B are independent of each other, and P(A|B) is the conditional probability of observing event A given that B is true.

iii. P(A|B, A U B) is the probability of observing event A given that both B and (A or B) occur.

iv. P(A|A U B) is the probability of A given that (A or B) occurs.

Berkson’s paradox states that two independent events become negatively dependent if only outcomes where at least one of them occurs are considered.

As we have excluded the case where neither of the events occurs, the conditional probability of A given that A or B occurs is higher than the unconditional probability of A, i.e.

P(A|A U B) > P(A)

Suppose a sample of 400 outcomes is taken in which A and B occur independently, with P(A) = P(B) = 200/400 = ½. Then at least one of A or B occurs in 300 outcomes, and A occurs in 200 of these.

 

     | A            | ~A
B    | A & B (100)  | ~A & B (100)
~B   | A & ~B (100) | ~A & ~B (100)

 

Conditional probability of A: P(A|A U B) = 200/300 = 2/3

Unconditional probability of A: P(A) = 200/400 = ½

Probability of A given both B and (A or B): P(A|B, A U B) = 100/200 = ½
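These three values can be checked by direct counting. The following is a minimal sketch that uses the 400-outcome table from the text; the `counts` dictionary and the `probability` helper are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Counts from the 2x2 table in the text; keys are (A occurs, B occurs).
counts = {
    (True, True): 100,
    (False, True): 100,
    (True, False): 100,
    (False, False): 100,
}


def probability(event, given=lambda a, b: True):
    """Exact P(event | given), obtained by counting outcomes in the table."""
    favourable = sum(n for (a, b), n in counts.items() if given(a, b) and event(a, b))
    possible = sum(n for (a, b), n in counts.items() if given(a, b))
    return Fraction(favourable, possible)


def a_occurs(a, b):
    return a


p_a = probability(a_occurs)
p_a_given_a_or_b = probability(a_occurs, given=lambda a, b: a or b)
p_a_given_b_and_a_or_b = probability(a_occurs, given=lambda a, b: b and (a or b))

print("P(A)            =", p_a)                     # 1/2
print("P(A | A U B)    =", p_a_given_a_or_b)        # 2/3
print("P(A | B, A U B) =", p_a_given_b_and_a_or_b)  # 1/2
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.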

 

     | A           | ~A
B    | A & B (25)  | ~A & B (25)
~B   | A & ~B (25) | ~A & ~B (25)

 

From the above example, we observe two things. First, the probability of A is greater in the subset of outcomes where (A or B) occurs than in the overall sample. Second, the conditional probability of A given that both B and (A or B) occur is equal to the unconditional probability of A in the overall sample. This is Berkson’s paradox: within the subset, knowing that B occurred lowers the conditional probability of A from 2/3 to ½, which explains the negative dependence between two independent events given that at least one of them occurs. Hence,

P(A|B, A U B) = P(A|B) = P(A) and

P(A|A U B) > P(A)
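Both statements follow from the definition of conditional probability together with the independence of A and B. A short derivation, written in LaTeX as a sketch and using only the symbols already defined (with \lnot A denoting "A does not occur"):

```latex
\begin{align*}
P(A \mid B,\, A \cup B) &= P(A \mid B) = P(A)
  && \text{since } B \cap (A \cup B) = B \text{ and } A, B \text{ are independent}, \\
P(A \mid A \cup B) &= \frac{P\bigl(A \cap (A \cup B)\bigr)}{P(A \cup B)}
                     = \frac{P(A)}{P(A \cup B)}
  && \text{since } A \cap (A \cup B) = A, \\
P(A \cup B) &= 1 - P(\lnot A)\,P(\lnot B) < 1
  && \text{since } P(A) < 1 \text{ and } P(B) < 1, \\
\Rightarrow \quad P(A \mid B,\, A \cup B) &= P(A) < \frac{P(A)}{P(A \cup B)} = P(A \mid A \cup B)
  && \text{since } P(A) > 0.
\end{align*}
```

With P(A) = P(B) = ½, this gives P(A U B) = ¾ and P(A|A U B) = 2/3, matching the worked example.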

A Solution to the Paradox

Now that Berkson’s paradox has been explained, questions arise: is there a correct way of thinking about the paradox? Is there an intuitive way of reasoning about it? The answer to both is yes. Berkson’s paradox is a misunderstanding that looks like a paradox because of flawed data gathering. The remedy for any occurrence of Berkson’s fallacy is to define the population appropriately, and then statistically examine a fair sample of that population to check for a relationship between A and B. If the earlier conclusion was that A and B are negatively correlated, but the new test shows that the two are unrelated, then somebody fell into Berkson’s fallacy and the issue is now settled. The first step towards reasoning correctly is simply knowing that the fallacy exists. In addition, whenever a negative correlation between two desirable traits is found, check whether the sample truly matches the population.

 

References

Woodfine, J., & Redelmeier, D. (2015, April 17). Berkson's paradox in medical care. Retrieved August 05, 2020, from https://onlinelibrary.wiley.com/doi/full/10.1111/joim.12363

Simon, C. (2014, October 05). Berkson's Paradox: Are handsome men really jerks? Retrieved August 05, 2020, from http://corysimon.github.io/articles/berksons-paradox-are-handsome-men-really-jerks/

Stephanie. (2020, June 08). Berkson's Paradox: Definition. Retrieved August 05, 2020, from https://www.statisticshowto.com/berksons-paradox-definition/

Berkson, J. (2014). Limitations of the Application of Fourfold Table Analysis to Hospital Data. International Journal of Epidemiology, 43(2), 511-515. doi:10.1093/ije/dyu022

Authors - Darshit Agarwal, Abhilasha Anand & Bhavini Saraf

