Simpson's Paradox

The concept

When comparing trends in aggregated data, it’s easy to reach the wrong conclusion if you don’t understand the underlying data in full context. When a segment makes up a bigger piece of the pie in one group than in another, that segment can over-influence the aggregate results. If you are unaware of this over-influence, you could make the wrong decision for your business.

The example

Let’s say you are deciding between two ads for an upcoming nationwide promotion. To aid your decision, you run each ad in a test city and monitor performance for 5 days.

After the 5 days are over, you put together a basic data table by day and by city, to compare conversion rates:

Day           City A visits   City A conv.   City B visits   City B conv.   A vs B
1                  50            10.00%           60             9.50%      +5.3%
2                  75            10.00%           90             9.50%      +5.3%
3 (holiday)       200             5.00%          150             4.75%      +5.3%
4                  75            10.00%           90             9.50%      +5.3%
5                  75            10.00%           90             9.50%      +5.3%
----------------------------------------------------------------------------------
Totals            475             7.89%          480             8.02%      -1.5%

You notice that when you compare the totals, city A’s conversion rate is 1.5% lower than city B’s, yet when you review performance by day, city A wins by 5.3% every single day. So which city had the better results, and which ad should you choose?
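You can see the reversal concretely with a quick sketch (shown here in Python, with the visit counts and rates copied from the table above) that recomputes the per-day lift and the aggregate rates:

```python
# Visit counts and conversion rates per day, from the table above.
a_visits = [50, 75, 200, 75, 75]
a_rates = [0.10, 0.10, 0.05, 0.10, 0.10]
b_visits = [60, 90, 150, 90, 90]
b_rates = [0.095, 0.095, 0.0475, 0.095, 0.095]

# Per day, city A beats city B by the same relative margin.
for day, (ra, rb) in enumerate(zip(a_rates, b_rates), start=1):
    print(f"Day {day}: A vs B = {ra / rb - 1:+.1%}")  # +5.3% every day

# Aggregate rate = total conversions / total visits.
a_total = sum(v * r for v, r in zip(a_visits, a_rates)) / sum(a_visits)
b_total = sum(v * r for v, r in zip(b_visits, b_rates)) / sum(b_visits)
print(f"Totals: A {a_total:.2%} vs B {b_total:.2%} "
      f"({a_total / b_total - 1:+.1%})")  # A trails B overall
```

City A converts better every day, yet its aggregate rate (7.89%) lands below city B’s (8.02%), which is the paradox in miniature.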

The math

If you look at the visits by day, you can see that on day 3 city A had 200 visits, which represents 42% of city A’s total visits over the 5 days. City B had 150 visits that day, which represents only 31% of its total. So even though both cities converted far below normal on day 3, and city A still converted 5.3% better than city B, that bad day dragged city A’s totals down much harder, because it was over-represented in city A’s traffic relative to city B’s.

If you had chosen the ad that ran in city B because of its better overall performance, you would have given up the roughly 5.3% additional orders you stood to gain by choosing city A’s ad.

Conclusion

Make sure to dig a bit deeper and review your data across a few different (and relevant) segments, to ensure there aren’t inherent biases in the data driven by differences in mix. It can make a substantial difference in your outcomes and prevent big surprises down the line.

Note - article was ported from a deprecated version of this blog