
Introduction to Power Analysis in Python

Learn the importance of concepts such as significance level, effect size, statistical power and sample size

Eryk Lewinson
7 min read · Nov 24, 2018


Nowadays, many companies, from Netflix, Amazon, and Uber to much smaller ones, constantly run experiments (A/B tests) to try out new features and implement the ones that users like best and that, in the end, lead to revenue growth. The data scientist's role is to help evaluate these experiments; in other words, to verify that the results of these tests are reliable and can/should be used in the decision-making process.

In this article, I provide an introduction to power analysis. In short, power describes how much confidence we can have in the conclusions drawn from the results of an experiment. It can also be used to estimate the sample size required for an experiment, i.e., the sample size at which, with a given level of confidence, we should be able to detect an effect. The effect can be many things: for instance, more frequent conversion within a group, or a higher average spend of customers going through a certain signup flow in an online shop.
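As a quick sketch of the sample-size use case described above, one can solve for the number of observations per group with `statsmodels`. The effect size (Cohen's d = 0.5), significance level, and power values below are illustrative assumptions, not figures from the article:

```python
# A priori power analysis: solve for the sample size per group needed
# to detect a medium effect (Cohen's d = 0.5) with 80% power at a 5%
# significance level, using an independent two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required sample size per group: {n_per_group:.1f}")
```

Leaving exactly one of the four arguments (`effect_size`, `nobs1`, `alpha`, `power`) unspecified tells `solve_power` which quantity to solve for.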

Firstly, I introduce a bit of theory and then carry out an example of power analysis in Python. You can find the link to my repo at the end of the article.

Introduction

To understand power analysis, I believe it is important to first understand three related concepts: the significance level, Type I/II errors, and the effect size.

In hypothesis testing, the significance level (often denoted by the Greek letter alpha) is the probability of rejecting the null hypothesis (H0) when it is in fact true. A metric closely related to the significance level is the p-value, which is the probability of obtaining a result at least as extreme as the one observed (a result even further from the null hypothesis), given that H0 is true. What does that mean in practice? When drawing a random sample from a population, it is always possible that the observed effect occurred purely due to sampling error.
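To make the p-value concrete, here is a minimal sketch using `scipy` (the two groups are made-up numbers for illustration). A two-sample t-test returns the test statistic and the probability of seeing a statistic at least that extreme if the group means were actually equal:

```python
# Minimal illustration of a p-value: a two-sample t-test on two
# made-up samples. Under H0 (equal population means), the p-value is
# the probability of a test statistic at least this extreme.
from scipy import stats

group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```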

The result of an experiment (or, for example, a linear regression coefficient) is statistically significant when the associated p-value is smaller than the chosen alpha…
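One way to see that alpha really is the rate of falsely rejecting a true H0 is a quick simulation (the sample sizes, distribution, and number of trials below are arbitrary choices for illustration): draw many pairs of samples from the same population, so that H0 is true by construction, and check how often the test rejects it anyway.

```python
# Simulation: when H0 is true (both groups come from the same
# distribution), a test at alpha = 0.05 should reject in roughly
# 5% of experiments. Those rejections are Type I errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_trials = 2000

false_positives = 0
for _ in range(n_trials):
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)  # same distribution as a
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_trials:.3f}")
```

The observed rejection rate should land close to 0.05, matching the chosen significance level.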
