
I am trying to do some statistical analysis of different A/B tests to see which alternative is better and have found conflicting information about this.

First, I am interested in a couple of different things:

  • Tests that measure success by counting events, such as conversions or emails sent
  • Tests that measure success by counting revenue
  • Tests that have only two alternatives (control and new)
  • Tests that have multiple alternatives (control and multiple new)

I was hoping to find a simple set of formulae or rules for doing this analysis but have found more questions than answers.

This site says that you can't compare multi-alternative tests; you can only do pairwise comparisons, along with a chi-squared analysis to see whether the whole test is statistically significant or not.

This site suggests a way to do A/B/C/D testing (starting on slide 74), analysing the results using the G-test (which it says is related to chi-squared), but isn't clear on the details of using a fudge factor. It also suggests that you can only use the A/B/C/D approach to eliminate alternatives until you end up with a clear winner in an A/B comparison.

This site gives an example of an A/B/C/D test (including a control) and shows how to compare the conversion rates to determine a winner. Unlike the previous approach, it does not recommend eliminating alternatives but rather picks a winner right off the bat (assuming statistically significant results).

Perhaps I'm naive, but I would think that by now a stats-analysis library would exist to deal with this very problem. I would also appreciate more information about what algorithms/equations are needed to solve these problems. It's been a long time since my university stats class.


1 Answer


For the event-counting comparison, you could approach this using Beta distributions. Each alternative has some unobserved p, the probability of producing an event. If you observe X positive events out of N trials, then your uncertainty about p can be modeled by Beta(X + 1, N - X + 1).
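Here's a minimal sketch of that idea in Python, assuming SciPy is available; the counts are made up purely for illustration:

    from scipy import stats

    # Hypothetical data for one alternative:
    # X positive events (e.g. conversions) observed out of N trials (e.g. visitors).
    X, N = 42, 1000

    # With a uniform Beta(1, 1) prior, the uncertainty about the unknown
    # event probability p is described by Beta(X + 1, N - X + 1).
    p_dist = stats.beta(X + 1, N - X + 1)

    print("mean of p:", p_dist.mean())
    print("95% interval for p:", p_dist.interval(0.95))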

You can compare two alternatives by looking at P(pA > pB), where pA and pB are the two Beta distributions. Methods for computing that inequality probability can be found in this paper.
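If that paper isn't handy, a simple Monte Carlo approximation also works: draw from both Beta distributions and count how often A beats B. A sketch, again with made-up counts:

    import numpy as np

    # Hypothetical counts: (events, trials) for alternatives A and B.
    xA, nA = 42, 1000
    xB, nB = 58, 1000

    rng = np.random.default_rng(0)
    pA = rng.beta(xA + 1, nA - xA + 1, size=100_000)
    pB = rng.beta(xB + 1, nB - xB + 1, size=100_000)

    # The fraction of draws where A's rate exceeds B's approximates P(pA > pB).
    print("P(pA > pB) ≈", (pA > pB).mean())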

You can also compute E[pA-pB], the effect size, or compute confidence bounds of the same.
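The same posterior draws give the effect size and an interval on it; continuing the sketch above (still hypothetical numbers):

    import numpy as np

    xA, nA = 42, 1000   # hypothetical counts for A
    xB, nB = 58, 1000   # hypothetical counts for B

    rng = np.random.default_rng(0)
    diff = rng.beta(xA + 1, nA - xA + 1, 100_000) - rng.beta(xB + 1, nB - xB + 1, 100_000)

    print("E[pA - pB] ≈", diff.mean())                      # expected effect size
    print("95% bounds:", np.percentile(diff, [2.5, 97.5]))  # interval on the difference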

  • Also, you might watch this blog (the next post is supposed to be on this subject): http://sirevanhaas.com/?p=30 – Dec 24 '09 at 00:40
  • And you might read chapter 37 of this book: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html (individual chapters are available here: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/) – Dec 24 '09 at 00:48