I have the following code:

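import numpy as np
import pandas as pd

# Joint distribution over (storm, lightning, thunder); the p column sums to 1
obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})

# Less detailed: total p for each (lightning, thunder) pair
g1 = obs.groupby(['lightning', 'thunder']).agg({'p': 'sum'})
# More detailed: total p for each (lightning, thunder, storm) triple
g2 = obs.groupby(['lightning', 'thunder', 'storm']).agg({'p': 'sum'})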

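which gives two DataFrames: g1, indexed by (lightning, thunder), and g2, indexed by (lightning, thunder, storm), each with a single summed p column.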

How can I divide the more detailed groupby result (g2) by the less detailed one (g1) to get each row's share of its group, i.e. a percentage?

I have read "Pandas percentage of total with groupby" but could not work out how to adapt it to my case.

Dims
  • What is p, and what would the percentage be? Conceptually this doesn't seem to make sense. How can you divide one DataFrame by another if their indexes are not the same? Perhaps more information is needed to answer the question. – Woody Pride Jun 28 '16 at 19:41
  • Why would the index not be the same? These are conditional probabilities. The probability of no lightning and no thunder is 63%. Given that, the probability of no storm is 27/63 and the probability of a storm is 36/63. – Dims Jun 28 '16 at 19:45
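
The arithmetic in the last comment can be checked directly against the data. The following is a minimal sketch that rebuilds obs from the question; the names quiet, p_quiet and p_storm_given_quiet are illustrative and do not appear in the original post.

```python
import pandas as pd

# Same data as in the question
obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})

# Rows with no lightning and no thunder
quiet = obs[(obs['lightning'] == 0) & (obs['thunder'] == 0)]

# P(lightning=0, thunder=0) = 0.36 + 0.27
p_quiet = quiet['p'].sum()

# P(storm | lightning=0, thunder=0), indexed by the storm value
p_storm_given_quiet = quiet.set_index('storm')['p'] / p_quiet

print(p_quiet)
print(p_storm_given_quiet)
```

The two conditional probabilities come out as 27/63 for no storm and 36/63 for a storm, and they sum to 1, as conditional probabilities should.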

1 Answer

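Use g2.unstack() to move the last index level (storm) into the columns, so that g2 has the same (lightning, thunder) index as g1. Then divide by g1.p with axis=0, which aligns on that index and broadcasts across the storm columns. Finally, stack() moves storm back into the index.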

g2.unstack().div(g1.p, axis=0).stack()
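
For reference, here is a self-contained sketch of that one-liner run on the question's data, followed by a sanity check. The variable names cond and alt are illustrative. The groupby/transform line at the end is not part of this answer; it is a different technique that should give the same numbers, shown only for comparison.

```python
import pandas as pd

# Same data as in the question
obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})

g1 = obs.groupby(['lightning', 'thunder']).agg({'p': 'sum'})
g2 = obs.groupby(['lightning', 'thunder', 'storm']).agg({'p': 'sum'})

# unstack: storm moves to the columns, leaving a (lightning, thunder) index
# div(axis=0): align g1.p on that index and divide every storm column by it
# stack: storm moves back into the index
cond = g2.unstack().div(g1['p'], axis=0).stack()
print(cond)

# Within each (lightning, thunder) group the conditional probabilities sum to 1
print(cond.groupby(level=['lightning', 'thunder'])['p'].sum())

# Alternative (not from the answer): divide each row by its group total directly
alt = obs['p'] / obs.groupby(['lightning', 'thunder'])['p'].transform('sum')
```

Here cond keeps the (lightning, thunder, storm) index of g2, while alt keeps the original row order of obs, so which form is more convenient depends on what is done with the result next.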

piRSquared