1

I have a pandas dataframe full of data

import pandas as pd
import numpy as np

varNames = ["point1","point2","point3","point4","point5"]
df = pd.DataFrame(np.random.randn(5,2),index=varNames,columns=["data1","data2"])

and I would like to create a series with a multiIndex created from this. The index I can do:

iterables=[["point1","point2","point3"],["point4","point5"]]
index=pd.MultiIndex.from_product(iterables, names=['numerator', 'denominator'])

I don't know how to fill the series though. I'm after something like

s = pd.Series(max(df.loc[index["numerator"]]/df.loc[index["denominator"]]),index=index)

I want to take each row in the first dataframe that's listed as a numerator, and divide it by each row in the first dataframe that's listed a denominator, find the maximum value from the resulting line of values and store it in the relevant place (s[variableN,variableM]) in the series.

This is my first time using this multi-index thing, and short of going through the series line by line, working out the value and storing it, similar (I think, I don't think I've been able to fully understand this yet) this, I can't figure out how to do this.

Ben
  • 401
  • 2
  • 6
  • 15

1 Answers1

0

You can use reindex with parameter level with max:

df3 = df.reindex(index, level=0).div(df.reindex(index, level=1)).max(level=0)

Sample:

np.random.seed(456)
varNames = ["point1","point2","point3","point4","point5"]
df = pd.DataFrame(np.random.randn(5,2),index=varNames,columns=["data1","data2"])
print (df)
           data1     data2
point1 -0.668129 -0.498210
point2  0.618576  0.568692
point3  1.350509  1.629589
point4  0.301966  0.449483
point5 -0.345811 -0.315231

iterables=[["point1","point2","point3"],["point4","point5"]]
index=pd.MultiIndex.from_product(iterables, names=['numerator', 'denominator'])

df1 = df.reindex(index, level=0)
print (df1)
                          data1     data2
numerator denominator                    
point1    point4      -0.668129 -0.498210
          point5      -0.668129 -0.498210
point2    point4       0.618576  0.568692
          point5       0.618576  0.568692
point3    point4       1.350509  1.629589
          point5       1.350509  1.629589

df2 = df.reindex(index, level=1)
print (df2)
                          data1     data2
numerator denominator                    
point1    point4       0.301966  0.449483
          point5      -0.345811 -0.315231
point2    point4       0.301966  0.449483
          point5      -0.345811 -0.315231
point3    point4       0.301966  0.449483
          point5      -0.345811 -0.315231

print (df1.div(df2))
                          data1     data2
numerator denominator                    
point1    point4      -2.212594 -1.108405
          point5       1.932062  1.580459
point2    point4       2.048493  1.265214
          point5      -1.788768 -1.804050
point3    point4       4.472386  3.625472
          point5      -3.905339 -5.169509

df3 = df.reindex(index, level=0).div(df.reindex(index, level=1)).max(level=0)
print (df3)
              data1     data2
numerator                    
point1     1.932062  1.580459
point2     2.048493  1.265214
point3     4.472386  3.625472


df3 = (df.reindex(index, level=0).div(df.reindex(index, level=1))
        .max(level=0)
        .reindex(index, level=0))
print (df3)
                          data1     data2
numerator denominator                    
point1    point4       1.932062  1.580459
          point5       1.932062  1.580459
point2    point4       2.048493  1.265214
          point5       2.048493  1.265214
point3    point4       4.472386  3.625472
          point5       4.472386  3.625472
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • That ... almost works. That returns a dataframe with indexed only by the numerators. I'm after one that is multiIndexed by a numerator/denominator combo. – Ben Oct 13 '17 at 05:16
  • That works perfectly, thanks. Now all I need to do is figure out what's going on :) – Ben Oct 13 '17 at 06:07