1

How do I multiply a column in a MultiIndex dataframe with several scalars (from another dataframe)?

With normal dataframes it's fairly straightforward, but I get confused when it gets to MultiIndex dataframes. Any recommendations?

Dummy data:

import pandas as pd
import numpy as np

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product([mklbl('C', 4), mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples([('Baseline', 'Electricity'), ('Baseline', 'Gas'), ('Consumption', 'Electricity'), ('Consumption', 'Gas')])
df = pd.DataFrame(np.arange(len(miindex) * len(micolumns))
                        .reshape((len(miindex), len(micolumns))),
                         index=miindex,
                  columns= micolumns)

cost = pd.DataFrame([['Electricity', 0.12],['Gas', 0.03]], columns=['fuel__name', 'cost_per_unit'])

Expected output is the dummy data multiplied by the corresponding cost_per_unit for Electricity and Gas, (i.e. cost of energy use)

Chris
  • 1,287
  • 12
  • 31

1 Answers1

1

Map the 1st level of the columns to the values and multiply:

df_res = df*df.columns.get_level_values(1).map(cost.set_index('fuel__name').cost_per_unit)
print(df_res)

         Baseline       Consumption      
      Electricity   Gas Electricity   Gas
C0 D0        0.00  0.03        0.24  0.09
   D1        0.48  0.15        0.72  0.21
C1 D0        0.96  0.27        1.20  0.33
   D1        1.44  0.39        1.68  0.45
C2 D0        1.92  0.51        2.16  0.57
   D1        2.40  0.63        2.64  0.69
C3 D0        2.88  0.75        3.12  0.81
   D1        3.36  0.87        3.60  0.93

Had you just needed to multiply specific subsets, then you could slice the MultiIndex in various methods, and perform actions on only the subsets.

import pandas as pd
df.loc[:, df.columns.get_level_values(1) == 'Electricity']

         Baseline Consumption
      Electricity Electricity
C0 D0           0           2
   D1           4           6
C1 D0           8          10
...

Perhaps more succinct and readable with the use of pd.IndexSlice

idx = pd.IndexSlice
df.loc[:, idx[:, 'Electricity']]

         Baseline Consumption
      Electricity Electricity
C0 D0           0           2
   D1           4           6
C1 D0           8          10
...
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • This is great! Thank you so much. Pandas is so flexible, but at the same time quite tricky with its numerous methods. – Chris Feb 06 '19 at 10:48