2

I'm fairly new to pandas and I am struggling to get the appropriate mean in a multi-index series. The multi-index series currently looks like this;

idx = pd.MultiIndex.from_tuples([('foo', 111), ('foo', 222),
                                 ('bar', 111), ('bar', 222), ('bar', 333),
                                 ('baz', 111),
                                 ('qux', 111), ('qux', 222)],
                                names=['ID', 'Account Number'])

df = pd.DataFrame(index=idx, data={'Service 1': 18, 'Service 2': 22, 'Total cost': 40})
df = pd.concat([df], keys=['Cost'], axis=1)

                        Cost                     
                   Service 1 Service 2 Total cost
ID  Account Number                               
foo 111                   18        22         40
    222                   18        22         40
bar 111                   18        22         40
    222                   18        22         40
    333                   18        22         40
baz 111                   18        22         40
qux 111                   18        22         40
    222                   18        22         40

The tables that all the data is being pulled in from applies cost to service 1 & 2 at Account number level but what it really needs to do is apply the cost at ID level and split the cost across account numbers, so what it should look like is;

                        Cost                      
                   Service 1  Service 2 Total cost
ID  Account Number                                
foo 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
bar 111                  6.0   7.333333  13.333333
    222                  6.0   7.333333  13.333333
    333                  6.0   7.333333  13.333333
baz 111                 18.0  22.000000  40.000000
qux 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000

I've tried df.groupby(['ID']).transform('mean') but this gives me the original figures obviously and I'm not sure how to get where I need to.

Feel like I've been round the twist with this, so any help would be appreciated.

ALollz
  • 57,915
  • 7
  • 66
  • 89
Alex Simcock
  • 53
  • 1
  • 4
  • 1
    Can you do , df.to_dict() ? – BENY Jan 13 '20 at 15:57
  • This is a little confusing because we don't see the original dataframe, only the wrong and expected answer. – anishtain4 Jan 13 '20 at 15:59
  • It looks like you have multi-indexes columns _and_ indexes. So ideally, you should post some code for us to replicate your data frame as is - just the printed out dataframe won't cut it in this situation – rafaelc Jan 13 '20 at 16:03

1 Answers1

2

Thanks @ALollz for the edit. Its always helpful to have the full Dataframe constructor code incase there is a multi-index

You can do a groupby on the first level and transform count , then divide:

df.div(df.groupby(level=0).transform('count'))

                        Cost                      
                   Service 1  Service 2 Total cost
ID  Account Number                                
foo 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
bar 111                  6.0   7.333333  13.333333
    222                  6.0   7.333333  13.333333
    333                  6.0   7.333333  13.333333
baz 111                 18.0  22.000000  40.000000
qux 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
anky
  • 74,114
  • 11
  • 41
  • 70