1

I have a dataframe that looks like this : image of dataframe

I want not repetition in of rows at leid, cp_id, run_seq, prod, curr level and I want to aggregate by sum tran_amnt. I did something like this, but it throws an error.

data_gb = data.groupby('le_id', 'cp_id', 'run_seq', 'products', 'currency').sum()

but it throws this error:

No axis named cp_id for object type

I think this is not working because there are strings present in the dataframe.

Is there any other, better way I can achieve what I want?

Edit:

Here is the code that produces the above dataframe:

import pandas  as pd
data = {'le_id' : [101]*4 + [102]*4 + [103]*3 + [104]*5 + [101],
        'run_seq' : [31]*11 + [32]*6,
        'cp_id' : [201, 201, 201, 201, 203, 204, 205, 205, 206, 208, 
                  209, 202, 201, 204, 205, 208, 208],
        'cp_name' : ['A', 'A', 'A', 'A', 'B', 'C', 'E', 'E', 'F', 
                   'G', 'H', 'B', 'A', 'D', 'E', 'H', 'H'],

        'products' : ['U', 'U', 'U', 'W', 'X', 'U', 'U', 'V', 'W', 
                     'X','U', 'U', 'V', 'W', 'X', 'Z', 'U'],

        'tran_amnt' : [10203, 13789, 74378, 47833, 40237, 93732, 
                      63738, 42563, 92822, 11276, 63633, 99292, 27892, 
                      82727, 32442, 55622, 43535],


        'currency' : ['USD', 'YEN', 'USD', 'SGD', 'USD', 'INR', 'INR', 
                'SGD', 'USD', 'INR', 'SGD', 'SGD', 'SGD', 'SGD', 
                'INR', 'INR', 'INR']}

         data = pd.DataFrame(data)
charlesreid1
  • 4,360
  • 4
  • 30
  • 52
Akash Dubey
  • 304
  • 1
  • 11
  • @anky_91 Its not a dupe. If you think its a dup. Please provide a link to the dupe question. – Akash Dubey Dec 28 '19 at 07:35
  • check the link in the dupe in the question : https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum – anky Dec 28 '19 at 07:36

1 Answers1

2

Try this:

data_gb = data.groupby(['le_id', 'cp_id', 'run_seq', 'products', 'currency']).sum()
oppressionslayer
  • 6,942
  • 2
  • 7
  • 24