Given a DataFrame A, I want to sum the columns that belong to the same category and store each result in a new column, producing A_modified.

A=
  location     exp1    exp2    data1    data2
0 FL           100     20      30       10
1 NC           40      30      50       60

A_modified
  location     exp1    exp2    data1    data2  total_exp    total_data
0 FL           100     20      30       10     120          40
1 NC           40      30      50       60     70           110

I want to do this for multiple DataFrames that all have the same columns. What is the best practice? Here is what I did, but I suspect that a dictionary would scale better to more columns.

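import pandas as pd

def f(df):
    # Row-wise sum of every column whose name contains the category substring.
    # sum(axis=1) already returns a Series aligned to df.index.
    df['total_exp'] = df.filter(like='exp').sum(axis=1)
    df['total_data'] = df.filter(like='data').sum(axis=1)
    return df

A = f(A)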
  • Approximately how many columns do you have? If there are only a few, it might be easier and more efficient to do it by hand, as you are doing now. Otherwise, you can get all the column names, strip the number from the end (`[name[:-1] for name in df.columns]`), and then use [`sets` to give you unique names](http://stackoverflow.com/questions/12897374/get-unique-values-from-a-list-in-python). You can then loop over the unique names within your function `f()` (after slight modifications). – Kartik Nov 24 '15 at 19:51
  • Thanks @Kartik. There are around 100 columns, but their names are not necessarily in the name_number format, so I may have `exp1` in one column and `other exp` in another. That's why I thought defining dictionaries might be useful. – Ana Nov 24 '15 at 20:26
  • I don't see how dictionaries can help, but that may be because I am missing something. If you can share your data, it would be great. I think that `df.filter()` should do what you want. You will need a list of unique names though (`['exp', 'data' ...]`) – Kartik Nov 24 '15 at 20:43
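
One way the two ideas from the comments might be combined is sketched below: a dictionary supplies the list of category names, and `df.filter(like=...)` picks the columns for each one. This is only a sketch under stated assumptions. The `CATEGORIES` mapping and the `add_totals` helper are hypothetical names introduced here, and the sketch assumes every column in a category contains the category substring somewhere in its name (so `exp1` and `other exp` would both match `'exp'`).

```python
import pandas as pd

# Hypothetical mapping: new column name -> substring that identifies a category.
CATEGORIES = {"total_exp": "exp", "total_data": "data"}


def add_totals(df, categories=CATEGORIES):
    """Return a copy of df with one row-wise total column per category."""
    out = df.copy()
    for total_name, pattern in categories.items():
        # Select from the original df, so a total added earlier in the loop
        # (e.g. 'total_exp', which itself contains 'exp') is never re-summed.
        out[total_name] = df.filter(like=pattern).sum(axis=1)
    return out


# Example data taken from the question.
A = pd.DataFrame({
    "location": ["FL", "NC"],
    "exp1": [100, 40],
    "exp2": [20, 30],
    "data1": [30, 50],
    "data2": [10, 60],
})
A_modified = add_totals(A)
```

Because the helper returns a copy, it can be applied to several DataFrames with the same columns in one pass, for example `frames = [add_totals(df) for df in frames]`. If the substring match turns out to be too loose for the real column names, the dictionary values could instead hold explicit lists of column names, summed with `df[cols].sum(axis=1)`.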