2

I need some help with replacing iterrows in iterating over a Pandas dataframe. I have a Pandas dataframe like this one:

| cust_no | channel  | month1 | month2 |
|   1     | radio    | 0.7    | 0.4    |
|   1     | fb       | 0.1    | 0.5    |
|   1     | tv       | 0.2    | 0.1    |
|   2     | fb       | 0.5    | 0.25   |
|   2     | radio    | 0.4    | 0.25   |
|   2     | tv       | 0.1    | 0.5    |

I need the channel that has the maximum value for each month when grouped by cust_no, and join them as a string in a new column in the same dataframe. So, for example, from above dataframe:

In Customer 1's case radio has max value in month1, but fb has the max value in month 2, so I need this string: radio>fb

In customer 2's case fb has max value in month1, but tv has the max value in month2, so I need this strong: fb>tv

Appreciate any help. Thanks. Performance is really important

Ahsan
  • 551
  • 2
  • 5
  • 18
  • 1
    [OBLIGATORY WARNING](https://stackoverflow.com/a/55557758/4909087), don't use `iterrows` to iterate over a dataframe. – cs95 Apr 12 '19 at 07:50

1 Answers1

3

Create index by channel by DataFrame.set_index, then use DataFrameGroupBy.idxmax and last use apply+join:

df1 = (df.set_index('channel')
         .groupby('cust_no')['month1','month2']
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv

If no another columns is posible remove filtering columns month1 and month2:

df1 = (df.set_index('channel')
         .groupby('cust_no')
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252