Pandas iterrows very slow

Question

I need some help with replacing iterrows in iterating over a Pandas dataframe. I have a Pandas dataframe like this one:

| cust_no | channel  | month1 | month2 |
|   1     | radio    | 0.7    | 0.4    |
|   1     | fb       | 0.1    | 0.5    |
|   1     | tv       | 0.2    | 0.1    |
|   2     | fb       | 0.5    | 0.25   |
|   2     | radio    | 0.4    | 0.25   |
|   2     | tv       | 0.1    | 0.5    |

I need the channel that has the maximum value for each month when grouped by cust_no, and join them as a string in a new column in the same dataframe. So, for example, from above dataframe:

In Customer 1's case radio has max value in month1, but fb has the max value in month 2, so I need this string: radio>fb

In customer 2's case fb has max value in month1, but tv has the max value in month2, so I need this strong: fb>tv

Appreciate any help. Thanks. Performance is really important

[OBLIGATORY WARNING](https://stackoverflow.com/a/55557758/4909087), don't use `iterrows` to iterate over a dataframe. — cs95, Apr 12 '19 at 07:50

jezrael · Accepted Answer · 2019-04-12T07:54:32.033

Create index by channel by DataFrame.set_index, then use DataFrameGroupBy.idxmax and last use apply+join:

df1 = (df.set_index('channel')
         .groupby('cust_no')['month1','month2']
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv

If no another columns is posible remove filtering columns month1 and month2:

df1 = (df.set_index('channel')
         .groupby('cust_no')
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv

Thanks! this is great... it didn't occur to me to use channel as index. — Ahsan, Apr 12 '19 at 09:00

Pandas iterrows very slow

1 Answers1