How to vectorize a function for each ID# in pandas

Question

This method is time/cpu intensive and there's got to be a better way! Can someone help me vectorize the following code without using a loop? Basically, I have a df where each subject has multiple rows, and each row has a value. I want to add a column that displays the highest value for every subject (will be the same for every row of the subject).

import pandas as pd
import numpy as np
from numpy import nan

compare_table = pd.DataFrame({
    'id': [1,1,1,2,2,3,3,3],
    'day#': [1, 2, 3, 1, 2, 1, 2, 3],
    'random#': [2,5,1,6, 4, 5, 9, 3],
     'highest_random#': [nan, nan, nan, nan, nan, nan, nan, nan]}, columns=[
    'id', 'day#','random#','highest_random#'])

for element in list(compare_table['id'].unique()):
        highest_random = max(compare_table.loc[compare_table.loc[:,'id']==element, 'random#'])
        compare_table.loc[compare_table.loc[:,'id']==element, 'highest_random#']= highest_random

`compare_table.groupby('id')['random#'].transform('max') ` try to search before ask .. — BENY, Apr 17 '18 at 14:45

score 2 · Accepted Answer · answered Apr 17 '18 at 14:45

2

Use GroupBy.transform by max or map aggregated Series:

compare_table['highest_random#1'] = compare_table.groupby('id')['random#'].transform('max')

#a bit slowier alternative
s = compare_table.groupby('id')['random#'].max()
compare_table['highest_random#2'] = compare_table['id'].map(s)
print (compare_table)
   id  day#  random#  highest_random#  highest_random#1  highest_random#2
0   1     1        2              5.0                 5                 5
1   1     2        5              5.0                 5                 5
2   1     3        1              5.0                 5                 5
3   2     1        6              6.0                 6                 6
4   2     2        4              6.0                 6                 6
5   3     1        5              9.0                 9                 9
6   3     2        9              9.0                 9                 9
7   3     3        3              9.0                 9                 9

answered Apr 17 '18 at 14:45

jezrael

822,522
95
1,334
1,252

awesome thanks man, I appreciate it! I was searching for this for a while but it can be hard to find info when unsure of how to ask the question in a general way. – Garrett Eickelberg Apr 17 '18 at 14:50
@GarrettEickelberg - I am curious about timings, so give me some time for add it. – jezrael Apr 17 '18 at 14:51
went from 5min to 5sec processing time, thanks a lot! – Garrett Eickelberg Apr 17 '18 at 15:08
@GarrettEickelberg - Super, glad can help! – jezrael Apr 17 '18 at 15:09

How to vectorize a function for each ID# in pandas

1 Answers1