Pandas - Faster way to find indices in dataframe

Asked Jul 24 '19 at 06:21

Active Jul 24 '19 at 06:44

I've been trying to flag the values in a feature that appear in a list. The way I'm doing it is very inefficient:

1. Iterate over all the values in the list
2. Find the indices of the rows that match each value
3. Set the flag feature to 1 at those indices

It takes about 14 minutes for a 1 million row dataframe to finish this loop. This is my code.

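import numpy as np

# Initialize the flag in both frames (originally only df_train was initialized).
df_train['flag'] = 0
df_test['flag'] = 0

for value in big_list:
    for df in [df_train, df_test]:
        # Positional indices of matching rows; using them with .loc assumes a default RangeIndex.
        idx = np.where(df['feature'] == value)
        df.loc[idx[0], 'flag'] = 1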

CPU times: user 14min 48s, sys: 3.46 s, total: 14min 51s
Wall time: 14min 52s

Is there any way to achieve this with set operations and the `in` operator, so that each lookup takes O(1) time, or is there any other faster solution?

edited Jul 24 '19 at 06:39

gunesevitan

do you need `df['feature'].isin(big_list).astype(int)`?? – anky Jul 24 '19 at 06:28
Yes that worked. If you write it as an answer, I'll accept it. – gunesevitan Jul 24 '19 at 06:38
ahh, glad it worked. However this question has been asked before so I will close this :) – anky Jul 24 '19 at 06:39
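
For reference, the `isin` suggestion from the comments can be sketched roughly as below. The two small frames and the contents of `big_list` are made-up stand-ins for the real data, and converting the list to a set first is my own assumption rather than something `isin` requires. The idea is that one vectorized membership test per frame replaces the loop over every value in the list.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's data; only the names come from the question.
df_train = pd.DataFrame({"feature": ["a", "b", "c", "d", "b"]})
df_test = pd.DataFrame({"feature": ["c", "e", "a"]})
big_list = ["b", "c", "x"]

# Optional: a set drops duplicate values before the membership test.
lookup = set(big_list)

for df in (df_train, df_test):
    # isin builds a boolean mask for the whole column in one pass;
    # astype(int) converts True/False to 1/0.
    df["flag"] = df["feature"].isin(lookup).astype(int)

print(df_train["flag"].tolist())  # [0, 1, 1, 0, 1]
print(df_test["flag"].tolist())   # [1, 0, 0]
```

Because the mask is aligned on the frame's own index, this should also work when the index is not a default RangeIndex, which the `np.where` plus `.loc` combination in the question does not handle.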
