
I've been trying to flag rows where a feature's value is in a list. The way I am doing it is very inefficient:

  • I'm iterating over all the values in the list
  • finding the indices of the rows where the feature equals each value
  • setting the flag feature to 1 at those indices.

This loop takes about 14 minutes to finish on a 1-million-row dataframe. This is my code:

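import numpy as np

# Start every row unflagged in both frames
for df in [df_train, df_test]:
    df['flag'] = 0

for value in big_list:
    for df in [df_train, df_test]:
        # Positions of the rows whose feature equals the current value
        idx = np.where(df['feature'] == value)[0]
        # iloc takes positions; loc would treat them as index labels
        df.iloc[idx, df.columns.get_loc('flag')] = 1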

CPU times: user 14min 48s, sys: 3.46 s, total: 14min 51s
Wall time: 14min 52s

Is there any way to achieve this with set operations and the `in` operator, with O(1) lookups per row, or any slightly faster solution?
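A minimal sketch of the set-based idea, assuming the column is named `feature` as above and that the helper names `add_flag` and `add_flag_set` (hypothetical, not part of pandas) are acceptable. The loop above scans the whole column once per list value, so its cost grows with rows × list length. `Series.isin` checks every row against the whole list in one vectorized pass; pandas generally does this with a hash table, so the cost should be roughly linear in the number of rows. The second function shows the literal "set plus `in`" version for comparison; each lookup is average O(1), but it runs row by row in Python, so it is likely slower than `isin`.

```python
import pandas as pd


def add_flag(df, values, col="feature", out="flag"):
    # Build the lookup set once; duplicates in the list are dropped
    lookup = set(values)
    # One vectorized membership test over the whole column, cast to 0/1
    df[out] = df[col].isin(lookup).astype(int)
    return df


def add_flag_set(df, values, col="feature", out="flag"):
    # Plain-Python variant: `v in lookup` is an average O(1) set lookup,
    # but the lambda is called once per row in the Python interpreter
    lookup = set(values)
    df[out] = df[col].map(lambda v: int(v in lookup))
    return df


# Small made-up example standing in for the real data
df_train = pd.DataFrame({"feature": ["a", "b", "c", "a", "d"]})
df_test = pd.DataFrame({"feature": ["d", "e", "b"]})
big_list = ["a", "d"]

for df in (df_train, df_test):
    add_flag(df, big_list)

print(df_train["flag"].tolist())  # [1, 0, 0, 1, 1]
print(df_test["flag"].tolist())   # [1, 0, 0]

# The set/`in` variant should give the same flags
alt = add_flag_set(df_train.copy(), big_list)["flag"].tolist()
```

Both versions replace the nested loop with a single pass over each frame, which is where most of the 14 minutes is likely going.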

gunesevitan
