I've been trying to flag values in a feature if they are in a list. The way I am doing it is very inefficient:
- I'm iterating over all the values inside the list
- Finding their indices
- Setting 1 at the corresponding indices in the `flag` feature
It takes about 14 minutes for a 1 million row dataframe to finish this loop. This is my code.
df_train['flag'] = 0
for value in big_list:
    for df in [df_train, df_test]:
        idx = np.where(df['feature'] == value)
        df.loc[idx[0], 'flag'] = 1
CPU times: user 14min 48s, sys: 3.46 s, total: 14min 51s
Wall time: 14min 52s
Is there any way to achieve this with set operations and the `in` operator in O(1) time, or any slightly faster solution?
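
To make the request concrete, here is a minimal sketch of the kind of vectorized membership test being asked about. It assumes the values in `big_list` are hashable and uses pandas' `Series.isin`. The small frames and the name `big_set` are made up for illustration and are not from the real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frames and list.
df_train = pd.DataFrame({"feature": [1, 2, 3, 4, 5]})
df_test = pd.DataFrame({"feature": [3, 6, 9]})
big_list = [2, 3, 9]

# Build the set once so duplicates in the list are dropped up front.
big_set = set(big_list)

for df in (df_train, df_test):
    # isin returns a boolean Series in a single vectorized pass;
    # casting to int gives the 0/1 flag column.
    df["flag"] = df["feature"].isin(big_set).astype(int)
```

This replaces one full scan of the column per list value with a single pass per dataframe, and it does not depend on the frame having a default integer index the way `df.loc[idx[0], ...]` does.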