I have a problem similar to the one in this similar question. However, I need to replace values in the same column under several different conditions, something like the code below:

for item in items:
    df.loc[df['A'] == item,'A'] = 'other'

where items is a list of strings that I need to replace with 'other' in column 'A'. My dataframe is very large, and this approach is very slow. Is there a faster way to do it?

jpp
user1571823

1 Answer

Use pd.Series.isin to index by a single Boolean series:

df.loc[df['A'].isin(items), 'A'] = 'other'

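The bottleneck in your logic is evaluating df['A'] == item inside a loop: every iteration compares the entire column and builds a new Boolean series. The method above calculates a single Boolean series and uses it for one indexing operation.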
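
As a minimal, self-contained sketch of the approach above: the sample values in df and items are made up for illustration, and only the column name 'A' and the replacement value 'other' come from the question.

```python
import pandas as pd

# Hypothetical sample data; real data would be much larger.
df = pd.DataFrame({"A": ["apple", "pear", "kiwi", "apple", "plum"]})
items = ["apple", "kiwi"]  # hypothetical list of strings to replace

# Build one Boolean mask covering every item, then assign once.
mask = df["A"].isin(items)
df.loc[mask, "A"] = "other"

print(df["A"].tolist())  # ['other', 'pear', 'other', 'other', 'plum']
```

The mask is computed in a single pass over the column, so the cost no longer grows with one full column comparison per element of items.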

jpp