I have a dataframe which is

| A  | B |
| -- | -- |
| 1  | good |
| 2  | bad  |
| 3  | bad  |
| 4  | mid  |

and I want to add another column that transforms the B column into numbers, so that if B is "good" you get a 1, if B is "bad" you get a 0 and if B is "mid" you get another number, say 2.

What is the most efficient way to do this, given that the dataframe is quite large? I could use a lambda function with .apply, but that is not particularly efficient. With filtering you could do something like

df['C'] = np.where(df['B']=='good', 1, 0)

which handles a binary condition and is very fast. But to cover all three values I'd have to use two of those statements. Is there an efficient, filtering-style way to handle more than two cases?
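For illustration, the two-statement approach mentioned above could be written as a nested np.where. This is only a sketch on the four-row sample frame from the question; it assumes any value other than "good" or "mid" should fall through to 0.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['good', 'bad', 'bad', 'mid']})

# Nested np.where: 'good' -> 1, 'mid' -> 2, anything else -> 0
df['C'] = np.where(df['B'] == 'good', 1,
                   np.where(df['B'] == 'mid', 2, 0))
```

Each additional category needs one more level of nesting, which is what makes this awkward beyond two or three values.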

mar tin

1 Answer

Use a dictionary. For example define:

d = {'good': 1, 'bad': 0, 'mid': 2}

Then use pd.Series.map, which accepts a dictionary as an input:

df['C'] = df['B'].map(d)
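Put together on the sample data from the question, a minimal runnable sketch looks like this. The second series, with the made-up value 'unknown', is an assumption added only to show what happens to values missing from the dictionary: map turns them into NaN.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['good', 'bad', 'bad', 'mid']})

d = {'good': 1, 'bad': 0, 'mid': 2}

# Vectorised dictionary lookup on column B
df['C'] = df['B'].map(d)

# Hypothetical extra value: keys absent from the dictionary map to NaN
unmapped = pd.Series(['good', 'unknown']).map(d)
```

If the column can contain values that are not in the dictionary, a follow-up such as .fillna(...) may be needed, since NaN forces the result to a float dtype.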

You can also use pd.Series.replace, but this method often yields poor performance.
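Apart from speed, the two methods also differ in how they treat values that are not in the dictionary. The sketch below uses a made-up value 'unknown' (not part of the question's data) and an explicit object dtype to illustrate this; it makes no claim about timings.

```python
import pandas as pd

d = {'good': 1, 'bad': 0, 'mid': 2}

# 'unknown' is a hypothetical value with no entry in d
s = pd.Series(['good', 'unknown'], dtype=object)

replaced = s.replace(d)  # unmatched values are left unchanged
mapped = s.map(d)        # unmatched values become NaN
```

So replace may be the safer choice when only some values should change, while map suits a complete recoding of the column.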

Related: Replace values in a pandas series via dictionary efficiently

jpp