
I have a problem normalizing data in pandas.

In [37]:
import pandas as pd # data processing
from IPython.display import display

This is my dataset:

In [37]:
d = {'FTR': ['W', 'D', 'L', 'W'], 'HTG': [3, 0, 1, 2], 'ATG': [0, 0, 2, 0], 'HTN': ['Alpha', 'Alpha', 'Alpha', 'Beta'], 'ATN': ['Beta', 'Chi', 'Epsilon', 'Alpha']}
df = pd.DataFrame(data=d)
display(df)

    FTR HTG ATG HTN     ATN
0   W   3   0   Alpha   Beta
1   D   0   0   Alpha   Chi
2   L   1   2   Alpha   Epsilon
3   W   2   0   Beta    Alpha

I would like the team names in `HTN` and `ATN` to be replaced with numeric codes, like this:

d = {'FTR': ['W', 'D', 'L', 'W'], 'HTG': [3, 0, 1, 2], 'ATG': [0, 0, 2, 0], 'HTN': [1, 1, 1, 2], 'ATN': [2, 22, 5, 1]}
df = pd.DataFrame(data=d)
display(df)

    FTR  HTG  ATG  HTN  ATN
0   W    3    0    1    2
1   D    0    0    1   22
2   L    1    2    1    5
3   W    2    0    2    1

Any ideas?

creep3007
  • Do you have a preset map of names to numbers? i.e. `{'alpha':1, 'beta':2, ...}` – Dillon Jun 22 '18 at 14:52
  • That isn't normalizing, that's encoding. You can use the `map` or `replace` functions if you have a dictionary with matching values, or you can use `pd.factorize()`. – Bharath M Shetty Jun 22 '18 at 14:54
  • @Dark, The dup doesn't answer how to ensure consistent factorization *across series*. Can you find a more suitable dup? If not, I think it's worth a demonstration. – jpp Jun 22 '18 at 15:06
  • @jpp I added that to the duplicates too. Yep, ensuring consistent encoding across multiple columns is necessary for the OP, I guess. – Bharath M Shetty Jun 22 '18 at 15:10
  • @creep3007, Not all the dup solutions preserve ordering (first value met has code 0). I've added a solution to the second duplicate which accounts for this. – jpp Jun 22 '18 at 15:18
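
For reference, here is a minimal sketch of the two approaches mentioned in the comments. It is not taken from the linked duplicates. The `name_to_code` dictionary is an assumption: its values are simply read off the desired output in the question. The names `team_cols`, `mapped` and `factorized` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'FTR': ['W', 'D', 'L', 'W'],
    'HTG': [3, 0, 1, 2],
    'ATG': [0, 0, 2, 0],
    'HTN': ['Alpha', 'Alpha', 'Alpha', 'Beta'],
    'ATN': ['Beta', 'Chi', 'Epsilon', 'Alpha'],
})

team_cols = ['HTN', 'ATN']  # columns that must share one encoding

# Option 1: explicit dictionary (assumed mapping, copied from the
# desired output in the question) applied to each team column.
name_to_code = {'Alpha': 1, 'Beta': 2, 'Epsilon': 5, 'Chi': 22}
mapped = df.copy()
mapped[team_cols] = mapped[team_cols].apply(lambda s: s.map(name_to_code))

# Option 2: no preset mapping. Factorize the values of both columns
# together (row by row) so that a given name gets the same code in
# HTN and ATN; the first name encountered gets code 0, so add 1 to
# start counting from 1.
codes, uniques = pd.factorize(df[team_cols].to_numpy().ravel())
factorized = df.copy()
factorized[team_cols] = codes.reshape(-1, len(team_cols)) + 1

print(mapped)
print(factorized)
```

With Option 1, any name missing from the dictionary becomes `NaN`, so the dictionary has to cover every team. Option 2 needs no dictionary, but the codes depend on the order in which names first appear, so they will not match the 22 and 5 shown in the question.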
