Alternative: do it yourself:
You can build a mapping from the unique values of the column you are interested in, apply it to that Series, and store the result as a new column:
import pandas as pd
from random import choice
data = ["One", "Two", "Fourty-Two oder More", "Not Categorized"]
# random demo data
df = pd.DataFrame({"DataPoint": [f"Patient_{i:03}" for i in range(30)],
                   "Category": [choice(data) for _ in range(30)]})
# create a mapper dict automatically from the unique values of the column
# (codes are assigned in order of first appearance);
# you can fine-tune it by supplying your own fixed mapping instead
mapper = {k: idx for idx, k in enumerate(df["Category"].unique())}
# apply the mapper and store the result as a new column
df["mapped"] = df["Category"].map(mapper)
print(df)
Output:
      DataPoint              Category  mapped
0   Patient_000                   One       0
1   Patient_001       Not Categorized       1
2   Patient_002       Not Categorized       1
3   Patient_003                   Two       2
4   Patient_004  Fourty-Two oder More       3
..          ...                   ...     ...
26  Patient_026                   One       0
27  Patient_027       Not Categorized       1
28  Patient_028                   Two       2
29  Patient_029                   One       0
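If you would rather control the numbers yourself, a fixed mapping works the same way. The following is only a sketch: `fixed_mapper` and its codes are made up for illustration, and the small hand-written frame replaces the random demo data so the result is reproducible. It also shows `pd.factorize`, which, as far as I know, assigns codes in order of first appearance just like the `enumerate`-based mapper.

```python
import pandas as pd

# small deterministic stand-in for the random demo data
df = pd.DataFrame({"Category": ["One", "Two", "Not Categorized",
                                "One", "Fourty-Two oder More"]})

# hypothetical fixed mapping; the numeric codes are arbitrary choices
fixed_mapper = {"One": 1, "Two": 2,
                "Fourty-Two oder More": 42, "Not Categorized": -1}
df["mapped"] = df["Category"].map(fixed_mapper)

# pd.factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(df["Category"])
df["factorized"] = codes

print(df)
```

Values missing from a dict passed to `Series.map` come out as `NaN`, so make sure a fixed mapping covers every category you expect.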
Let pandas do it for you:
You can declare the column as categorical (answer attribution) and let pandas do the rest:
df = pd.DataFrame({"DataPoint": [f"Patient_{i:03}" for i in range(30)],
                   "Category": [choice(data) for _ in range(30)]})
df.Category = pd.Categorical(df.Category)
df["NumericalCat"] = df.Category.cat.codes
print(df)
Output:
      DataPoint              Category  NumericalCat
0   Patient_000                   One             2
1   Patient_001  Fourty-Two oder More             0
2   Patient_002       Not Categorized             1
3   Patient_003                   Two             3
4   Patient_004                   One             2
..          ...                   ...           ...
26  Patient_026  Fourty-Two oder More             0
27  Patient_027                   Two             3
28  Patient_028                   Two             3
29  Patient_029  Fourty-Two oder More             0
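Note that the codes above follow the sorted order of the category labels, not the order in which they appear. If that matters, you can pass the categories explicitly. This is a minimal sketch: the list `order` is my own assumption about a sensible ordering, and the short `values` list stands in for the random demo column.

```python
import pandas as pd

values = ["One", "Two", "Not Categorized", "One", "Fourty-Two oder More"]

# default: categories are inferred and sorted, so codes follow lexical order
default_codes = pd.Categorical(values).codes.tolist()

# explicit category order (an illustrative assumption, not from the question)
order = ["Not Categorized", "One", "Two", "Fourty-Two oder More"]
custom_codes = pd.Categorical(values, categories=order).codes.tolist()

# values that are not listed in `categories` get the code -1
unknown_codes = pd.Categorical(["One", "Unknown"], categories=order).codes.tolist()

print(default_codes, custom_codes, unknown_codes)
```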