I have a dataframe(df) with different columns. One of the column (col1) is as follows:
col1
----
0 1
1 2
2 1-2
3 1,2
4 1-3
5 3
I am using .replace method in python/pandas to replace the codes in col1 using the code:
df.col1.replace(to_replace=({'1':'Normal','2':'1-2 more than normal','3':'3-4 more than normal'}), regex=True)
I am using regex=True
because there are codes like 1-2 in cells where 1 and 2 have different meanings as mentioned in the dictionary.
Output
col1
--------
0 Normal
1 1-2 more than normal
2 Normal-1-2 more than normal
3 Normal,1-2 more than normal
4 Normal-1-2 more than normal-3 more than normal
5 1-2 more than normal-3 more than normal
Desired Output
col1
--------
0 Normal
1 1-2 more than normal
2 Normal-1-2 more than normal
3 Normal,1-2 more than normal
4 Normal-3-4 more than normal
5 3-4 more than normal
The Problem:
If I do not consider the fourth row (1-3) then all the codes are replaced correctly, except for the code 3. I further experimented with adding a row with only code 3 and there I found that regex first replaces the values for code 3 and then in those values replaces the codes with values from the dictionary.
It is strange as I am running the regex code/command only once.
One solution is that instead of using numbers in the dictionary values i could use English words, e.g. instead of writing 1-2 more than normal
, i can write one-two more than normal
and then it works. But I want to keep the numbers as they are easy to interpret.
Any suggestions?