I have two DataFrames with many thousands of rows. The columns all have the dtype string. A snippet looks like this:
DF1:
ID SUCCESSOR
0 0001234 3620031
1 0001235 6640002
2 0002456 8620003
3 0013456 8640004
4 1711999 1283456 <- see DF2
... ... ...
409813 9162467 <NA>
409814 9212466 <NA>
409815 9312466 6975A0C
409816 9452463 <NA>
409817 9591227 <NA>
DF2:
ID
2 1111682
3 1123704
14 1567828
15 1711999 <- that value should be replaced with '1283456'
16 1711834
... ...
845775 970879B
845776 975879B
845777 9275A0A
845778 9285A05
845779 9295A05
Don't be surprised that the second DataFrame is missing a few indexes, because I filtered them before, as they were not relevant. Also, the NaNs are not relevant because my algorithm bypasses them.
I now want to replace the IDs in the second DataFrame with the successors in the first DataFrame that have the same ID.
The output should be:
ID
2 1111682
3 1123704
14 1567828
15 1283456 <- now replaced
16 1711834
... ...
845775 970879B
845776 975879B
845777 9275A0A
845778 9285A05
845779 9295A05
In order not to blow up the example, I have replaced only one value. In reality there are several replacements
Two approaches:
In my first approach I iterated through DF1 and used the replace() function, however this approach takes years, so its useless.
In my second approach, I first convert DF1 to a dictionary and then apply the map() function. I do this as described here by JohnE: Remap values in pandas column with a dict and in a small example it works wonderfully:
df = pd.DataFrame({'col1': {1: 1, 2: 2, 3: 4, 4: 1}, 'col2': {1: 2, 2: np.nan}}) di = {1: "A", 2: "B"} col1 col2 1 1 2.0 2 2 NaN 3 4 NaN 4 1 NaN df['col1'].map(di).fillna(df['col1']) 1 A 2 B 3 4 4 A
My function to map DF1 and DF2 looks like this:
def mapping(df1, df2):
di =dict(zip(df1.ID, df1.SUCCESSOR)) # create the dict
changes = 1
while(changes > 0):
changes = 0
df_old = df2
print(df2) #check how df2 looks before mapping.
df2['ID'] = df2['ID'].map(di).fillna(df2['ID'])
print(df2) # check how df2 looks after mapping. Unfortunately no changes :( so the error must be in the mapping function one line above here.
if df_old.equals(df2) == False:
changes = 1
return df2
So obviously the error must be in this line:
df2['ID'] = df2['ID'].map(dic).fillna(df2['ID']).
However, I just can't figure out why this doesn't work. What is not working here and why?
If anyone can help me, I am indebted to them forever!!!
Best regards, Alfonso
EDIT: EDIT: I found the error and I am an idiot. My solution worked, but the line: "df_old = df2" prevented the loop from continuing. MANY THANKS anyway and sorry if I have robbed time!