I have two DataFrames and I am using iterrows to find the items that are present in both of them. The code is below.
import pandas as pd
Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                    'Age': ['24', '25', '35', '27'],
                    'City': ['Agra', 'Bangalore', 'Calcutta', 'Delhi']})
Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                    'Age': ['25', '25', '24', '28', '29', '39'],
                    'City': ['Bangalore', 'Chennai', 'Agra', 'Delhi', 'Pune', 'zoo']})
age1=[]
age2=[]
for index, row in Df1.iterrows():
    if Df2.name.isin([row['name']]).any():
        print(Df2.loc[Df2['name']==row['name'],'Age'].values)
        print(Df1.loc[Df2['name']==row['name'],'Age'].values)
        print(Df1.loc[Df2['name']==row['name']])
The code works for the value Marc: it is present in both DataFrames, so it gets printed. However, the code also prints Sam (who is only present in Df1) instead of Jake, who is present in both Df1 and Df2.
The output is something like this:
['24' '39']
['35']
  name Age      City
2  Sam  35  Calcutta
['25']
['24']
   name Age  City
0  Marc  24  Agra
Why is it giving output like this? It does not make any sense. Marc's ages in Df2 are printed (which is correct), then Sam's age in Df1 is printed, followed by the row where Sam appears in Df1. After that, I don't know how to make sense of the rest.
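To narrow this down, here is a minimal sketch that isolates the lookup for Marc. It assumes (this is a guess, not something confirmed) that the surprise comes from applying a boolean mask built from Df2 to Df1, so that rows are matched by row label rather than by name. The variable names `mask_from_df2`, `true_labels`, `picked_with_df2_mask` and `picked_with_df1_mask` are made up for illustration, and the City column is left out to keep it short.

```python
import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                    'Age': ['24', '25', '35', '27']})
Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                    'Age': ['25', '25', '24', '28', '29', '39']})

# Mask built from Df2: it is True at the row labels where Df2 holds 'Marc'.
mask_from_df2 = Df2['name'] == 'Marc'
true_labels = mask_from_df2[mask_from_df2].index.tolist()

# Using the Df2 mask on Df1 (as in the second and third print calls):
# the rows are picked by matching row labels, not by name.
picked_with_df2_mask = Df1.loc[mask_from_df2, 'name'].tolist()

# Using a mask built from Df1 itself, for comparison.
picked_with_df1_mask = Df1.loc[Df1['name'] == 'Marc', 'name'].tolist()

print(true_labels)
print(picked_with_df2_mask)
print(picked_with_df1_mask)
```

If that assumption holds, the same reasoning would apply to the second block of output: Jake sits at label 0 in Df2, and label 0 in Df1 is Marc.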
Also, why is Marc being printed second? I assumed that since Marc is the first value in Df1, it would be checked and printed first, and then Jake.
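For comparison, this is roughly the result I was expecting, sketched with an inner merge on name instead of iterrows. It is only one possible way to express "names present in both DataFrames", not necessarily the right fix for the loop above; the name `common` and the suffixes `_df1` and `_df2` are arbitrary choices for this example.

```python
import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                    'Age': ['24', '25', '35', '27'],
                    'City': ['Agra', 'Bangalore', 'Calcutta', 'Delhi']})
Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                    'Age': ['25', '25', '24', '28', '29', '39'],
                    'City': ['Bangalore', 'Chennai', 'Agra', 'Delhi', 'Pune', 'zoo']})

# An inner merge keeps only the names that occur in both frames.
# A name that appears twice in Df2 (Marc) yields one output row per match.
common = Df1.merge(Df2, on='name', how='inner', suffixes=('_df1', '_df2'))

# Show each shared name with its age from both frames.
print(common[['name', 'Age_df1', 'Age_df2']])
```

Here only Marc and Jake should appear, with Marc listed twice because Df2 contains two Marc rows.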