I have two DataFrames, and I am using `iterrows` to find the items that are present in both of them. The code is below.

import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                'Age': ['24', '25', '35', '27'],
                'City': ['Agra', 'Bangalore', 'Calcutta', 'Delhi']})

Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                'Age': ['25', '25', '24', '28','29', '39'],
                'City': ['Bangalore', 'Chennai', 'Agra', 'Delhi','Pune','zoo']})
age1=[]
age2=[]

for index, row in Df1.iterrows():
    if Df2.name.isin([row['name']]).any():
        print(Df2.loc[Df2['name']==row['name'],'Age'].values)
        print(Df1.loc[Df2['name']==row['name'],'Age'].values)
        print(Df1.loc[Df2['name']==row['name']])

The code works for the value Marc: it is present in both DataFrames, so it gets printed. However, the code also prints Sam (who is only present in Df1) instead of Jake, who is present in both Df1 and Df2.

The output is something like this:

['24' '39']
['35']
  name Age      City
2  Sam  35  Calcutta
['25']
['24']
   name Age  City
0  Marc  24  Agra

Why is it giving output like this? It does not make any sense. Marc's age in Df2 is printed (which is correct), then Sam's age in Df1 is printed, then the row where Sam appears in Df1. I don't know how to make sense of the rest.

Also, why is Marc printed second? I assumed that since Marc is the first value in Df1, it would be checked and printed first, and then Jake.
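
One possible reading of the output, offered as a sketch rather than a confirmed diagnosis: the mask `Df2['name']==row['name']` is built from `Df2`, and a boolean Series used for indexing is matched to the target frame by index label, not by name. Applied to `Df1`, it would therefore pick whichever `Df1` row shares a label with the matching `Df2` row. The names `mask`, `aligned`, `wrong_rows` and `right_rows` are illustrative only, and the explicit `reindex` is my way of making the alignment visible.

```python
import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                    'Age': ['24', '25', '35', '27'],
                    'City': ['Agra', 'Bangalore', 'Calcutta', 'Delhi']})
Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                    'Age': ['25', '25', '24', '28', '29', '39'],
                    'City': ['Bangalore', 'Chennai', 'Agra', 'Delhi', 'Pune', 'zoo']})

# Mask built from Df2: True at Df2 index labels 2 and 5, where name == 'Marc'.
mask = Df2['name'] == 'Marc'

# Aligning that mask to Df1's index (labels 0..3) keeps only label 2,
# and label 2 in Df1 is Sam, not Marc.
aligned = mask.reindex(Df1.index)
wrong_rows = Df1.loc[aligned.to_numpy()]

# Building the mask from Df1 itself selects the intended row.
right_rows = Df1.loc[Df1['name'] == 'Marc']

print(wrong_rows['name'].tolist())
print(right_rows['name'].tolist())
```

Under the same reading, Marc's row shows up in the second block of output because Jake sits at label 0 in Df2, and label 0 in Df1 is Marc.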

moys
  • Just use `df1.merge(df2)`. – cs95 May 20 '19 at 05:14
  • Change your print statement to `print(Df1.loc[index])` – hacker315 May 20 '19 at 05:19
  • @cs95 This works to get all the values that are present in both DataFrames. How do I get the inverse, i.e. the values that are unique to each of the DataFrames? – moys May 20 '19 at 05:25
  • @mohanys Take a look at anti joins... it's in the post I linked you to. – cs95 May 20 '19 at 05:54
  • @cs95 Thank you. I will take a look at the link you have provided. My final goal is to find the differences between two DataFrames, as I stated in https://stackoverflow.com/questions/56005995/comparing-data-frames-and-getting-the-differences-with-python. Since no direct solution exists, I am doing it the hard way: I take one row at a time and check whether there is a match, hierarchically. For the DataFrames in this post, that means matching the name first, then the city, then the age, and if the ages differ, by how much, and so on. – moys May 20 '19 at 07:49
  • @cs95 Can you please remove the 'duplicate' mark on this question? I am trying to hierarchically compare values in multiple columns of two different DataFrames. An anti-join just gives the data that is not the same in both DataFrames; I want to get the exact difference. For the DataFrames in this post, I first check whether the name matches, then the city, then the age, and if the ages differ, by how much. To do that, I need the city value when the name matches, and then the age when the city also matches, so that I can do a comparison. – moys May 20 '19 at 11:57
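
The merge and anti-join suggestions in the comments above could be sketched roughly as follows. This is only one way to do it, under some assumptions of mine: the frames are joined on `name` alone, the suffixes `_df1`/`_df2` and the columns `city_match` and `age_diff` are names I made up, and the ages (strings in the example data) are converted to integers before subtracting.

```python
import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad'],
                    'Age': ['24', '25', '35', '27'],
                    'City': ['Agra', 'Bangalore', 'Calcutta', 'Delhi']})
Df2 = pd.DataFrame({'name': ['Jake', 'John', 'Marc', 'Tony', 'Bob', 'Marc'],
                    'Age': ['25', '25', '24', '28', '29', '39'],
                    'City': ['Bangalore', 'Chennai', 'Agra', 'Delhi', 'Pune', 'zoo']})

# Outer merge on name; indicator=True adds a '_merge' column that records
# whether each row came from both frames, only the left, or only the right.
merged = Df1.merge(Df2, on='name', how='outer',
                   suffixes=('_df1', '_df2'), indicator=True)

# Names present in both frames (one row per matching pair).
both = merged[merged['_merge'] == 'both'].copy()

# Anti-joins: names that appear in only one of the frames.
only_df1 = merged[merged['_merge'] == 'left_only']
only_df2 = merged[merged['_merge'] == 'right_only']

# Hierarchical comparison for matched names: city first, then age difference.
both['city_match'] = both['City_df1'] == both['City_df2']
both['age_diff'] = both['Age_df2'].astype(int) - both['Age_df1'].astype(int)

print(both[['name', 'City_df1', 'City_df2', 'city_match', 'age_diff']])
print(sorted(only_df1['name']))
print(sorted(only_df2['name']))
```

Because Marc appears twice in Df2, the matched frame holds two rows for Marc, one per pairing; whether that is the desired behaviour depends on how duplicates should be treated.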
