How to iterate over each row of one dataframe and compare it with rows in another dataframe in Python?

Question

I have two dataframes:

DF1:

ID     v1           v2         v3
289  1455.0        2.0        0.62239  
289  1460.0        0.0        0.46037  
289  1465.0        4.0        0.41280 
290  1470.0        0.0        0.39540 
290  1475.0        2.0        0.61809 
290  1475.0        2.0        0.61809

DF2:

ID     v1           v2         v3
289  1423.0        2.0        0.62239  
289  142Q.0        0.0        0.46037  
289  14FW.0        4.0        0.41280  
290  14Q3.0        0.0        0.39540  
290  1453.0        2.0        0.61809 
290  1454.0        2.0        0.61809

I want to check each row in DF1 against every row in DF2 and see whether it is in DF2, something like:

for row in results_01.iterrows():
    diff = []
    if row not in results_02:
        add different one to 'diff'
        print(diff)

I know the logic but I'm not sure how to do this. I'm new to Python; can anyone help me? Many thanks.

What should the output be from the given inputs? There are no rows that match completely in the given sample – G. Anderson, Oct 21 '19 at 20:47
I answered below, but I'm curious why you call the similar rows 'diff'? – Lior Cohen, Oct 21 '19 at 20:49
@LiorCohen My bad, I wanted to say if 'row not in results_02', then add to diff. Basically I want to find the rows that differ, not compare line by line. – Cecilia, Oct 21 '19 at 20:55
And to answer the first question, this sample data is just an example that I copied from somewhere else. @G.Anderson – Cecilia, Oct 21 '19 at 20:56
You may want to include some rows that do match, otherwise all of the rows you provided would be different so the simplest code would just be `diff=df1.copy()` – G. Anderson, Oct 21 '19 at 21:02

William Knighting · Answer 1 · 2019-10-22T12:50:49.757

1

The code block you have is pretty close to how you would do it in Python: take a row from one dataframe and iterate through the other dataframe looking for matches.

def find_matches(results_01, results_02):
    # Collect (value, value) pairs wherever the two compared columns agree.
    diff = []
    for _, row_01 in results_01.iterrows():
        compare_item = row_01['col_name']
        for _, row_02 in results_02.iterrows():
            if compare_item == row_02['compare_col_name']:
                diff.append((compare_item, row_02['col_name']))
    return diff

Here I am taking a specific column value from a row from one dataframe and comparing it to another value from the other dataframe
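
Since the follow-up comments ask about comparing whole rows rather than a single column, here is a minimal sketch of that variant. The sample values are made up for illustration (they are not the asker's real data), and the names rows_in_df2 and mask are hypothetical helpers. It assumes both frames have the same columns in the same order.

```python
import pandas as pd

# Hypothetical sample frames; column names follow the question.
df1 = pd.DataFrame({"ID": [289, 289, 290],
                    "v1": [1455.0, 1460.0, 1470.0],
                    "v2": [2.0, 0.0, 0.0]})
df2 = pd.DataFrame({"ID": [289, 290],
                    "v1": [1455.0, 1475.0],
                    "v2": [2.0, 2.0]})

# Turn every row of df2 into a plain tuple so membership checks are cheap.
rows_in_df2 = set(df2.itertuples(index=False, name=None))

# Keep the rows of df1 whose full tuple of values is absent from df2.
mask = [row not in rows_in_df2
        for row in df1.itertuples(index=False, name=None)]
diff = df1[mask]
print(diff)
```

This still walks df1 row by row, but it avoids the nested loop over df2 by looking each row up in a set.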

edited Oct 22 '19 at 12:50

answered Oct 21 '19 at 20:44


Hi I want to compare the whole row instead of one element because I only care about the difference in the row with the same ID :) – Cecilia Oct 21 '19 at 21:02
and it gives me error 'IndentationError: unexpected indent' – Cecilia Oct 21 '19 at 21:05
This should be fixed for indentation. Hope it helps! – William Knighting Oct 24 '19 at 16:32

Lior Cohen · Answer 2 · 2019-10-21T21:20:32.893

1

You can do it easily with an 'inner' merge.

import pandas as pd

intersect = pd.merge(df1, df2, how='inner')

Edit:

It turns out that what is wanted is the rows that are in df1 and not in df2, not the intersection. In this case one should use the pandas isin method. Here is an SO link that deals with it.
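
The linked answer is not reproduced here. As a stand-in, the sketch below uses a different technique that gives the same kind of result: a left merge with indicator=True, then keeping only the rows marked left_only. The sample values and the name only_in_df1 are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames for illustration only.
df1 = pd.DataFrame({"ID": [289, 289, 290], "v1": [1455.0, 1460.0, 1470.0]})
df2 = pd.DataFrame({"ID": [289, 290], "v1": [1455.0, 1475.0]})

# Left merge on all shared columns; the _merge column records where each
# row was found. Dropping duplicates from df2 first keeps a repeated df2
# row from multiplying the matching df1 rows.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Rows that exist only in df1.
only_in_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Because no key columns are given, the merge compares every column the two frames share, which amounts to a whole-row comparison when both frames have identical columns.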

edited Oct 21 '19 at 21:20

answered Oct 21 '19 at 20:44


I got error 'SyntaxError: invalid character in identifier' – Cecilia Oct 21 '19 at 20:57

score 0 · Answer 3 · answered Oct 21 '19 at 20:42

0

One way to do it (maybe not the most efficient) would be to concatenate the dataframes and then drop every row that appears more than once, like so:

import pandas as pd

full_df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
full_df = full_df.drop_duplicates(keep=False)
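
One thing to watch with this approach: keep=False discards every row that occurs more than once in the combined frame, wherever the copies came from. The result is therefore the rows unique to either frame, and a row repeated inside df1 alone disappears as well. A small sketch with made-up values, using pd.concat to combine the frames:

```python
import pandas as pd

# Hypothetical frames: (290, 1475.0) is repeated inside df1 and absent from df2.
df1 = pd.DataFrame({"ID": [289, 290, 290], "v1": [1455.0, 1475.0, 1475.0]})
df2 = pd.DataFrame({"ID": [289, 291], "v1": [1455.0, 1480.0]})

full_df = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(full_df)
```

Only the row with ID 291 survives here: the ID 289 row is in both frames, and the ID 290 row is dropped because it appears twice within df1, even though df2 does not contain it.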

answered Oct 21 '19 at 20:42

manny


Why does full_df return nothing? I tried 'results_01.equals(results_03)' and it gives False, which means the two files are different – Cecilia Oct 21 '19 at 21:22
Hi, I tried two different files and it gives me 4 rows. I compared them manually and found they are the same (pair to pair). Any chance you know what's going on? – Cecilia Oct 22 '19 at 15:59
I don't understand what you mean... Didn't you want that? – manny Oct 23 '19 at 16:12
