1

I have two dataframes:

DF1:

ID     v1           v2         v3
289  1455.0        2.0        0.62239  
289  1460.0        0.0        0.46037  
289  1465.0        4.0        0.41280 
290  1470.0        0.0        0.39540 
290  1475.0        2.0        0.61809 
290  1475.0        2.0        0.61809

DF2:

ID     v1           v2         v3
289  1423.0        2.0        0.62239  
289  142Q.0        0.0        0.46037  
289  14FW.0        4.0        0.41280  
290  14Q3.0        0.0        0.39540  
290  1453.0        2.0        0.61809 
290  1454.0        2.0        0.61809

I want to compare each row in DF1 against every row in DF2 and check whether it is in DF2, something like:

for row in results_01.iterrows():
    diff = []
    if row not in results_02:
        add different one to 'diff'
        print(diff)

I know the logic but am not sure how to implement it. I'm new to Python; can anyone help me? Many thanks.

Cecilia
  • What should the output be from the given inputs? There are no rows that match completely in the given sample – G. Anderson Oct 21 '19 at 20:47
  • I answered below, but I'm curious why you call the similar rows ‘diff’? – Lior Cohen Oct 21 '19 at 20:49
  • @LiorCohen My bad, I wanted to say if 'row not in results_02', then add to diff, basically I want to find the different rows and not line by line. – Cecilia Oct 21 '19 at 20:55
  • And to answer the first question, this sample data is just an example that I copied from somewhere else. @G.Anderson – Cecilia Oct 21 '19 at 20:56
  • You may want to include some rows that do match, otherwise all of the rows you provided would be different so the simplest code would just be `diff=df1.copy()` – G. Anderson Oct 21 '19 at 21:02

3 Answers

1

The code block you have is pretty close to how you would do it in Python: take a row from one dataframe and iterate through the other dataframe looking for matches.

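# Collect matches across the whole comparison, so create the list once, outside the loops
diff = []
for _, row_1 in results_01.iterrows():
    compare_item = row_1['col_name']
    # Use separate loop variables so the inner loop does not overwrite the outer row
    for _, row_2 in results_02.iterrows():
        if compare_item == row_2['compare_col_name']:
            # append takes a single argument, so store the pair as a tuple
            diff.append((compare_item, row_2['col_name']))
print(diff)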

Here I take a specific column value from each row of one dataframe and compare it with a column value from each row of the other dataframe.

1

You can do it easily with an ‘inner’ merge.

intersect = pd.merge(df1, df2, how='inner')
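To illustrate what the inner merge returns, here is a minimal sketch. The two small frames are made-up sample data (not the asker's files), built so that exactly one row appears in both. With no `on` argument, `pd.merge` joins on all columns the frames have in common.

```python
import pandas as pd

# Hypothetical sample data: one row is shared by both frames.
df1 = pd.DataFrame({"ID": [289, 290], "v1": [1455.0, 1470.0]})
df2 = pd.DataFrame({"ID": [290, 291], "v1": [1470.0, 1480.0]})

# With no `on` argument, merge joins on every column the frames share,
# so only rows that are identical in both frames survive.
intersect = pd.merge(df1, df2, how="inner")
print(intersect)
```

Note that float columns are matched on exact equality, so values that only look the same after rounding will not be treated as a match.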

Edit:

It turns out that what is wanted is the rows that are in df1 but not in df2, not the intersection. In this case one should use the pandas isin method. Here is an SO link that deals with it.
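As an alternative to `isin` (this is a different technique, not the one from the linked post), a left merge with `indicator=True` can pick out the rows of df1 that have no match in df2. This is only a sketch: the sample frames below are invented for illustration, and it assumes both frames have the same columns.

```python
import pandas as pd

# Hypothetical sample data: the last two rows of df1 also appear in df2.
df1 = pd.DataFrame({
    "ID": [289, 289, 290],
    "v1": [1455.0, 1460.0, 1470.0],
    "v2": [2.0, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "ID": [289, 290, 290],
    "v1": [1460.0, 1470.0, 1475.0],
    "v2": [0.0, 0.0, 2.0],
})

# Left merge on all shared columns; the `_merge` column records whether each
# df1 row was found in df2 ("both") or not ("left_only").
# Dropping duplicates from df2 first avoids repeating df1 rows in the result.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep only the rows that exist in df1 but have no match in df2.
only_in_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Swapping df1 and df2 gives the rows that are only in df2.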

Lior Cohen
0

One way to do it (maybe not the most efficient) would be to append the dataframes together and then drop duplicates, like so:

full_df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
full_df = full_df.drop_duplicates(keep=False)
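One thing to watch for with this approach: `drop_duplicates(keep=False)` also removes rows that are duplicated within a single dataframe (the sample DF1 in the question has such a pair). A possible workaround, sketched below on made-up data, is to de-duplicate each frame before combining them. The result is then the rows that appear in exactly one of the two frames.

```python
import pandas as pd

# Hypothetical sample data: df1 contains an internal duplicate (290, 1475.0).
df1 = pd.DataFrame({"ID": [289, 290, 290], "v1": [1455.0, 1475.0, 1475.0]})
df2 = pd.DataFrame({"ID": [289, 291], "v1": [1455.0, 1480.0]})

# De-duplicate each frame first so that only cross-frame matches cancel out.
full_df = pd.concat([df1.drop_duplicates(), df2.drop_duplicates()])

# keep=False drops every copy of a duplicated row, leaving the rows
# that are present in only one of the two frames.
diff = full_df.drop_duplicates(keep=False)
print(diff)
```

Without the per-frame `drop_duplicates()`, the (290, 1475.0) row would disappear from the result even though df2 does not contain it.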
manny
  • Why does full_df return nothing? I tried 'results_01.equals(results_03)' and it gives False, which means the two files are different – Cecilia Oct 21 '19 at 21:22
  • Hi, I tried two different files and it gives me 4 rows. I compared them manually but found that they are the same (pair by pair). Any chance you know what's going on? – Cecilia Oct 22 '19 at 15:59
  • I don't understand what you mean... Didn't you want that? – manny Oct 23 '19 at 16:12