How to compare two dataframes and create a new one for those entries which are the same across two columns in the same row

Question

I have been trying to compare two dataframes and create a new dataframe from the rows that have the same entries in two columns. I thought I had cracked it, but the code I have now checks each of the two columns separately: if the string is found anywhere in that column, it counts as a match. I need both strings to match on the same row across the two columns. A sample of the code follows.

    # produce table with common items
    vto_in_jeff = df_vto[(df_vto['source'].isin(df_jeff['source']) & df_vto['target'].isin(df_jeff['target']))].dropna().reset_index(drop=True)
    #vto_in_jeff.index = vto_in_jeff.index + 1  
    vto_in_jeff['compare'] = 'Common_terms'
    print(vto_in_jeff)
    vto_in_jeff.to_csv(output_path+'vto_in_'+f+'.csv', index=False)

So this code produces a table of the rows that contain both a matching source string and a matching target string, but the two matches do not have to come from the same row of the other dataframe. Can anyone help me compare specifically row by row?
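To make the problem concrete, here is a minimal sketch of the behaviour described above. The toy data is hypothetical; only the column names `source` and `target` and the dataframe names come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column and dataframe names come from the question.
df_vto = pd.DataFrame({"source": ["a", "a", "b"], "target": ["x", "y", "y"]})
df_jeff = pd.DataFrame({"source": ["a", "b"], "target": ["x", "y"]})

# Each isin() tests one column on its own, so the pairing within a row is lost.
mask = df_vto["source"].isin(df_jeff["source"]) & df_vto["target"].isin(df_jeff["target"])
matched = df_vto[mask]

# The pairs that really exist as rows of df_jeff.
jeff_pairs = set(zip(df_jeff["source"], df_jeff["target"]))
print(matched)
```

In this sketch the row `("a", "y")` passes the mask even though `df_jeff` has no row with that pair, because `"a"` appears somewhere in its `source` column and `"y"` somewhere in its `target` column.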

Hi, can you please provide a Minimal, Complete, and Verifiable example (https://stackoverflow.com/help/mcve)? — Qaswed, May 07 '19 at 12:21
If I have a table (I don't know how to insert a table) with Columns - Source and Target. Where both the Source and Target are the same I want to create another dataframe with only these entries. One dataframe also has a taxonomic rank column and the other has frequency data so I cannot just look for duplicates. — Sandra Young, May 07 '19 at 12:37
You might use something like `df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])` (an example I copied from here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — Qaswed, May 07 '19 at 13:11

Petronella · Accepted Answer · 2019-05-07T12:45:18.193

1

You can use the pandas merge method:

    result = pd.merge(df1, df2, on='key')

More details are here: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#brief-primer-on-merge-methods-relational-algebra
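Applied to the question, `on` can take a list of columns, so a row only matches when both values agree on the same row. The following is a sketch under assumptions: the data and the `taxonomic_rank` and `freq` column names are made up to mirror the description in the comments, not taken from the asker's files.

```python
import pandas as pd

# Hypothetical data: one frame carries a rank column, the other a frequency column.
df_vto = pd.DataFrame({
    "source": ["a", "a", "b"],
    "target": ["x", "y", "y"],
    "taxonomic_rank": ["genus", "species", "family"],
})
df_jeff = pd.DataFrame({
    "source": ["a", "b", "c"],
    "target": ["x", "y", "z"],
    "freq": [10, 20, 30],
})

# An inner merge on both columns keeps only rows whose (source, target) pair
# occurs in both dataframes.
vto_in_jeff = pd.merge(df_vto, df_jeff, on=["source", "target"], how="inner")
vto_in_jeff["compare"] = "Common_terms"
print(vto_in_jeff)
```

Here the row `("a", "y")` is dropped because that pair does not occur in `df_jeff`, and the result carries the extra columns from both frames.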

edited May 07 '19 at 12:45

answered May 07 '19 at 12:25

But merge will not give me a table of only the items which are identical in the source and target columns, will it? – Sandra Young May 07 '19 at 12:41
It gives the intersection of the 2 dataframes on the column specified where the values are identical. Please also read the documentation; I've updated the link to send you to the specific section. – Petronella May 07 '19 at 12:43
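
If the result should contain only the first dataframe's columns, one possible variation is to merge against just the de-duplicated key columns of the second dataframe. This is a sketch with made-up data; `taxonomic_rank` and `freq` are hypothetical column names.

```python
import pandas as pd

# Hypothetical data; note the repeated ("a", "x") pair in df_jeff.
df_vto = pd.DataFrame({
    "source": ["a", "a", "b"],
    "target": ["x", "y", "y"],
    "taxonomic_rank": ["genus", "species", "family"],
})
df_jeff = pd.DataFrame({
    "source": ["a", "a", "b"],
    "target": ["x", "x", "y"],
    "freq": [10, 11, 20],
})

# Keep only the key columns and drop repeated pairs, so the merge neither
# adds df_jeff's other columns nor duplicates rows of df_vto.
keys = df_jeff[["source", "target"]].drop_duplicates()
common = df_vto.merge(keys, on=["source", "target"], how="inner")
print(common)
```

Dropping duplicates first matters here because an inner merge would otherwise repeat a row of `df_vto` once for every matching row in `df_jeff`.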
