
I have two DataFrames with the following columns:

df1


name    cell     marks  

tom      2       21862


df2


name    cell    marks     passwd

tom      2       11111      2548

matt     2       158416      2483
         2       21862      26846

How can I compare df2 with df1 and keep only the rows of df2 that most closely match df1?

expected_output:

df2


name    cell    marks     passwd

tom      2       11111      2548
         2       21862      26846

I tried merge, but the data is dynamic: in one case the name might differ, and in another case the marks might differ.

2 Answers


You can try the following:

import pandas as pd
dict1 = {'name': ['tom'], 'cell': [2], 'marks': [21862]}
dict2 = {'name': ['tom', 'matt'], 'cell': [2, 2],
         'marks': [21862, 158416], 'passwd': [2548, 2483]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

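# isin() with a DataFrame argument compares cell by cell where both the
# index label and the column label match, so row 0 of df2 is checked against row 0 of df1.
compare = df2.isin(df1)
# Keep the rows of df2 with at least one matching cell; .loc is used because
# dropna() returns index labels, not positions.
df2 = df2.loc[df2.where(compare).dropna(how='all').index]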
print(df2)

Output:

  name  cell  marks  passwd
0  tom     2  21862    2548
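
Because `DataFrame.isin` aligns on the index when given another DataFrame, the snippet above only finds a match when the row sits at the same position in both frames. A minimal sketch of a position-independent variant follows, using the data from the question. It assumes that "nearest" means `cell` must match and at least one of `name` or `marks` must match; the helper name `nearest_rows` and its parameters are hypothetical. Each column is checked independently, so with several rows in df1 the matches may come from different df1 rows.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["tom"], "cell": [2], "marks": [21862]})
df2 = pd.DataFrame({
    "name": ["tom", "matt", None],
    "cell": [2, 2, 2],
    "marks": [11111, 158416, 21862],
    "passwd": [2548, 2483, 26846],
})


def nearest_rows(df2, df1, must_match=("cell",), any_match=("name", "marks")):
    """Keep rows of df2 that agree with df1 on every must_match column
    and on at least one any_match column (hypothetical helper)."""
    # Start from "all rows pass" and narrow down with each required column.
    required = pd.Series(True, index=df2.index)
    for col in must_match:
        required &= df2[col].isin(df1[col])
    # Start from "no row passes" and widen with each optional column.
    optional = pd.Series(False, index=df2.index)
    for col in any_match:
        optional |= df2[col].isin(df1[col])
    return df2[required & optional]


result = nearest_rows(df2, df1)
print(result)
```

With the question's data this keeps the `tom` row (name matches) and the row with marks 21862 (marks match), and drops the `matt` row.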
Mateo Lara

You can use pandas.merge with the option indicator=True, filtering the result for 'both':

import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])

df2 = pd.DataFrame([['tom', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483]
                    ], columns=["name", "cell", "marks", "passwd"])


def compare_dataframes(df1, df2):
    """Find rows which are similar between two DataFrames."""
    comparison_df = df1.merge(df2,
                              indicator=True,
                              how='outer')
    return comparison_df[comparison_df['_merge'] == 'both'].drop(columns=["_merge"])


print(compare_dataframes(df1, df2))

Returns:

  name  cell  marks  passwd
0  tom     2  11111    2548
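
The merge above joins on all shared columns, so a row is kept only if name, cell and marks all agree. For the dynamic case in the question, where either the name or the marks may differ, one possible sketch is to merge once per flexible column and combine the results. It assumes `cell` always has to match; the helper name `partial_matches` and its parameters are hypothetical, and the data is taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame([["tom", 2, 21862]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([["tom", 2, 11111, 2548],
                    ["matt", 2, 158416, 2483],
                    [None, 2, 21862, 26846]],
                   columns=["name", "cell", "marks", "passwd"])


def partial_matches(df1, df2, fixed=("cell",), flexible=("name", "marks")):
    """Rows of df2 that match some row of df1 on the fixed columns
    plus at least one flexible column (hypothetical helper)."""
    pieces = []
    for col in flexible:
        keys = list(fixed) + [col]
        # Inner merge against only the key columns of df1, so the result
        # keeps df2's columns and gains no extra ones.
        pieces.append(df2.merge(df1[keys].drop_duplicates(), on=keys, how="inner"))
    # A row that matched on several flexible columns appears more than once.
    return pd.concat(pieces, ignore_index=True).drop_duplicates()


result = partial_matches(df1, df2)
print(result)
```

Unlike a column-by-column membership test, each merge here requires `cell` and the flexible column to agree with the same row of df1.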
Gustav Rasmussen