How to compare each row of df1 with the rows of df2 based on some condition in pandas?

Question

I have two files (some rows may be the same and some may differ) that contain data like this:

PID,          STARTED,%CPU,%MEM,COMMAND
1,Wed Sep 12 10:10:21 2018, 0.0, 0.0,init
2,Wed Sep 12 10:10:21 2018, 0.0, 0.0,kthreadd
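A minimal sketch of reading data in this format with `pd.read_csv`, assuming the files are comma-separated exactly as shown. `io.StringIO` stands in for a real file here; in practice you would pass a path (the name `file1.csv` in the comment is hypothetical). `skipinitialspace=True` drops the padding spaces after the commas, and the `STARTED` strings are then parsed into timestamps.

```python
import io

import pandas as pd

# Sample text copied from the question; io.StringIO stands in for a real
# file such as the hypothetical "file1.csv".
raw = """PID,          STARTED,%CPU,%MEM,COMMAND
1,Wed Sep 12 10:10:21 2018, 0.0, 0.0,init
2,Wed Sep 12 10:10:21 2018, 0.0, 0.0,kthreadd
"""

# skipinitialspace=True ignores the spaces that follow each comma.
df1 = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
# Strip any leftover whitespace from the header names, just in case.
df1.columns = df1.columns.str.strip()
# Turn strings like "Wed Sep 12 10:10:21 2018" into real timestamps.
df1["STARTED"] = pd.to_datetime(df1["STARTED"], format="%a %b %d %H:%M:%S %Y")
print(df1.dtypes)
```

The second file would be read the same way into `df2`, so both frames end up with identical column names and dtypes, which the row comparisons below rely on.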

Now I want to perform the following operations on these dataframes:

1. Select one row (say R1) from df1.
2. Iterate over all the rows of df2 and check each one for a match with R1.
3. If a match is found, store it in a separate dataframe; if not, ignore it.
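The three steps above can be sketched almost literally as a loop over df1, with the comparison against all of df2 done in one vectorized step per row. This is a minimal sketch rather than the answer's method: the two small frames are made-up data using some of the question's column names, and it assumes both frames have exactly the same columns in the same order. It still runs a Python-level loop over every row of df1, so it gets slow as the files grow.

```python
import pandas as pd

# Two small illustrative frames with the same columns (made-up values).
df1 = pd.DataFrame({
    "PID": [1, 2],
    "COMMAND": ["init", "kthreadd"],
    "%CPU": [0.0, 0.0],
})
df2 = pd.DataFrame({
    "PID": [1, 3],
    "COMMAND": ["init", "bash"],
    "%CPU": [0.0, 1.5],
})

matched = []
for _, r1 in df1.iterrows():             # step 1: take one row R1 from df1
    # Step 2: compare R1 with every row of df2 at once. DataFrame == Series
    # compares each column of df2 with R1's value for that column, so R1's
    # index and df2's columns must be the same labels in the same order.
    mask = (df2 == r1).all(axis=1)
    if mask.any():                        # step 3: keep matches, ignore the rest
        matched.append(df2[mask])

# Gather all matches into a separate DataFrame (empty if nothing matched).
result = pd.concat(matched, ignore_index=True) if matched else df2.iloc[0:0]
print(result)
```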

Since each file has about 10000 rows, I am implementing this with Python pandas, but I can't find the right way to do it. Any help would be appreciated.

Possible duplicate of [How to implement 'in' and 'not in' for Pandas dataframe](https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe) — anky, Mar 11 '19 at 11:40
@muzzyq Thanks for your input. Yes, both dataframes have the same columns. — manoj kumar, Mar 11 '19 at 11:50
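Along the lines of the suggested duplicate, one vectorized way to ask "does this whole row of df1 occur in df2?" is to turn each frame's rows into a `pd.MultiIndex` and use `MultiIndex.isin`. A minimal sketch, assuming both frames share the same columns (as confirmed above); the sample values are made up:

```python
import pandas as pd

# Made-up sample frames with identical columns.
df1 = pd.DataFrame({"PID": [1, 2], "COMMAND": ["init", "kthreadd"]})
df2 = pd.DataFrame({"PID": [1, 3], "COMMAND": ["init", "bash"]})

# Each row becomes one tuple-like entry of a MultiIndex.
rows1 = pd.MultiIndex.from_frame(df1)
rows2 = pd.MultiIndex.from_frame(df2[df1.columns])  # same column order as df1

in_df2 = rows1.isin(rows2)   # boolean array, one entry per row of df1
print(df1[in_df2])           # "in": rows of df1 that also occur in df2
print(df1[~in_df2])          # "not in": rows of df1 that are missing from df2
```

Unlike a join, this never duplicates rows of df1 and keeps them in their original order.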

muzzyq · Answer 1 · 2019-03-11T12:27:51.593

Raw data

First dataframe:

import pandas as pd

df = pd.DataFrame({
    'Started': [pd.Timestamp(2018, 9, 12, 12, 12, 21)] * 2,  # same start time for both rows
    '%CPI': [0.0, 0.0],
    '%MEM': [0.0, 0.0],
    'COMMAND': ['init', 'kthreadd']
})

Output:

              Started  %CPI  %MEM   COMMAND
0 2018-09-12 12:12:21   0.0   0.0      init
1 2018-09-12 12:12:21   0.0   0.0  kthreadd

Second dataframe:

df2 = pd.DataFrame({
    'Started': [pd.Timestamp(2018, 9, 12, 12, 12, 21), pd.Timestamp(2020, 9, 12, 12, 12, 21)],
    '%CPI': [0.0, 1.0],
    '%MEM': [0.0, 1.0],
    'COMMAND': ['init', 'different']
})

Output (row 0 is the same as in the first dataframe, row 1 is different):

              Started  %CPI  %MEM    COMMAND
0 2018-09-12 12:12:21   0.0   0.0       init
1 2020-09-12 12:12:21   1.0   1.0  different

Answer

Create a new dataframe containing only the matching rows, using an inner join on all columns:

columns = df.columns.tolist()

# Inner join on every column: a row of df is kept only if an identical row exists in df2.
matches = pd.merge(df, df2, how='inner', on=columns)

Output:

              Started  %CPI  %MEM  COMMAND
0 2018-09-12 12:12:21   0.0   0.0     init
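If the rows that do not match are also of interest (the "not in" half of the linked duplicate), the same merge can be run as a left join with `indicator=True`, which adds a `_merge` column saying whether each row of df was found in df2. A sketch reusing the answer's sample frames:

```python
import pandas as pd

df = pd.DataFrame({
    "Started": [pd.Timestamp(2018, 9, 12, 12, 12, 21)] * 2,
    "%CPI": [0.0, 0.0],
    "%MEM": [0.0, 0.0],
    "COMMAND": ["init", "kthreadd"],
})
df2 = pd.DataFrame({
    "Started": [pd.Timestamp(2018, 9, 12, 12, 12, 21),
                pd.Timestamp(2020, 9, 12, 12, 12, 21)],
    "%CPI": [0.0, 1.0],
    "%MEM": [0.0, 1.0],
    "COMMAND": ["init", "different"],
})

# Left join keeps every row of df; "_merge" is "both" when the row was
# found in df2 and "left_only" when it was not.
merged = df.merge(df2, on=df.columns.tolist(), how="left", indicator=True)

matches = merged[merged["_merge"] == "both"].drop(columns="_merge")
non_matches = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(matches)
print(non_matches)
```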

muzzyq, does it work internally as described in steps 1 and 2? — manoj kumar, Mar 11 '19 at 12:18
No, it doesn't iterate. However, this technique (an inner join) gives you the result you want much faster: every row in df1 that matches a row in df2. — muzzyq, Mar 11 '19 at 12:22
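One property of the inner join worth knowing: it pairs each matching row of df with every equal row of df2, so if df2 contains the same row twice, the match appears twice in the result. A small illustration with made-up data; calling `drop_duplicates()` on df2 first is one way to get each matching row only once:

```python
import pandas as pd

df = pd.DataFrame({"PID": [1, 2], "COMMAND": ["init", "kthreadd"]})
# Made-up df2 that contains the "init" row twice.
df2 = pd.DataFrame({"PID": [1, 1, 3], "COMMAND": ["init", "init", "bash"]})

columns = df.columns.tolist()

# Inner join: one output row per matching (df row, df2 row) pair.
doubled = pd.merge(df, df2, on=columns)
print(len(doubled))  # 2

# De-duplicating df2 first leaves one output row per matching df row.
once = pd.merge(df, df2.drop_duplicates(), on=columns)
print(len(once))     # 1
```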
