I have two files (some rows may be the same and some may differ) that contain data like this:

PID,          STARTED,%CPU,%MEM,COMMAND
1,Wed Sep 12 10:10:21 2018, 0.0, 0.0,init
2,Wed Sep 12 10:10:21 2018, 0.0, 0.0,kthreadd
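A minimal sketch of loading data in this shape into a DataFrame. It assumes the files are comma-separated exactly as shown. The inline string stands in for one of the files, and the `STARTED` format string is inferred from the sample rows, so adjust it if the real files differ.

```python
import io

import pandas as pd

# Stand-in for one of the input files; in practice pass the file path instead.
raw = """PID,          STARTED,%CPU,%MEM,COMMAND
1,Wed Sep 12 10:10:21 2018, 0.0, 0.0,init
2,Wed Sep 12 10:10:21 2018, 0.0, 0.0,kthreadd
"""

# skipinitialspace drops the blanks that follow each comma (e.g. " 0.0").
df1 = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
# Strip any padding still left in the header names, e.g. "          STARTED".
df1.columns = df1.columns.str.strip()
# Parse the ps-style timestamps so they compare as datetimes, not strings.
df1["STARTED"] = pd.to_datetime(df1["STARTED"], format="%a %b %d %H:%M:%S %Y")
print(df1)
```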

Now I want to perform the following operations on these dataframes:

  1. Select one row (say R1) from df1.
  2. Iterate over all the rows of df2 and check for matches with R1.
  3. If a match is found, store it in a separate dataframe; if not, ignore it.
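Taken literally, the three steps above amount to a row-by-row comparison. A sketch, assuming both dataframes have the same columns and that a "match" means every column is equal; `matching_rows_loop` is a hypothetical helper name. It performs len(df1) × len(df2) comparisons, which is why it gets slow at thousands of rows.

```python
import pandas as pd


def matching_rows_loop(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the rows of df1 that equal some row of df2.

    Follows the three steps literally, so it costs up to
    len(df1) * len(df2) row comparisons.
    """
    cols = list(df1.columns)
    # Convert df2 to plain tuples once so the inner loop compares tuples.
    rows2 = list(df2[cols].itertuples(index=False, name=None))
    kept = []
    for r1 in df1[cols].itertuples(index=False, name=None):  # step 1: pick R1
        for r2 in rows2:  # step 2: scan every row of df2
            if r1 == r2:  # step 3: keep R1 on a full-row match
                kept.append(r1)
                break  # one match is enough; avoids storing R1 twice
    return pd.DataFrame(kept, columns=cols)


# Hypothetical sample data.
df1 = pd.DataFrame({"COMMAND": ["init", "kthreadd"], "%CPU": [0.0, 0.0]})
df2 = pd.DataFrame({"COMMAND": ["init", "different"], "%CPU": [0.0, 1.0]})
result = matching_rows_loop(df1, df2)
print(result)
```

Note that NaN never equals NaN in this comparison, so rows containing missing values will not match.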

Since each file has 10,000 rows, I am implementing this with Python pandas, but I haven't found the proper way to do it. Any help would be appreciated.

  • Possible duplicate of [How to implement 'in' and 'not in' for Pandas dataframe](https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe) – anky Mar 11 '19 at 11:40
  • Can you confirm that both dataframes have the same columns? – muzzyq Mar 11 '19 at 11:47
  • @muzzyq Thanks for your input. Yes, both dataframes have the same columns. – manoj kumar Mar 11 '19 at 11:50
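The linked "in / not in" question works on single columns. To test whole rows, one option (a sketch, not taken from the linked answer) is to build a MultiIndex from all shared columns and use `MultiIndex.isin`. This assumes both frames have the same columns; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({"COMMAND": ["init", "kthreadd"], "%CPU": [0.0, 0.0]})
df2 = pd.DataFrame({"COMMAND": ["init", "different"], "%CPU": [0.0, 1.0]})

cols = list(df1.columns)
# One MultiIndex entry (a tuple) per row, built from all shared columns.
keys1 = pd.MultiIndex.from_frame(df1[cols])
keys2 = pd.MultiIndex.from_frame(df2[cols])

# True where the whole df1 row also occurs somewhere in df2.
mask = keys1.isin(keys2)
matches = df1[mask]        # "in"
non_matches = df1[~mask]   # "not in"
print(matches)
```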

1 Answer

Raw data

First dataframe:

import pandas as pd

df1 = pd.DataFrame({
    'Started': [pd.Timestamp(2018, 9, 12, 12, 12, 21)] * 2,
    '%CPU': [0.0, 0.0],
    '%MEM': [0.0, 0.0],
    'COMMAND': ['init', 'kthreadd']
})

Output:

                  Started  %CPU  %MEM   COMMAND
    0 2018-09-12 12:12:21   0.0   0.0      init
    1 2018-09-12 12:12:21   0.0   0.0  kthreadd

Second dataframe:

df2 = pd.DataFrame({
    'Started': [pd.Timestamp(2018, 9, 12, 12, 12, 21), pd.Timestamp(2020, 9, 12, 12, 12, 21)],
    '%CPU': [0.0, 1.0],
    '%MEM': [0.0, 1.0],
    'COMMAND': ['init', 'different']
})

Output (row 0 is the same as in df1, row 1 is different):

                  Started  %CPU  %MEM    COMMAND
    0 2018-09-12 12:12:21   0.0   0.0       init
    1 2020-09-12 12:12:21   1.0   1.0  different

Answer

Create a new dataframe containing only the rows that appear in both dataframes, using an inner join on all columns:

columns = df1.columns.tolist()

# how='inner' is the default, so only rows present in both dataframes are kept
matches = pd.merge(df1, df2, on=columns)

Output:

                  Started  %CPU  %MEM  COMMAND
    0 2018-09-12 12:12:21   0.0   0.0     init
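One caveat with the inner join: if a row occurs more than once in df2, the merge repeats the matching df1 row once per occurrence. A sketch that de-duplicates df2 first; passing the shared columns via `on=` is equivalent to identical `left_on`/`right_on` lists. The frames are made-up examples.

```python
import pandas as pd

# Hypothetical sample data: "init" appears twice in df2.
df1 = pd.DataFrame({"COMMAND": ["init", "kthreadd"], "%CPU": [0.0, 0.0]})
df2 = pd.DataFrame({"COMMAND": ["init", "init", "different"],
                    "%CPU": [0.0, 0.0, 1.0]})

columns = df1.columns.tolist()

naive = pd.merge(df1, df2, on=columns)                      # 2 rows: init, init
deduped = pd.merge(df1, df2.drop_duplicates(), on=columns)  # 1 row: init
print(len(naive), len(deduped))
```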
muzzyq
  • muzzyq, does it internally work as described in steps 1 and 2? – manoj kumar Mar 11 '19 at 12:18
  • No, it doesn't iterate over the rows like that. However, this technique (an inner join) gives you the result you want much faster: every row in df1 that matches a row in df2. – muzzyq Mar 11 '19 at 12:22
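The inner join silently drops the df1 rows that have no match. If you also want to see those rows, a sketch using a left join with `indicator=True`: pandas adds a `_merge` column that is `'both'` for matched rows and `'left_only'` for the rest. The sample data is made up.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({"COMMAND": ["init", "kthreadd"], "%CPU": [0.0, 0.0]})
df2 = pd.DataFrame({"COMMAND": ["init", "different"], "%CPU": [0.0, 1.0]})

columns = df1.columns.tolist()
# Left join keeps every df1 row; indicator=True records where each row was found.
tagged = df1.merge(df2.drop_duplicates(), on=columns, how="left", indicator=True)

matched = tagged[tagged["_merge"] == "both"].drop(columns="_merge")
unmatched = tagged[tagged["_merge"] == "left_only"].drop(columns="_merge")
print(matched)
print(unmatched)
```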