I have two data frames having the same set of columns, and I want to do a left join to identify the rows in the first dataframe which are not present in the second one.
mylog = {
'date': ['2020-12-05', '2020-12-05', '2020-12-05', '2020-12-06', '2020-12-06', '2020-12-06', '2020-12-06', '2020-12-07', '2020-12-07', '2020-12-07'],
'rider_id': ['10554','10372','10372','19042','12544','10554','10996','10994','15498', '10554'],
'issue_id': [1,1,3,2,5,1,8,12,1,3]
}
dblog = {
'date': ['2020-12-05', '2020-12-05', '2020-12-05', '2020-12-06', '2020-12-06', '2020-12-06', '2020-12-06', '2020-12-07'],
'rider_id': ['10554','10372','10372','19042','12544','10554','10996','10994'],
'issue_id': [1,3,3,2,5,1,8,12]
}
All the three column entries together form an unique entry.
mylog_df = pd.DataFrame.from_dict(mylog)
dblog_df = pd.DataFrame.from_dict(dblog)
merge_df = mylog_df.merge(dblog_df, how = 'left', left_on = ['date', 'rider_id', 'issue_id'], right_on = ['date','rider_id', 'issue_id'])
merge_df
This returns all the ten rows of the mylog_df and just the three columns. I want the left join to return a dataframe from which I can segregate those entries that are not present in the dblog_df.