I have two dataframes: an id dataframe, and an event dataframe, which roughly look as follows: id tables:
id date
N103 09-01
N434 09-03
N234 08-10
event table:
id date Event name
N103 08-01 Failure
N104 12-02 Alarm
N103 12-02 Warning
N234 11-09 Information
N103 12-01 Disabled
I want to get all rows from the event dataframe that are in the id dataframe. So, for my example, the output is
N103 08-01 Failure
N103 12-02 Warning
N234 11-09 Information
N103 12-01 Disabled
The problem is that my id dataframe has 2.500 rows, and my event dataframe has 900.000.000 rows. Hence, I am looking to find a very efficient to do this. What I currently have is:
data = []
for id in id_dataframe['id']:
part = event[event['id']==id]
data.append(part)
This, however, seems very inefficient, because every time I compare the "id" to all ids in the event dataframe.
Is there a more efficient to do this, that can be relatively easily implemented?