
I have two dataframes: an id dataframe and an event dataframe, which roughly look as follows. id table:

id    date
N103  09-01
N434  09-03
N234  08-10

event table:

id    date   Event name
N103  08-01  Failure
N104  12-02  Alarm
N103  12-02  Warning
N234  11-09  Information
N103  12-01  Disabled

I want to get all rows from the event dataframe whose id appears in the id dataframe. So, for my example, the output is

N103  08-01  Failure
N103  12-02  Warning
N234  11-09  Information
N103  12-01  Disabled

The problem is that my id dataframe has 2,500 rows and my event dataframe has 900,000,000 rows. Hence, I am looking for a very efficient way to do this. What I currently have is:

data = []
for current_id in id_dataframe['id']:  # avoid shadowing the builtin `id`
    part = event[event['id'] == current_id]
    data.append(part)
result = pd.concat(data)  # combine the per-id pieces into one dataframe

This, however, seems very inefficient, because each iteration scans the entire event dataframe to compare a single id against all 900,000,000 rows.

Is there a more efficient way to do this that can be implemented relatively easily?

Schach21

1 Answer


You can do:

# an inner join keeps only event rows whose id appears in id_dataframe;
# a right join would also keep ids (like N104) that are not in the id table
result = event.merge(id_dataframe[['id']], on='id', how='inner')
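As a sketch on the question's sample data (dates kept as plain strings here for simplicity), an equivalent and often simplest option is a vectorized membership test with `isin`, which builds a hash set of the 2,500 ids once instead of scanning the event table per id:

```python
import pandas as pd

# Sample data from the question
id_dataframe = pd.DataFrame({
    "id": ["N103", "N434", "N234"],
    "date": ["09-01", "09-03", "08-10"],
})
event = pd.DataFrame({
    "id": ["N103", "N104", "N103", "N234", "N103"],
    "date": ["08-01", "12-02", "12-02", "11-09", "12-01"],
    "Event name": ["Failure", "Alarm", "Warning", "Information", "Disabled"],
})

# Single vectorized pass over the event dataframe: keep rows whose
# id is a member of the id table; N104 is dropped.
result = event[event["id"].isin(id_dataframe["id"])]
print(result)
```

Unlike a merge, `isin` never brings extra columns (such as the id table's `date`) into the result, so there is no column-name collision to deal with.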
Vishal Kamlapure
  • How does this behave when certain keys appear multiple times? For instance, in the example I mentioned, the key N103 appears 3 times. – Schach21 Oct 07 '20 at 18:17
  • Yes, it will keep every appearance of each key. There is a good explanation of how `.merge()` works [Here](https://stackoverflow.com/questions/53645882/pandas-merging-101). Also the documentation [Here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) – Vishal Kamlapure Oct 07 '20 at 18:27
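To illustrate the duplicate-key behavior discussed in the comments, a minimal sketch (toy data, not the question's tables): with repeated keys, `merge` produces one output row per matching pair, so a key that appears three times in the event table yields three rows in the result.

```python
import pandas as pd

ids = pd.DataFrame({"id": ["N103", "N234"]})
events = pd.DataFrame({
    "id": ["N103", "N103", "N234"],
    "event": ["Failure", "Warning", "Information"],
})

# Inner merge: every occurrence of a matching key on the right side
# is kept, so N103 contributes two rows to the output.
out = ids.merge(events, on="id", how="inner")
print(out)
```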