I have two data frames, one called athletes.df and one called medals.df. Both have a column named athlete_id, which is a unique key. The problem is that some rows appear in medals.df but not in athletes.df, and I need to remove those rows from medals.df.
Example of the data:
athletes.df
athlete_id V1 V2
'ttt' 5 6
'45d' 4 5
'tjd' 4 5
medals.df
athlete_id V3 V4
'ttt' 2 4
'45d' 5 5
'tjd' 4 5
'err' 6 7
The last row in medals.df has an athlete_id of 'err' that does not appear in athletes.df, so I would like to remove that entire row. Basically, I want to remove rows from medals.df when their athlete_id cannot be found in athletes.df. I know this can be done with a loop, but the real data has about 30,000 rows in each data set and a loop can take a very long time. Is there a way to do this efficiently?
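To make the desired result concrete, here is a minimal sketch in R. It assumes the example data above is held in ordinary data frames with athlete_id stored as character strings, and the name medals.clean is only a placeholder for the output. The vectorized `%in%` filter is one possible way to avoid the loop, not necessarily the best one:

```r
# Example data from the question (assumed: plain data frames, character IDs)
athletes.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd"),
  V1 = c(5, 4, 4),
  V2 = c(6, 5, 5),
  stringsAsFactors = FALSE
)
medals.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd", "err"),
  V3 = c(2, 5, 4, 6),
  V4 = c(4, 5, 5, 7),
  stringsAsFactors = FALSE
)

# Keep only the medals rows whose athlete_id also appears in athletes.df.
# %in% returns one TRUE/FALSE per row of medals.df, so no explicit loop is needed.
# medals.clean is a hypothetical name for the filtered result.
medals.clean <- medals.df[medals.df$athlete_id %in% athletes.df$athlete_id, ]
```

With the example data, medals.clean should contain the 'ttt', '45d', and 'tjd' rows and drop the 'err' row.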