R - removing rows from data frame according to a column in another data frame

Question

I have two data frames, athletes.df and medals.df. Both have a column named athlete_id, which is a unique key. The problem is that some rows appear in medals.df but not in athletes.df, and I need to remove those rows from medals.df.

Example of the data:

athletes.df
    athlete_id   V1  V2
    'ttt'        5    6
    '45d'        4    5
    'tjd'        4    5

medals.df   
    athlete_id   V3  V4
    'ttt'        2    4
    '45d'        5    5
    'tjd'        4    5
    'err'        6    7

The last row in medals.df has an athlete_id of 'err' that does not appear in athletes.df; in this case I would like to remove the entire row. Basically, I want to remove rows from medals.df when their athlete_id cannot be found in athletes.df. I know this can be done with a loop, but the real data has about 30,000 rows in each data set and a loop can take a very long time. Is there a way to do this efficiently?

Also: http://stackoverflow.com/questions/33070523/how-to-subset-a-data-frame-based-on-another-data-frame-in-base-r — dayne, Jul 25 '16 at 18:14

score 0 · Answer 1 · answered Jul 25 '16 at 21:01

This is the instruction you're looking for:

medals.df <- medals.df[medals.df$athlete_id %in% athletes.df$athlete_id, ]
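
For illustration, here is a minimal sketch of that kind of filter applied to the sample data from the question, keeping only the rows of medals.df whose athlete_id also appears in athletes.df. The data frames are rebuilt by hand from the question's tables (the literal quote marks around the ids are assumed to be display only), and stringsAsFactors = FALSE is set so the ids are compared as plain strings.

```r
# Sample data rebuilt from the question (an assumption about the real columns)
athletes.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd"),
  V1 = c(5, 4, 4),
  V2 = c(6, 5, 5),
  stringsAsFactors = FALSE
)
medals.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd", "err"),
  V3 = c(2, 5, 4, 6),
  V4 = c(4, 5, 5, 7),
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per row of medals.df:
# TRUE when that row's athlete_id is present in athletes.df
keep <- medals.df$athlete_id %in% athletes.df$athlete_id

# Logical row indexing drops the rows marked FALSE (here, the 'err' row)
medals.df <- medals.df[keep, ]

print(medals.df)
```

Because %in% is vectorised, no explicit loop is needed, so this should scale comfortably to data sets of about 30,000 rows.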

answered Jul 25 '16 at 21:01

tia_0
