I have two data frames, one called athletes.df and one called medals.df. Both have a column named athlete_id, which is a unique key. The problem is that some rows appear in medals.df but not in athletes.df, and I need to remove those rows from medals.df.

Example of the data:

athletes.df
    athlete_id   V1  V2
    'ttt'        5    6
    '45d'        4    5
    'tjd'        4    5

medals.df
    athlete_id   V3  V4
    'ttt'        2    4
    '45d'        5    5
    'tjd'        4    5
    'err'        6    7

The last row in medals.df has an athlete_id of 'err', which does not appear in athletes.df; in this case I would like to remove the entire row. Basically, I want to remove rows from medals.df whose athlete_id cannot be found in athletes.df. I know this can be done with a loop, but the real data has about 30,000 rows in each data set, so a loop could take a very long time. Is there a more efficient way to do this?

Spacedman
Lee
    Also: http://stackoverflow.com/questions/33070523/how-to-subset-a-data-frame-based-on-another-data-frame-in-base-r – dayne Jul 25 '16 at 18:14

1 Answer

This is the instruction you're looking for:

# Keep only the medals.df rows whose athlete_id also appears in athletes.df
medals.df <- medals.df[medals.df$athlete_id %in% athletes.df$athlete_id, ]
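
For reference, here is a minimal, self-contained sketch of that kind of filter run against the sample data from the question. It filters medals.df, since that is the table the question wants trimmed. The data frames are rebuilt by hand, the intermediate name `keep` is only illustrative, and the sketch assumes athlete_id is stored as a character column in both tables.

```r
# Sample data mirroring the question (rebuilt by hand for illustration)
athletes.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd"),
  V1 = c(5, 4, 4),
  V2 = c(6, 5, 5),
  stringsAsFactors = FALSE
)
medals.df <- data.frame(
  athlete_id = c("ttt", "45d", "tjd", "err"),
  V3 = c(2, 5, 4, 6),
  V4 = c(4, 5, 5, 7),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where the medal row's athlete_id exists in athletes.df
keep <- medals.df$athlete_id %in% athletes.df$athlete_id

# Row subset; the trailing comma keeps every column
medals.df <- medals.df[keep, ]

print(medals.df)
```

Because `%in%` is vectorized, the whole comparison happens in one call rather than row by row, so a table of about 30,000 rows should not need an explicit loop.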
tia_0