
I have two datasets ("regular_season" and "championships"), and I want to see how the players in the championship round performed during the regular season. To do this, I am trying to isolate those players in the regular season dataset by their id (the column is called "user_id" in "regular_season" and "id" in "championships", but the actual id numbers are the same) by doing the following:

champ_in_rs <- setdiff(regular_season$user_id, championships$id)
champ_in_rs <- setdiff(regular_season$user_id, champ_in_rs)
id_champ_in_rs <- numeric(length(champ_in_rs))
champ_in_rs <- subset(regular_season, user_id %in% id_champ_in_rs )

This crashed my R a few times and then it returned an empty dataset. Could someone show me how to better write this code?

EDIT: The dataset I'm working with is much larger but as an example, if the regular_season dataset has values

1,4,6,10,15,35

and the championship dataset has values

6, 35

I'm hoping to try to figure out a way to take a subset of the regular_season dataset that includes only 6 and 35.

Aaron
  • The second line of code is not clear: `champ_in_rs <- setdiff(regular_season$user_id, champ_in_rs)`. Why do you need that? The third line only initializes a numeric vector of 0s, so it won't match anything with `%in%` – akrun May 10 '21 at 02:09
  • @akrun Thanks for your comment, in the second line, I was trying to find the complement of (the first line's) "champ_in_rs" because it seems like that would have been the set of "championship$id" elements in "regular_season" – Aaron May 10 '21 at 02:12
  • Can you tell me the final goal you are seeking with that code? – akrun May 10 '21 at 02:15
  • @akrun Thanks for your answer! I'm trying to isolate the set of championship id's in the larger set of regular season id's – Aaron May 10 '21 at 02:17

1 Answer

The champ_in_rs from the first line of code gives the set of 'user_id' from 'regular_season' that are not present in the 'id' column from 'championships'. The third line

id_champ_in_rs <- numeric(length(champ_in_rs))

initializes a numeric vector of 0s, with the same length as champ_in_rs.

subset(regular_season, user_id %in% id_champ_in_rs )

wouldn't match anything, because we are comparing the 'user_id' values with a numeric vector that contains only 0s rather than the actual ids.
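
The behaviour described above can be reproduced with a minimal sketch. The toy data frames below are an assumption built from the example values in the question; the real datasets will have more columns and rows.

```r
# Hypothetical toy data based on the example values in the question
regular_season <- data.frame(user_id = c(1, 4, 6, 10, 15, 35))
championships  <- data.frame(id = c(6, 35))

# First line of the question: ids in regular_season that are NOT in championships
champ_in_rs <- setdiff(regular_season$user_id, championships$id)

# Second line: the complement of that, i.e. the ids that ARE in championships
champ_in_rs <- setdiff(regular_season$user_id, champ_in_rs)

# Third line: numeric() only allocates zeros of the given length,
# it does not copy the ids
id_champ_in_rs <- numeric(length(champ_in_rs))
id_champ_in_rs

# No user_id equals 0, so the subset has no rows
empty_result <- subset(regular_season, user_id %in% id_champ_in_rs)
nrow(empty_result)
```

Here `id_champ_in_rs` holds two zeros and `empty_result` has zero rows, which matches the empty dataset reported in the question.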


Based on the example, we just need

subset(regular_season, user_id %in% championships$id)
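
As a quick check, here is the one-liner applied to the example values from the question. The data frames are a sketch for illustration only, and the `points` column is hypothetical.

```r
# Hypothetical toy data based on the example values in the question
regular_season <- data.frame(
  user_id = c(1, 4, 6, 10, 15, 35),
  points  = c(12, 7, 20, 3, 9, 15)  # made-up column, for illustration only
)
championships <- data.frame(id = c(6, 35))

# Keep only the regular season rows whose user_id appears in championships$id
champ_in_rs <- subset(regular_season, user_id %in% championships$id)
champ_in_rs$user_id

# The same filter written with plain logical indexing
champ_in_rs2 <- regular_season[regular_season$user_id %in% championships$id, ]
```

Both versions keep only the rows for ids 6 and 35. The `setdiff()` calls and the intermediate id vector are not needed, since `%in%` does the membership test directly.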
akrun
  • Thank you again for your answer, it is very helpful! If I were to change the third line to a numeric vector with the championship id's, do you know how I might go about doing that? Also, if I did that, would the "subset" function match the numeric vector then? – Aaron May 10 '21 at 02:18
  • @Aaron the `id_champ_in_rs` should have some values. Can you update your post with a small example and your expected output so that it becomes clear – akrun May 10 '21 at 02:22
  • of course, I've just done that now - thank you again for all your help! – Aaron May 10 '21 at 02:27
  • @Aaron can you try the update – akrun May 10 '21 at 02:29
  • Thank you, it worked perfectly! And of course, I'll accept your answer as soon as it will let me! – Aaron May 10 '21 at 02:31