Need help Subsetting a Data frame in R

Question

I have a large dataset (7000+ rows and 42 columns). I want to subset this dataset using specific IDs that each row has. There are 178 IDs I'd like to use, and it's possible that the IDs occur more than once. I've tried using "filter" from dplyr but I keep getting this error:

"longer object length is not a multiple of shorter object length"

Edit: Sorry I'm still new to most of this, and to this site.

Here is the code I used to try and filter out the specific IDs:

df2 <- filter(df, df$ID==c(2983, 3413,  1266, 3049, 1237,[...],  1002, 1003, 4001)) #the ellipsis is there because otherwise the code would be too long for this post.

It produces this error:

Warning messages:
1: In `==.default`(df$ID, c(2983, 3413, 1266, 3049, 1237,  :
  longer object length is not a multiple of shorter object length
2: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length

Edit 2: Thanks for all the help, I was able to fix my issue. I appreciate the feedback and hopefully will be able to be more clear if I have questions for this site in the future.

It would be easier to help if you included part of your data using `dput` as part of your question. — Jonathan V. Solórzano, Mar 31 '20 at 00:33
And include actual code not abbreviated paraphrase of what you tried. — Parfait, Mar 31 '20 at 00:37
you should use `%in%`, `filter(df, ID %in% c(2983, 3413, 1266, 3049, 1237....))` — Ronak Shah, Mar 31 '20 at 01:11
Thank you for working to improve your question. Please review how to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). As it is, we can't run this. — Conor Neilson, Mar 31 '20 at 01:37
You're checking that an ID is *equal to* whatever that other vector is, but it's unclear whether that's actually what you want. You probably want to filter based on whether an ID is *in* a vector. Also, that's a warning, not an error. The difference is that your code likely still runs, but the result may not be what you intended. — camille, Mar 31 '20 at 02:33
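
For the `dput` suggestion in the comments above, a minimal sketch of how a small, pasteable sample of a data frame might be produced. The data frame `example_df` and its columns are made up for illustration; they are not the asker's data.

```r
# Hypothetical stand-in for the asker's data frame (names and values are assumptions)
example_df <- data.frame(ID = c(2983, 3413, 1266, 3049, 1237),
                         value = c(0.5, 1.2, 3.3, 0.1, 2.8))

# Take a few rows only, so the pasted text stays short
sample_rows <- head(example_df, 3)

# dput() prints R code that recreates the object; capture it as text here
dput_text <- capture.output(dput(sample_rows))

# Anyone can rebuild the same rows by evaluating that text
rebuilt <- eval(parse(text = dput_text))
```

Pasting the text printed by `dput(head(df))` into the question lets others recreate the same rows without needing the original file.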

score 0 · Answer 1 · answered Mar 31 '20 at 03:34

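Use the `%in%` operator instead of `==`. `==` compares the two vectors element by element, recycling the shorter one, whereas `%in%` tests whether each ID appears anywhere in the vector.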

library(dplyr)  # provides %>% and filter()

# keep every row whose ID is one of the listed values
filtered_df = df %>% 
  dplyr::filter(ID %in% c(2983, 3413,  1266, 3049, 1237,[...],  1002, 1003, 4001))
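
To see why `==` goes wrong here, the following is a small sketch on toy data in base R. The data frame `toy_df` and the vector `wanted_ids` are hypothetical names and values chosen for illustration; the `dplyr::filter(ID %in% ...)` call above should behave the same way as the `%in%` subset shown here.

```r
# Toy data: a hypothetical stand-in for the real data frame
toy_df <- data.frame(ID = c(1, 2, 3, 2, 5, 1, 4),
                     value = c(10, 20, 30, 40, 50, 60, 70))
wanted_ids <- c(2, 1, 5)  # hypothetical IDs to keep

# `==` recycles wanted_ids along toy_df$ID and compares position by position.
# 7 is not a multiple of 3, so R also raises the "longer object length"
# warning, which is suppressed here only to keep the output quiet.
eq_mask <- suppressWarnings(toy_df$ID == wanted_ids)

# `%in%` asks, for each row, whether its ID appears anywhere in wanted_ids
in_mask <- toy_df$ID %in% wanted_ids

subset_eq <- toy_df[eq_mask, ]  # keeps only rows that happen to line up
subset_in <- toy_df[in_mask, ]  # keeps every row with a wanted ID
```

In this toy case `subset_eq` has one row while `subset_in` has five, including the repeated IDs, which matches what the question asks for.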

Nikhil Gupta

score -1 · Answer 2 · answered Mar 31 '20 at 00:38

Won't split do the trick? E.g.

set.seed(123)
df = data.frame(id = sample(1:10, 100, replace = TRUE),
                data = rnorm(100))
split(df, df$id)
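
For context, `split` returns a named list with one data frame per id rather than a single subset. The sketch below shows one way to pull chosen ids back out of that list, and that the result has the same rows as an `%in%` subset. The vector `wanted_ids` and the name `sim_df` are hypothetical and not from the question.

```r
set.seed(123)
sim_df <- data.frame(id = sample(1:10, 100, replace = TRUE),
                     data = rnorm(100))

# Named list: one data frame per distinct id, with names "1", "2", ...
groups <- split(sim_df, sim_df$id)

wanted_ids <- c(2, 7)  # hypothetical ids to keep

# Select the list elements by name and stack them into one data frame
picked <- do.call(rbind, groups[as.character(wanted_ids)])

# The direct route: a logical subset with %in%
same_rows <- sim_df[sim_df$id %in% wanted_ids, ]
```

Both routes select the same rows, though in a different order, so for a plain subset the `%in%` version is the shorter path.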

James Curran

It's unclear what the OP is trying to do exactly, but it doesn't seem like this is it from their description – camille Mar 31 '20 at 02:35
Quite possibly @camille but the initial description was so vague it could have been :-) I see more information has been added now. – James Curran Mar 31 '20 at 07:49
