Unable to filter sampled dataframe - R

Question

I have a data frame that is the result of

new_df <- dat %>% group_by(ID) %>% sample_frac(0.25,replace = FALSE)

The data frame looks like this:

Ad.ID    ID
1234     deroy
2345     deroy
4567     deroy
34567    mrroy
13467    mrroy
00024    ronde
32243    ronde

I am trying to keep only the rows for certain IDs, e.g. deroy or mrroy, but I am unable to.

exp <- new_df[new_df$ID %in%"deroy",]

Using grepl I was able to do it for one ID, but it doesn't work when I try to do it for two or three.

Please do not mark this as a duplicate, because I have tried all the suggestions from here and a few other places.

I may be missing some basics. Any help is appreciated.

Adding dput:

structure(list(Ad.ID = c(75856740L, 75899591L, 75904815L, 75911256L, 
75911261L, 75911267L, 75911277L, 75911277L, 75911291L, 75911302L, 
75905790L, 75905815L, 75905818L, 75910661L, 75914385L, 75902382L, 
75902383L, 75902384L, 75902386L, 75902391L), ID = c("deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "deroy                         ", 
"deroy                         ", "jishuroy                      ", 
"jishuroy                      ", "jishuroy                      ", 
"jishuroy                      ", "jishuroy                      ", 
"jishuroy                      ")), .Names = c("Ad.ID", "ID"), row.names = 
c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 16L, 
17L, 18L, 19L, 20L, 21L), class = "data.frame")

What do you mean by unable to? Do you get an error? Because If I test your `new_df[new_df$ID %in%"deroy",]` code it returns the rows with "deroy" — phiver, Jul 08 '18 at 14:01
@phiver after running the code my console gives # A tibble: 0 x 2 # Groups: ID [0] # ... with 2 variables: Ad.ID , ID — Mr Rj, Jul 08 '18 at 14:08
Since you are using dplyr see what happens if you do `new_df %>% filter(ID == "deroy")`. — phiver, Jul 08 '18 at 14:13
All options on your example data work as they should. I think you have to check spelling mistakes / trailing spaces etc in your data. Or if you add the outcome of `dput(head(dat, 20))` to your question. Because now I can't see what the issue is. — phiver, Jul 08 '18 at 15:01

score 1 · Accepted Answer · answered Jul 08 '18 at 15:26

Looking at your data, every value in your ID column is 30 characters long: each ID is padded with trailing spaces. That is why comparisons such as ID == "deroy" match nothing. Clean that up first, before you continue.

nchar(new_df$ID[1])
[1] 30
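To see the problem in isolation, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical; they only mimic the padding seen in the dput output. It shows that an exact comparison finds nothing until the padding is stripped.

```r
# Hypothetical toy data: IDs left-justified and padded to 30 characters,
# mimicking the padding in the dput output from the question.
toy <- data.frame(
  Ad.ID = c(1L, 2L, 3L),
  ID = sprintf("%-30s", c("deroy", "deroy", "jishuroy")),
  stringsAsFactors = FALSE
)

# Every value is 30 characters long, not 5 or 8.
padded_n <- nchar(toy$ID)

# Exact comparison against the unpadded name matches no rows.
exact_hits <- sum(toy$ID == "deroy")

# After trimming leading/trailing whitespace the comparison works.
trimmed_hits <- sum(trimws(toy$ID) == "deroy")
```

A quick `unique(new_df$ID)` or `table(nchar(new_df$ID))` on the real data is a cheap way to spot this kind of padding.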

Using dplyr:

new_df %>% 
  mutate(ID = gsub(" ", "", ID)) %>%
  filter(ID == "jishuroy")

     Ad.ID       ID
1 75914385 jishuroy
2 75902382 jishuroy
3 75902383 jishuroy
4 75902384 jishuroy
5 75902386 jishuroy
6 75902391 jishuroy

Using base R:

new_df$ID <- gsub(" ", "", new_df$ID)
new_df[new_df$ID == "jishuroy", ]
      Ad.ID       ID
16 75914385 jishuroy
17 75902382 jishuroy
18 75902383 jishuroy
19 75902384 jishuroy
20 75902386 jishuroy
21 75902391 jishuroy

Thanks @phiver mutate did not work, but base R did. Thank you again!! — Mr Rj, Jul 08 '18 at 19:00

score 0 · Answer 2 · answered Jul 08 '18 at 15:05

try :

# the column is named ID; R is case-sensitive, so `id` would not be found
df1 = new_df %>% filter(ID == 'deroy')
df2 = new_df %>% filter(ID == 'mrroy')
df3 = new_df %>% filter(ID %in% c('mrroy', 'deroy'))

Elpidio Filho

Thank you! Still the same result. The output is 0 – Mr Rj Jul 08 '18 at 15:09

score 0 · Answer 3 · answered Jul 08 '18 at 17:01

One easy option would be to use trimws to remove the leading/trailing spaces in the "ID" column and then use filter

library(dplyr)
new_df %>%
       filter(trimws(ID) == "jishuroy")

For multiple IDs, use %in%, as in the OP's post.
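A minimal base R sketch of the multiple-ID case, assuming the same kind of padding as in the question. The `toy` data frame and the `wanted` vector are hypothetical names used only for illustration.

```r
# Hypothetical toy data with IDs padded to 30 characters, as in the question.
toy <- data.frame(
  Ad.ID = 1:5,
  ID = sprintf("%-30s", c("deroy", "deroy", "mrroy", "ronde", "ronde")),
  stringsAsFactors = FALSE
)

# Strip the padding once, so later comparisons see the bare names.
toy$ID <- trimws(toy$ID)

# Keep the rows whose ID is any of the wanted values.
wanted <- c("deroy", "mrroy")
subset_df <- toy[toy$ID %in% wanted, ]
```

Trimming the column once up front, rather than inside every filter call, means `==`, `%in%` and joins all behave as expected afterwards.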

akrun
