I have a problem removing groups that contain a certain string in any of their rows, for example a period (.). I would like to achieve this without breaking the pipeline. I mean without using any join function.

The example data

vals <- c("good", "bad", "ugly", "good", "bad.", "ugly")

gr <- gl(2, 3)

  vals gr
1 good  1
2  bad  1
3 ugly  1
4 good  2
5 bad.  2
6 ugly  2

df <- data.frame(vals,gr)

I tried

library(dplyr)
df %>%
  filter(!grepl("\\.", vals))

which removes only the row that matches the condition, giving the output below. But I want to remove the entire group gr 2.

  vals gr
1 good  1
2  bad  1
3 ugly  1
4 good  2
5 ugly  2
Alexander
  • This `df <- data.frame(vals,gr)` should be above the first data frame output you show, otherwise it doesn't make sense (not in chronological order). I suggested this edit, but reviewers mistakenly thought it changed the substance of your post. – Nakx Apr 27 '20 at 09:18

4 Answers

Maybe something like this:

df %>% group_by(gr) %>% filter(all(!grepl("\\.",vals)))
joran
  • @Tjebo The `filter` is operating on only one group at a time, and it is enforcing the requirement that no `vals` within the group contain a period. – joran Feb 21 '18 at 22:16
  • @joran Thanks, joran, for your elegant solution. By the way, regarding my other post, is there any solution yet? I tried your last comment but still no luck: all group numbers are different. [special-grouping-number-for-each-pairs](https://stackoverflow.com/questions/48913362/special-grouping-number-for-each-pairs) – Alexander Feb 21 '18 at 22:21
  • @joran I understand the !grepl filter, but I do not understand why 'gr 2' is filtered out. Is this because it evaluates group 1 as TRUE and then group 2 is FALSE?? – tjebo Feb 21 '18 at 22:21
  • @Tjebo No, remember it's operating on each group in isolation. So the filter is _only_ acting on the observations within each group. – joran Feb 21 '18 at 22:23
  • @joran Ah, the bell rings. Sorry, this took a while to get into my head... oops.. Cheers!! – tjebo Feb 21 '18 at 22:27
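
The point made in these comments, that the filter sees one group at a time, can be sketched by computing the per-group flag on its own. This is only an illustration using the question's data; the names `keep_flags` and `result` are introduced here and are not part of the answer.

```r
library(dplyr)

# Example data from the question
vals <- c("good", "bad", "ugly", "good", "bad.", "ugly")
gr <- gl(2, 3)
df <- data.frame(vals, gr)

# One logical value per group: TRUE only if no value in that group
# contains a literal period
keep_flags <- df %>%
  group_by(gr) %>%
  summarise(keep = all(!grepl("\\.", vals)))

# Inside a grouped filter(), that single TRUE/FALSE is applied to
# every row of the group, so a group is kept or dropped as a whole
result <- df %>%
  group_by(gr) %>%
  filter(all(!grepl("\\.", vals))) %>%
  ungroup()
```

Here `keep_flags$keep` is TRUE for group 1 and FALSE for group 2, which is why all of group 2 disappears from `result`. Writing the condition as `!any(grepl("\\.", vals))` is logically equivalent.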

Another option is to use the %in% operator.

df %>%
  filter(!(gr %in% unique(ifelse(grepl("\\.", vals), gr, NA))))

#  vals gr
#1 good  1
#2  bad  1
#3 ugly  1
MKR
  • IIUC, the OP has requested to remove the entire group in which the certain string occurs. Your solution removes only the specific row; the other members of `gr` 2 have not been removed. – Uwe Feb 21 '18 at 23:44
  • @Uwe Thanks for pointing it out. I have corrected my mistake. Actually I was working on a solution using `mutate` and `mapply`, something like `df %>% mutate(InValidGroup = ifelse(mapply(grepl, "\\.",vals),gr,NA) ) %>% filter(!(gr %in% unique(InValidGroup))) %>% select(-InValidGroup)`, and messed up while adding the answer. Your answer looks good too. – MKR Feb 22 '18 at 04:54
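
One caveat about the `%in%` approach may be worth a sketch: `ifelse()` drops the factor attributes of `gr` and returns its underlying integer codes, so the comparison works on the question's data because the level labels "1" and "2" happen to coincide with those codes. The example below assumes hypothetical group labels "a" and "b" (not in the original data) and indexes `gr` directly instead; `codes`, `bad_groups` and `result` are names introduced for illustration.

```r
library(dplyr)

# Same values as in the question, but with hypothetical group labels
# "a" and "b" (an assumption made for this sketch)
df <- data.frame(
  vals = c("good", "bad", "ugly", "good", "bad.", "ugly"),
  gr   = factor(rep(c("a", "b"), each = 3))
)

# ifelse() yields the integer code 2 rather than the label "b"
codes <- ifelse(grepl("\\.", df$vals), df$gr, NA)

# Subsetting the factor keeps its labels, so %in% compares like with like
bad_groups <- unique(df$gr[grepl("\\.", df$vals)])

result <- df %>%
  filter(!(gr %in% bad_groups))
```

With these labels, `result` contains only the three rows of group "a", and the whole step still fits in a single pipe without a join.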

The OP has requested to remove the entire group when one of the group members contains a certain string in vals - without breaking the pipe.

The OP has explicitly stated: I mean without using any join function.

However, I believe using an anti-join does not break the pipe:

library(dplyr)
data.frame(vals, gr) %>% 
  anti_join(., filter(., grepl("\\.",vals)), by = "gr")
#   vals gr
# 1 good  1
# 2  bad  1
# 3 ugly  1
Uwe

Here is one option in base R with subset and table:

subset(df, gr %in% names(which(!table(gr, grepl("\\.", vals))[,2])))
#  vals gr
#1 good  1
#2  bad  1
#3 ugly  1
akrun
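
To make the base R one-liner easier to follow, here is a step-by-step sketch of the same idea on the question's data. The intermediate names (`has_dot`, `counts`, `dot_counts`, `clean_groups`, `result`) are introduced here for illustration. Wrapping the `grepl()` result in a factor with both levels is an addition of this sketch, so that the "TRUE" column exists even when nothing matches.

```r
# Example data from the question
vals <- c("good", "bad", "ugly", "good", "bad.", "ugly")
gr <- gl(2, 3)
df <- data.frame(vals, gr)

# Flag values containing a literal period; fixing both levels guarantees
# that the table below always has a FALSE and a TRUE column
has_dot <- factor(grepl("\\.", vals), levels = c(FALSE, TRUE))

# Cross-tabulate: rows are groups, columns are FALSE / TRUE
counts <- table(gr, has_dot)

# Number of matching values per group, named by group label
dot_counts <- counts[, "TRUE"]

# Groups with zero matches are the ones to keep
clean_groups <- names(which(dot_counts == 0))

result <- subset(df, gr %in% clean_groups)
```

In the one-liner, `!` applied to the counts plays the role of `dot_counts == 0`, since zero is the only count that negates to TRUE.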