I am using a combination of dplyr and a for loop to subset a data frame. I want the first operation to run on the whole dataset, but my loop with a regular expression fails. What could be a solution?

library(dplyr)
df <- data.frame(values=c("a","b","c"))
select <- c("*","a")

for (i in 1:length(select)){
  print(df %>% filter(values %in% select[i]) %>% summarise(n()))}

Desired result:

  n()
1   3
  n()
1   1
MCS
  • Ok, this works:

    df <- data.frame(values=c("a","b","c"))
    select <- c("*","a")
    for (i in 1:length(select)){
      print(df %>% filter(grepl(select[i], values)) %>% summarise(n()))}

    – MCS Sep 21 '19 at 04:42

2 Answers

One option is to paste a . in front of the * (giving the pattern .*) and then filter the rows with str_detect or grepl:

library(dplyr)
library(stringr)
df %>%
    filter(str_detect(values, str_c(".", select[1]))) %>%
    summarise(n = n())
# n
#1 3

Alternatively, replace * with . in 'select': . matches any single character, whereas * is a quantifier meaning zero or more repetitions of the preceding element.

select <- chartr('*', '.', select)
for (i in seq_along(select)) {
  print(df %>%
          filter(str_detect(values, select[i])) %>%
          summarise(n()))
}
#   n()
#1   3
# n()
#1   1

This works with both grepl and str_detect, whereas the OP's original pattern * works only with grepl.
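
The difference between . and * can be checked directly in base R. The following is a minimal sketch, not part of the original answer; the vector x is only an illustrative stand-in for the values column.

```r
x <- c("a", "b", "c")

# "." matches any single character, so every non-empty string matches
any_char <- grepl(".", x)

# "a" matches only the strings that contain the letter a
only_a <- grepl("a", x)

# "a*" means zero or more "a"; zero repetitions also count,
# so it matches every string, including those without an "a"
zero_or_more <- grepl("a*", x)

print(any_char)      # all TRUE
print(only_a)        # TRUE only for "a"
print(zero_or_more)  # all TRUE
```

This is why a quantifier on its own is not a reliable "match everything" pattern, while . (or .*) is.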


Another option, if we want a fixed match with %in%, is to build a logical condition that keeps every row when select[i] does not occur in values and otherwise keeps only the matching rows:

for (i in seq_along(select)) {
  print(df %>%
          filter(if (!select[i] %in% values) TRUE else values %in% select[i]) %>%
          summarise(n()))
}

# n()
#1   3
#  n()
#1   1
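
Note that the condition above treats any selector that does not occur in values as a wildcard, so a mistyped key would also return every row. If only "*" should mean "all rows", one possible alternative is to test for it explicitly. The sketch below uses base R only; count_matches is a hypothetical helper name introduced for illustration, not a dplyr function.

```r
df <- data.frame(values = c("a", "b", "c"))
select <- c("*", "a")

# Hypothetical helper: "*" counts every row, any other key is an exact match
count_matches <- function(key, x) {
  if (key == "*") length(x) else sum(x %in% key)
}

# One count per selector, without an explicit for loop
counts <- vapply(select, count_matches, integer(1), x = df$values)
print(counts)
```

With this approach a key such as "z" that is absent from the column yields a count of 0 rather than the full row count.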
akrun

In base R, we can use lapply over each value in select, with grepl to subset the rows that match the pattern:

lapply(select, function(x) subset(df, grepl(x, values)))

#[[1]]
#  values
#1      a
#2      b
#3      c

#[[2]]
#  values
#1      a

You can also add word boundaries to the patterns in select if you want to match whole words only, so that "a" does not match "ab", etc.

lapply(paste0("\\b", select, "\\b"), function(x) subset(df, grepl(x, values)))
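
To illustrate what the word boundaries change, here is a small sketch using a made-up vector x (an assumption for demonstration, not the OP's data). It compares a plain pattern, a pattern wrapped in \\b, and a pattern anchored to the whole string.

```r
x <- c("a", "ab", "b a")

# Plain pattern: matches any string containing the letter a
plain <- grepl("a", x)

# Word boundaries: "a" must stand alone as a word, so "ab" no longer matches
with_boundary <- grepl("\\ba\\b", x)

# Anchors: the entire string must be exactly "a"
whole_string <- grepl("^a$", x)

print(plain)          # TRUE for all three strings
print(with_boundary)  # TRUE for "a" and "b a", FALSE for "ab"
print(whole_string)   # TRUE only for "a"
```

If the goal is an exact match on the full value rather than on a word inside it, the anchored form (or %in%) may be the closer fit.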
Ronak Shah