3

Thanks to the question "grep using a character vector with multiple patterns", I figured out my own problem as well. The question there was how to find multiple values using the grep function, and the solution was either of these:

grep("A1| A9 | A6") 

or

toMatch <- c("A1", "A9", "A6")
matches <- unique(grep(paste(toMatch, collapse="|"), x))

So I used the second suggestion since I had MANY values to search for.

But I'm curious why c() or a for loop doesn't work in place of |. Before I searched Stack Overflow and found the recommendations above, I tried two alternatives, which I'll demonstrate below:

First, what I had written in R was something like this:

find.explore.l<-lapply(text.words.bl ,function(m) grep("^explor",m))

But then I had to 'grep' many words, so I tried this:

find.explore.l<-lapply(text.words.bl ,function(m) grep(c("A1","A2","A3"),m))

It didn't work, so I tried another one (XXX is the list of words that I'm supposed to find in the text):

for (i in XXX){
  find.explore.l<-lapply(text.words.bl ,function(m) grep("XXX[i]"),m))
    .......(more lines to append lines etc)
   }

and it seemed like R tried to match XXX[i] itself, not the words inside. Why can't c() or a for loop with grep return the right results? Someone please let me know! I'm so curious :P

oguz ismail
prejay10
  • 5
    `grep(c("A1","A2","A3"),m))` doesn't work because `grep` is not vectorized over the `pattern` argument - it has to be a single regular expression. `grep("XXX[i]"),m))` doesn't work because you have quotes around `XXX[i]`, so it's interpreted as a string literal rather than evaluated as an object. – nrussell Apr 27 '15 at 13:58
  • 2
    Can you show some input and output? Have you considered the `Vectorize` function? – A5C1D2H2I1M1N2O1R2T1 Apr 27 '15 at 14:03
  • `grep(c("A1","A2","A3"),m))` is violating the grep syntax, `grep(pattern, x, ...)`. Pattern is required to be a single string; you supplied a vector of three character strings. Another way to put it is that `length(pattern)` should be `1`. Also, `function(m) grep("XXX[i]"),m))` has a misplaced closing parenthesis after `"XXX[i]"`. Again, check the documentation for grep and its examples. – Pierre L Apr 27 '15 at 14:08
  • Hi, it should be `grep("A1|A9|A6")` – pmkruyen Jan 23 '23 at 08:38

2 Answers

1

From the documentation for the pattern= argument in the grep() function:

Character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed except for regexpr and gregexpr.

This confirms that, as @nrussell said in a comment, grep() is not vectorized over the pattern argument. Because of this, c() won't work for a list of regular expressions.

You could, however, use a loop; you just have to modify your syntax.

toMatch <- c("A1", "A9", "A6")

# Loop over values to match
for (i in toMatch) {
    # print() is needed: results are not auto-printed inside a loop
    print(grep(i, text))
}

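Using "XXX[i]" as your pattern doesn't work because the quotes make it a string, which grep() then interprets as a regular expression: XXX followed by the character class [i]. That is, it matches the literal text XXXi. To reference an element of a vector of regular expressions, you would simply use XXX[i] (note the lack of surrounding quotes), with i running over the indices of XXX. In a loop written as for (i in XXX), i is already the element itself, so you would pass i directly.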
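The difference between the quoted and unquoted forms can be sketched as follows. The vectors `words` and `XXX` here are made-up data for illustration, not the data from the question:

```r
# Hypothetical data for illustration only
words <- c("A1 start", "XXXi", "nothing", "ends with A6")
XXX <- c("A1", "A6")

# Quoted: the pattern is the regex XXX[i], i.e. "XXX" followed by "i"
quoted <- grep("XXX[i]", words)

# Unquoted, looping over indices: XXX[i] evaluates to "A1", then "A6"
by_index <- integer(0)
for (i in seq_along(XXX)) {
  by_index <- c(by_index, grep(XXX[i], words))
}

# Unquoted, looping over the elements: i is already the pattern
by_element <- integer(0)
for (i in XXX) {
  by_element <- c(by_element, grep(i, words))
}

print(quoted)      # position of "XXXi"
print(by_index)    # positions matching "A1" or "A6"
print(by_element)  # same positions as by_index
```

Only the quoted call finds the element "XXXi"; both unquoted loops find the elements containing A1 and A6.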

You can also use lapply() for this, but in a slightly different way than you had done: apply it to each regex in the vector, rather than to each text string.

lapply(toMatch, function(rgx, text) grep(rgx, text), text = text)
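As a rough illustration of what that call returns, here is a small sketch with a made-up `text` vector (an assumption, not the question's data). Naming the result by pattern is an optional convenience added here:

```r
# Hypothetical data for illustration only
text <- c("A1 here", "no match", "A9 and A6")
toMatch <- c("A1", "A9", "A6")

# One grep() call per pattern; each call searches the whole text vector
per_pattern <- lapply(toMatch, function(rgx) grep(rgx, text))

# Label each result with the pattern that produced it
names(per_pattern) <- toMatch

print(per_pattern)
```

The result is a list with one integer vector of matching positions per pattern, so you can see which pattern matched where.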

However, the best approach would be, as you already have in your post, to use

matches <- unique(grep(paste(toMatch, collapse = "|"), text))
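A small worked sketch of the combined-pattern approach, again with a made-up `text` vector that is only an assumption for illustration:

```r
# Hypothetical data for illustration only
text <- c("A1 here", "no match", "A9 and A6", "A6 again")
toMatch <- c("A1", "A9", "A6")

# Build one regular expression: "A1|A9|A6"
pattern <- paste(toMatch, collapse = "|")

# Positions of elements matching at least one alternative
idx <- grep(pattern, text)

# The matching elements themselves
hits <- grep(pattern, text, value = TRUE)

print(pattern)
print(idx)
print(hits)
```

Since grep() reports each position at most once, unique() mainly matters with value = TRUE, when the text itself contains duplicate strings. If the search terms could contain regex metacharacters, they would need escaping before being pasted together.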
Alex A.
0

Consider that:

XXX <- c("a", "b", "XXX[i]")
grep("XXX[i]", XXX, value=T)
character(0)
grep("XXX\\[i\\]", XXX, value=T)
[1] "XXX[i]"

What is R doing? It treats the first argument of grep as a regular expression, in which the brackets [ and ] are special characters: they delimit a character class, so "XXX[i]" matches the text XXXi. I put in two backslashes before each bracket to tell R to treat them as literal brackets. And imagine what would happen if I put the quoted "XXX[i]" into a for loop: it wouldn't do what I expected.
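If the goal really is to match the brackets literally, an alternative to escaping is grep's fixed = TRUE argument, which the documentation describes as treating the pattern as a plain character string. A minimal sketch using the same XXX vector as above:

```r
XXX <- c("a", "b", "XXX[i]")

# fixed = TRUE: the pattern is matched as a literal string, not a regex
literal <- grep("XXX[i]", XXX, value = TRUE, fixed = TRUE)

# Same result with the brackets escaped in a regular expression
escaped <- grep("XXX\\[i\\]", XXX, value = TRUE)

print(literal)
print(escaped)
```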

If you would like a for loop that goes through a character vector of possible matches, take out the quotes in the grep function.

#if you want the match returned
matches <- c("a", "b")
for (i in matches) print(grep(i, XXX, value=T))
[1] "a"
[1] "b"

#if you want the vector location of the match
for (i in matches) print(grep(i, XXX))
[1] 1
[1] 2

As the comments point out, grep(c("A1","A2","A3"),m) violates the syntax that grep requires: the pattern must be a single string.

Pierre L