7

I checked different posts on this, but still couldn't figure out why this is not working:

c=c("HI","NO","YESS")
grep("YES",c,fixed=T)
[1] 3

If I am using fixed = T, why am I still getting a result when there is no exact match for "YES"? I want only exact matches, like when I use grep -w in bash.

Ed Morton
GabrielMontenegro
  • 6
    For exact matches simply don't use regex. Just use `==` or `%in%`. In your case there was an exact match of "YES" and everything else was ignored. `fixed = TRUE` just tells `grep` there is no regular expression in the `pattern`. – David Arenburg Mar 01 '16 at 14:54
  • 2
    `fixed=TRUE` means that the `pattern` shouldn't be considered a `regex`, but just as it is. Your vector contains the substring `YES` and so the match. It doesn't mean that the string must be **exactly** equal to the pattern, but it suffices that it **contains** the pattern. – nicola Mar 01 '16 at 14:54
  • Oh I see. Is there a way to tell R to find the exact match? An equivalent of `grep -w`? – GabrielMontenegro Mar 01 '16 at 14:55
  • 5
    Oh sure. Just `which(c=="YES")`. – nicola Mar 01 '16 at 14:55

2 Answers

5

This just means that you're matching a string rather than a regular expression, but the string can still be a substring. If you want to match exact cases only, how about

> x=c("HI","NO","YESS") #better not to name variables after common functions
> grep("^YES$",x,fixed=F) 
integer(0) 

Edit per @nicola: This works because ^ anchors the beginning of the string and $ anchors the end, so ^xxxx$ forces the entire string to match xxxx.
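
To make the difference concrete, here is a minimal sketch comparing the three approaches mentioned so far. The extra "YES" element is an assumption added for illustration (it is not in the question's vector), so that there is something for the exact match to find.

```r
# Assumption: an extra "YES" element is appended for illustration only
x <- c("HI", "NO", "YESS", "YES")

# fixed = TRUE searches for a literal substring, so "YESS" is hit as well
substring_hits <- grep("YES", x, fixed = TRUE)

# Anchored regex: the whole element must be exactly "YES"
anchored_hits <- grep("^YES$", x)

# Plain equality: no regex involved at all
equal_hits <- which(x == "YES")

print(substring_hits)  # 3 4
print(anchored_hits)   # 4
print(equal_hits)      # 4
```

The anchored regex and the equality test agree here; the equality test is simpler when the whole element has to match.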

Philip
  • 2
    No need for `regex`-based functions (like `grep`). A simple `which(c=="YES")` is to be preferred in this case. – nicola Mar 01 '16 at 14:57
  • 2
    Agree; post as answer. But OP is using `grep`, titled question "understanding grep," and might like to know more about how it works and how to modify it for more cases. – Philip Mar 01 '16 at 15:02
  • That was already expressed in the comments. I agree that your post can be useful; I'd add a brief explanation of why it works (i.e. explain what `^` and `$` do). – nicola Mar 01 '16 at 15:06
0

The best solution seems to be using \b as a word boundary, as in grep("\\bYES\\b",x). The other approaches have limitations: e.g., if you had c=c("HI","NO","YES S") or c=c("HI","NO","YES,S"), neither ^YES$ nor which(c=="YES") would find "YES" in those strings, because they require the whole element to equal "YES".

Found the solution here: https://stackoverflow.com/a/26813671/14590183
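
As a rough illustration of that limitation, the sketch below runs the three approaches on a vector that combines the examples above. The vector `y` and the result names are made up for this example.

```r
# Hypothetical vector combining the cases discussed above
y <- c("HI", "NO", "YESS", "YES S", "YES,S")

# Word boundaries: "YES" must stand as a whole word inside the element
word_hits <- grep("\\bYES\\b", y)

# Whole-string approaches find nothing, since no element equals "YES"
anchored_hits <- grep("^YES$", y)
equal_hits <- which(y == "YES")

print(word_hits)      # 4 5
print(anchored_hits)  # integer(0)
print(equal_hits)     # integer(0)
```

"YESS" is not matched by the word-boundary pattern because the trailing "S" is a word character, so there is no boundary after "YES".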