I'm trying to write a rule that matches a sentence containing "dog" but not "cat". For the test string below, the check should return FALSE, since the string contains both "dog" and "cat".

Using negation:

grepl("cat.*[^dog]", "asdfasdfasdf cat adsfafds dog", perl=T)

Using negative lookahead:

grepl("cat.*(?!dog)", "asdfasdfasdf cat adsfafds dog", perl=T)

Using the str_detect function from the stringr package:

require(stringr)
str_detect("asdfasdfasdf cat adsfafds dog", "cat.*(?!dog|$)")

All three methods return TRUE.

matsuo_basho
  • You realize `cat.*[^dog]` will fail on the string `cat foobarbaz god`, or `cat foobarbaz odg`, etc. The reason is because `[^]` will match any *character* but the ones inside, **not any word but the one inside** – Kaspar Lee Apr 26 '16 at 16:25
  • Also, they should return true. The reason is that they will be true if the negative lookahead matches. If you want them to be false, just remove the negation and make it a normal group. – Kaspar Lee Apr 26 '16 at 16:27
  • @Druzion, you mean `grepl("cat.*(?=dog)", "asdfasdfasdf cat adsfafds dog", perl=T)`? Well, that just returns true, since it's checking whether the string has cat followed by dog in it. – matsuo_basho May 02 '16 at 13:06
  • No. That is a positive lookahead, it will check if dog **does exist**. Use a negative lookahead: `cat.*(?!dog)`. I know you have already done this, I just wanted to point out *why* the first way would not work. – Kaspar Lee May 02 '16 at 14:29

2 Answers

You can use this regex to find strings that contain cat but not dog:

^((cat((?!dog).)*)|(((?!dog).)*?cat((?!dog).)*)+)$

It's based on the answer here. It takes into account that dog can come before or after cat.
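
As a quick check, here is a minimal sketch that runs the pattern above through grepl with perl = TRUE, which is required for lookaheads. The variable name cat_not_dog and the test strings are made up for illustration.

```r
# Pattern from the answer above; lookaheads need perl = TRUE.
cat_not_dog <- "^((cat((?!dog).)*)|(((?!dog).)*?cat((?!dog).)*)+)$"

# Hypothetical test strings covering the relevant cases.
inputs <- c(
  "asdf cat asdf",      # cat only
  "asdf cat asdf dog",  # dog after cat
  "dog asdf cat asdf",  # dog before cat
  "asdf giraffe asdf"   # neither word
)

result <- grepl(cat_not_dog, inputs, perl = TRUE)
print(setNames(result, inputs))
```

Only the first string should come back TRUE: it is the only one that contains cat while no position in it starts the word dog.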


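The problem with ALL of your solutions is that cat.* will find cat and then .* will eat up EVERYTHING after it, including dog. The lookahead (?!dog) is then tested at the end of the string, where nothing follows, so it always succeeds.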

Also, you forgot to handle the cases where dog comes before cat.

As Druzion points out, char classes are not the way to go.
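
To see this concretely, the sketch below extracts the substring that each of the question's patterns actually matches. The helper name matched is hypothetical; regexpr and regmatches are base R.

```r
x <- "asdfasdfasdf cat adsfafds dog"

# Hypothetical helper: return the substring that a pattern matched.
matched <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

# Greedy .* runs to the end of the string; (?!dog) is then checked there,
# where nothing follows, so the lookahead succeeds.
lookahead_match <- matched("cat.*(?!dog)", x)

# [^dog] matches any ONE character other than "d", "o" or "g".
# .* backtracks past "dog" until such a character (the space) is available.
class_match <- matched("cat.*[^dog]", x)

print(lookahead_match)  # "cat adsfafds dog"
print(class_match)      # "cat adsfafds " (note the trailing space)
```

In both cases the match succeeds even though dog is present, which is why grepl returns TRUE.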

Laurel
  • That's quite an intricate line, but it seems to work for my specific purpose. I'm dissecting it right now. – matsuo_basho May 02 '16 at 16:14
  • It appears that just the second part of your solution is enough: `grepl("^(((?!dog).)*?cat((?!dog).)*)+$", "dog cat asdfadsfad", perl=T)` gets us where we need. – matsuo_basho May 02 '16 at 17:45
  • Why do we need the ^ and $ anchors in that line? According to the link you provided, they make sure the entire input is consumed. How would the regex behave without the anchors? – matsuo_basho May 05 '16 at 15:24
  • @matsuo_basho Try matching it against `catdogcatdog` with and without anchors and you'll see. `:)` – Laurel May 05 '16 at 15:34
  • well I see it doesn't catch it: returns True when matching against `catdogcatdog` when I eliminate the anchors (should return F). But I don't understand what is happening. – matsuo_basho May 05 '16 at 17:47
  • @matsuo_basho Without the anchors, it can find a substring that matches. You can use https://regex101.com/ for a visualization. – Laurel May 05 '16 at 18:00
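
The anchor behaviour discussed in these comments can be reproduced with a short sketch. The variable names are illustrative; the pattern is the shortened one from the comment above, with and without ^ and $.

```r
anchored   <- "^(((?!dog).)*?cat((?!dog).)*)+$"
unanchored <- "(((?!dog).)*?cat((?!dog).)*)+"
x <- "catdogcatdog"

# With anchors the whole string must be consumed, which is impossible
# because no character can be matched at a position where "dog" starts.
with_anchors <- grepl(anchored, x, perl = TRUE)

# Without anchors the engine is satisfied with any matching substring.
without_anchors <- grepl(unanchored, x, perl = TRUE)
partial <- regmatches(x, regexpr(unanchored, x, perl = TRUE))

print(with_anchors)     # FALSE
print(without_anchors)  # TRUE
print(partial)          # "cat"
```

The unanchored pattern stops after the leading cat and reports success, so it cannot be used to test the string as a whole.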

A simple solution is to write a function that checks:

i) If the string contains both cat and dog, then return FALSE

ii) otherwise, return TRUE

R Code

cat_dog <- function(x) {
  # FALSE if x contains both "cat" and "dog" (in any order), TRUE otherwise
  if (length(grep("(?=.*cat)(?=.*dog)", x, perl = TRUE)) != 0) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}

Updated Code

cat_dog <- function(x) {
  # TRUE only if x contains "dog" and does not contain "cat"
  has_dog <- length(grep("(?=.*dog)", x, perl = TRUE)) != 0
  has_cat <- length(grep("(?=.*cat)", x, perl = TRUE)) != 0
  has_dog && !has_cat
}

rock321987
  • just noticed something about this line that does not suit my purpose. I phrased the question with "dog" and "cat" as particular words. However, I of course need this to be dynamic. Thing is, the above script still yields true if we have neither "dog" nor "cat" but other terms in the string. For example "asdfadsf giraffe adsfa gorilla" will yield true because those words aren't found in the string. – matsuo_basho May 02 '16 at 15:36
  • @matsuo_basho ok I misinterpreted the last part..I assumed that if there is neither cat nor dog then also it will be accepted – rock321987 May 02 '16 at 16:43
  • I really like your solution, since it is so elegant..... maybe there's a modification on it that would still achieve my aim. – matsuo_basho May 02 '16 at 17:16
  • @matsuo_basho sorry for late reply..updated the code..not the best one..but it will do the trick..you can say that this can be also done without regex – rock321987 May 02 '16 at 18:18
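
Since the comments ask for dynamic terms and note that regex is not strictly required, here is one possible sketch. The function name has_but_not and its include/exclude parameters are hypothetical and not part of the answer above; it assumes both terms should be treated as literal substrings.

```r
# Hypothetical helper: TRUE where x contains `include` but not `exclude`.
# fixed = TRUE treats both terms as literal substrings, not regexes.
has_but_not <- function(x, include, exclude) {
  grepl(include, x, fixed = TRUE) & !grepl(exclude, x, fixed = TRUE)
}

inputs <- c(
  "asdf dog asdf",             # dog only
  "asdf cat asdf dog",         # both words
  "asdf giraffe asdf gorilla"  # neither word
)
result <- has_but_not(inputs, include = "dog", exclude = "cat")
print(result)  # TRUE FALSE FALSE
```

Because grepl is vectorized, this works on a whole character vector at once. Note that it matches substrings, so a word such as "dogma" would also count as containing "dog".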