149

I'm setting up some goals in Google Analytics and could use a little regex help.

Let's say I have 4 URLs:

http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1
http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1

I want to create an expression that will identify any URL that contains the string selector=size but does NOT contain details.cfm.

I know that to find a string that does NOT contain another string, I can use this expression:

(^((?!details.cfm).)*$)

But I'm not sure how to add in the selector=size portion.

Any help would be greatly appreciated!

Chris Stahl

5 Answers

199

This should do it:

^(?!.*details\.cfm).*selector=size.*$

^.*selector=size.*$ should be clear enough. The first bit, (?!.*details\.cfm), is a negative lookahead: before matching the string, it checks that the string does not contain "details.cfm" (with any number of characters before it).
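To try this outside Google Analytics, here is a minimal Python sketch (Python is just a convenient test bed here; GA's own regex engine may not behave identically) that runs the pattern over the four URLs from the question with `re.match`:

```python
import re

# The negative lookahead at ^ rejects any string containing "details.cfm";
# the rest of the pattern requires "selector=size" somewhere in the string.
PATTERN = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")

urls = [
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1",
    "http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1",
]

# Keep only the URLs the pattern accepts.
matches = [url for url in urls if PATTERN.match(url)]
for url in matches:
    print(url)
```

Only the first and third URLs should be printed: the second has selector=style, and the fourth contains details.cfm.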

Kobi
  • FYI, check out http://www.regexr.com/ for a nice way to test these expressions out. – Joshua Pinter Apr 08 '14 at 14:23
  • Always forget about negative lookahead and it's so useful – Alexei Blue Feb 20 '18 at 15:35
  • `"http://www.anydotcom.com/test/search.cfm?metric=blah&selector=sized&value=1" =~ /^(?!.*details\.cfm).*selector=size.*$/ #=> 0` is incorrect. (Note the string contains `"...selector=sized..."`.) Also, why `.*$` at the end? – Cary Swoveland Dec 12 '18 at 01:02
4
^(?=.*selector=size)(?:(?!details\.cfm).)+$

If your regex engine supported possessive quantifiers and lookbehind (though I suspect Google Analytics does not), then I guess this will perform better for large input sets:

^[^?]*+(?<!details\.cfm).*?selector=size.*$
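A Python sketch of both patterns against the question's four URLs (a local check only, not a statement about what Google Analytics supports). Python's `re` module accepts possessive quantifiers only from version 3.11, so that part is guarded. It also illustrates why the `*+` matters for correctness, not just speed: with a plain `[^?]*`, the engine can back off to a shorter prefix such as `.../details.cf`, where the lookbehind succeeds, and the details.cfm URL slips through.

```python
import re
import sys

urls = [
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1",
    "http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1",
]

# First pattern: (?=.*selector=size) requires the substring somewhere, then
# (?:(?!details\.cfm).)+ consumes one character at a time, refusing to step
# onto the start of "details.cfm".
LOOKAHEAD = re.compile(r"^(?=.*selector=size)(?:(?!details\.cfm).)+$")
lookahead_matches = [u for u in urls if LOOKAHEAD.match(u)]
print(lookahead_matches)

# Second pattern with the possessive *+ replaced by a plain * (any Python 3).
# Backtracking into [^?]* lets the lookbehind pass on the details.cfm URL.
BACKTRACKING = re.compile(r"^[^?]*(?<!details\.cfm).*?selector=size.*$")
print(bool(BACKTRACKING.match(urls[3])))  # True: the wrong URL is accepted

# The real possessive pattern; re only understands *+ from Python 3.11 on.
if sys.version_info >= (3, 11):
    POSSESSIVE = re.compile(r"^[^?]*+(?<!details\.cfm).*?selector=size.*$")
    print([u for u in urls if POSSESSIVE.match(u)])
```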
Tomalak
  • This assumes `selector=size` is always before `details.cfm`, which isn't the case in the last url. – Kobi Jun 01 '10 at 20:34
  • Just to clear this up, it wasn't me. I can't see why someone would down-vote two answers here, they are both correct. – Kobi Jun 01 '10 at 20:47
  • @Kobi: This should have been a look-ahead, corrected. Oh and by the way, I did not suspect it was your down-vote. – Tomalak Jun 01 '10 at 20:48
3

The regex could be (Perl syntax):

`/^(?:(?!.*details\.cfm).*selector=size.*|selector=size(?!.*details\.cfm).*)$/`
djipko
1

There is a problem with the regex in the accepted answer. It also matches abcselector=size, selector=sizeabc etc.

A correct regex can be ^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$

[Screenshot: explanation of the regex at regex101]
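A small Python sketch comparing the accepted pattern with the word-boundary version. The first URL is from the question; the second and third are hypothetical variations (the second is the string from Cary Swoveland's comment on the accepted answer):

```python
import re

# Pattern from the accepted answer (no word boundaries).
ACCEPTED = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")
# Pattern from this answer, with \b word boundaries around both literals.
BOUNDED = re.compile(r"^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$")

samples = [
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=sized&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah&abcselector=size&value=1",
]

# (accepted pattern matches?, word-boundary pattern matches?) per sample
results = [(bool(ACCEPTED.match(s)), bool(BOUNDED.match(s))) for s in samples]
for s, (accepted, bounded) in zip(samples, results):
    print(accepted, bounded, s)
```

Both patterns accept the first URL; only the accepted pattern also accepts the `sized` and `abcselector` variants.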

Arvind Kumar Avinash
  • While you are not wrong, the regex as originally accepted met my need as your examples did not exist in the set of possible strings. – Chris Stahl Apr 01 '21 at 01:53
  • I think it would have been better to have just left a comment on the selected answer saying that word boundaries are needed and give your example as to why. For one, it's a small thing, possibly an oversight. Moreover, anyone looking at the selected answer might not see your answer but they would see comments (as would @Kobi). – Cary Swoveland Jul 29 '23 at 13:55
0

I was looking for a way to avoid --line-buffered on a tail in a situation similar to the OP's, and Kobi's solution works great for me. In my case, I exclude lines containing either "bot" or "spider" while including lines containing ' / ' (for my root document).

My original command:

tail -f mylogfile | grep --line-buffered -v 'bot\|spider' | grep ' / '

Now becomes (with grep's -P switch for Perl-compatible regular expressions):

tail -f mylogfile | grep -P '^(?!.*(bot|spider)).*\s\/\s.*$'
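The same filter expressed in Python, as a sketch of the matching logic only: `filter_lines`, the sample log lines, and the script name in the comment are hypothetical, not part of the original setup.

```python
import re

# Same pattern as the grep -P command: reject lines containing "bot" or
# "spider", keep lines containing whitespace, a slash, then whitespace.
PATTERN = re.compile(r"^(?!.*(bot|spider)).*\s\/\s.*$")


def filter_lines(lines):
    """Yield only the lines the grep -P command above would print."""
    for line in lines:
        if PATTERN.match(line.rstrip("\n")):
            yield line


# Hypothetical access-log lines for illustration.
sample_log = [
    '1.2.3.4 - - [01/Jan/2024] "GET / HTTP/1.1" 200\n',
    '5.6.7.8 - - [01/Jan/2024] "GET / HTTP/1.1" 200 "Googlebot"\n',
    '1.2.3.4 - - [01/Jan/2024] "GET /about HTTP/1.1" 200\n',
]

kept = list(filter_lines(sample_log))
for line in kept:
    print(line, end="")

# Wiring it to a pipe (hypothetical script name filter_log.py):
#   tail -f mylogfile | python3 filter_log.py
# with a driver such as:
#   for line in filter_lines(sys.stdin):
#       print(line, end="", flush=True)
```

Only the first sample line survives: the second mentions a bot, and the third requests /about rather than the root document.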
J. Scott Elblein
roon