
I'm trying to create a regular expression that excludes results if a substring is present.

Data Set:

 http://www.cnn.com/test1
 http://www.cnn.com/test3
 http://www.cnn.com/test5
 http://www.stackflow.com/test4
 http://www.cnn.com/test3
 http://www.cnn.com/test4

Goal:

  • find all cnn.com sites
  • that don't have /test3

Results:

 http://www.cnn.com/test1
 http://www.cnn.com/test5
 http://www.cnn.com/test4
Lacer

3 Answers


Figured it out: (www\.cnn\.com)(?!/test3)
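To check this against the data set from the question, here is a minimal sketch in Python (the language is my assumption; the question doesn't name a regex flavor). The dots are escaped so that `.` matches only a literal dot, and the variable names are just for illustration.

```python
import re

# Data set copied from the question.
urls = [
    "http://www.cnn.com/test1",
    "http://www.cnn.com/test3",
    "http://www.cnn.com/test5",
    "http://www.stackflow.com/test4",
    "http://www.cnn.com/test3",
    "http://www.cnn.com/test4",
]

# (?!/test3) is a negative lookahead: the host name only matches
# when it is NOT immediately followed by "/test3".
pattern = re.compile(r"www\.cnn\.com(?!/test3)")

matches = [u for u in urls if pattern.search(u)]
print(matches)
# ['http://www.cnn.com/test1', 'http://www.cnn.com/test5', 'http://www.cnn.com/test4']
```

Note that the lookahead only checks the characters right after the host, so a path like /test30 would be excluded as well, while /foo/test3 would still match.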

kayess
Lacer

If you also want to avoid matching strings like http://www.cnn.com/test/test3, you can use a negative lookbehind at the end of the string:

cnn\.com.*(?<!test3)$
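A quick way to see what this pattern accepts and rejects, sketched in Python (an assumption, since the question doesn't say which engine is in use). The sample URLs beyond the question's data set are made up for illustration.

```python
import re

# (?<!test3)$ is a negative lookbehind anchored at the end of the string:
# the match is rejected when the last five characters are "test3".
pattern = re.compile(r"cnn\.com.*(?<!test3)$")

samples = [
    "http://www.cnn.com/test1",
    "http://www.cnn.com/test3",
    "http://www.cnn.com/test/test3",   # hypothetical nested path
    "http://www.cnn.com/test3/page",   # hypothetical, does not END in test3
]

kept = [u for u in samples if pattern.search(u)]
print(kept)
# ['http://www.cnn.com/test1', 'http://www.cnn.com/test3/page']
```

So this version excludes any cnn.com URL that ends in test3, however deep the path, but it lets through URLs where test3 appears earlier in the path.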
Patrick Haugh

I'm guessing this would be fastest:

cnn\.com(?!\/test3)[a-zA-Z0-9-._~:?#@!$&'*+,;=`.\/\(\)\[\]]*

because you restrict the URL to allowed characters only.
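For what it's worth, here is how that pattern behaves on the question's data when tried in Python (my assumption for the engine; I haven't measured speed, so this only shows what gets matched). The helper name `matched_part` is hypothetical.

```python
import re

# Pattern copied as written above; the character class lists the
# characters allowed after the host name.
pattern = re.compile(
    r"cnn\.com(?!\/test3)[a-zA-Z0-9-._~:?#@!$&'*+,;=`.\/\(\)\[\]]*"
)

def matched_part(url):
    """Return the text the pattern matched in url, or None if no match."""
    m = pattern.search(url)
    return m.group(0) if m else None

urls = [
    "http://www.cnn.com/test1",
    "http://www.cnn.com/test3",
    "http://www.stackflow.com/test4",
    "http://www.cnn.com/test4",
]

results = {u: matched_part(u) for u in urls}
for url, part in results.items():
    print(url, "->", part)
```

Unlike the bare lookahead, this consumes the rest of the URL too, so the match is the host plus the path (for example cnn.com/test1) rather than just the host.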

Bram Vanroy