
I'm trying to write a regex that matches a string containing a certain substring, but fails if the string also contains a different substring. I've found this answer, but I'm not sure how to get it to work for my needs. In the interest of being as specific as possible:

  • Yes, it has to do this as part of the expression. I do not have access to the code that will be processing this.
  • Yes, it needs to be one expression.
  • It needs to work with PHP's regex flavor. I'm pretty sure it's being evaluated using preg.

To give an idea of what I'm trying to do: I have a set of URLs I'm trying to filter. I want to match URLs that contain "/somedir", but not if the URL also contains "somestring".

So,

  • www.somesite.com/somedir/index.html
  • www.somesite.com/somedir/somotherdir/index.html
  • www.somesite.com/somedir/somepage.html

would all match, but,

  • www.somesite.com/somedir/somestring.html
  • www.somesite.com/somedir/somestring/index.html

would both fail.

Community
AWarnock
  • Why regex? A simple combination of `strpos` would work. – Tchoupi Mar 07 '13 at 17:37
  • *"I'm pretty sure it's being evaluated using preg"* - you should know before asking because this differs (it's highly likely you're right, but well, find out ;)) - Also the code where the pattern is used is necessary as it can differ how a regular expression is used. Also you should outline what you've tried so far. E.g. mock your own code, provide the data to run through, output results, make the pattern a variable and try. – hakre Mar 07 '13 at 17:39
  • possible duplicate of http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another – Crisp Mar 07 '13 at 17:41
  • Thanks Crisp, that was exactly what I was looking for. It's moot now, but I'm working in a CMS, and the bit that assigns modules to pages using the URL uses regular expressions. That's also the reason I don't know whether it uses preg or not; that, and I don't have time to go digging through a dozen files to figure it out. Again, thanks for the help. – AWarnock Mar 07 '13 at 17:55

1 Answer


You need a regex that accepts a string containing a certain pattern only if another pattern appears nowhere before or after it:

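~
    (?(DEFINE)                                # declare sub-patterns; nothing is matched here
        (?<ACCEPT> must-contain-pattern)      # what the subject must contain
        (?<REFUSE> must-not-contain-pattern)  # what the subject must not contain
    )

    ^
        (?:(?!(?&REFUSE)).)*    # any characters, as long as REFUSE does not start at them
            (?&ACCEPT)          # the required pattern
        (?:(?!(?&REFUSE)).)*    # the rest of the subject, again free of REFUSE
    $
~ux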

Define the ACCEPT and REFUSE sub-patterns in the DEFINE block according to your needs, and this should work.

Edit: Here is the pattern from above, tailored to your case by defining the two named sub-patterns:

~
    (?(DEFINE)
        (?<ACCEPT> \Q/somedir\E)
        (?<REFUSE> \Qsomestring\E)
    )

    ^
        (?:(?!(?&REFUSE)).)*
            (?&ACCEPT)
        (?:(?!(?&REFUSE)).)*
    $
~ux
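
To check the tailored pattern, you can run it through `preg_match` against the URLs from the question. The following is only a minimal sketch, assuming PHP 7 or later and that the CMS really does evaluate the expression with preg; the helper name `url_matches` and the variable names are made up for illustration and are not part of any CMS API.

```php
<?php
// The tailored pattern from above, kept verbatim in a nowdoc so that
// neither the backslashes nor the `$` anchor are interpreted by PHP.
$pattern = <<<'REGEX'
~
    (?(DEFINE)
        (?<ACCEPT> \Q/somedir\E)
        (?<REFUSE> \Qsomestring\E)
    )

    ^
        (?:(?!(?&REFUSE)).)*
            (?&ACCEPT)
        (?:(?!(?&REFUSE)).)*
    $
~ux
REGEX;

// Hypothetical helper: true when the pattern matches the whole URL.
function url_matches(string $pattern, string $url): bool
{
    return preg_match($pattern, $url) === 1;
}

// Sample URLs taken from the question.
$urls = [
    'www.somesite.com/somedir/index.html',
    'www.somesite.com/somedir/somotherdir/index.html',
    'www.somesite.com/somedir/somepage.html',
    'www.somesite.com/somedir/somestring.html',
    'www.somesite.com/somedir/somestring/index.html',
];

foreach ($urls as $url) {
    echo $url, ' => ', url_matches($pattern, $url) ? 'match' : 'no match', "\n";
}
```

The first three URLs should be reported as matches and the last two as non-matches, which is the behaviour asked for in the question. Note that the pattern only checks for REFUSE before and after the ACCEPT match, so a REFUSE pattern that overlaps the ACCEPT text itself would go unnoticed; that cannot happen with "/somedir" and "somestring".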
hakre
  • That seems like a lot of work when the answer to the question Crisp linked to works. Besides that, I could not use multiple expressions. – AWarnock Mar 07 '13 at 18:09
  • A lot of work? Multiple expressions? Not at all: because of the defines, this is a single expression, not multiple ones. You probably don't understand how it works, so I'll give you an example by editing the answer. It's meant to be easy to adapt. Hopefully it's clearer then. – hakre Mar 07 '13 at 18:17