I have the following regex rule:
'/((f|ht)tp)(.*?)(.gif|.png|.jpg|.jpeg)/'
It works great, but I don't want it to match anything that is preceded by a newline and 4 or more spaces, i.e. something like this:
"\n    "
How can I do this?
I have added a negative lookahead anchored at the beginning of the line. It checks for a newline character followed by 4 or more whitespace characters; if that sequence is present, the match fails.
'/^(?!\n\s{4,}).*((f|ht)tp)(.*?)(.gif|.png|.jpg|.jpeg)/'
You don't need to include the linefeed itself in the lookahead; just use the start anchor (^) in multiline mode. Also, since \s can match all kinds of whitespace, including linefeeds and tabs, you're better off using a literal space character:
'/^(?! {4}).*(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'
Speaking of tabs, they can be used in place of the four spaces to create code blocks here on SO, so you might want to allow for that as well:
'/^(?! {4}|\t).*(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'
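To see the lookahead at work, here is a minimal sketch using Python's built-in re module instead of PCRE (my substitution; the lookahead and multiline syntax are the same in both). The sample text and the names PATTERN, SAMPLE and matching_lines are made up for this illustration.

```python
import re

# Same pattern as above; re.MULTILINE plays the role of the /m flag,
# so ^ matches at the start of every line, not just the start of the text.
PATTERN = re.compile(
    r'^(?! {4}|\t).*(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)',
    re.MULTILINE,
)

# Hypothetical sample input: one plain line, two code-block lines
# (four spaces, then a tab) and one line indented by only three spaces.
SAMPLE = (
    "see http://example.com/a.png\n"
    "    http://example.com/b.png\n"
    "\thttp://example.com/c.gif\n"
    "   ftp://example.com/d.jpg\n"
)


def matching_lines(text):
    """Return the full text of every match (line start through the extension)."""
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE):
        print(repr(line))
```

Only the first and last sample lines match. Note that each match starts at the beginning of the line, so it includes whatever text precedes the URL.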
Finally, if you want the regex to match (as in consume) only the URL, you can use the match-start-reset operator, \K. It acts like a positive lookbehind, without the fixed-length limitation:
'/^(?! {4}|\t).*?\K(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'
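\K is specific to PCRE-style engines (PHP, Perl) and is not supported by every regex flavor. Python's built-in re module, for instance, does not have it. In such an engine, a capturing group around the URL gives roughly the same result. A sketch under that assumption; the group name url and the function extract_urls are made up for this example:

```python
import re

# Python's re has no \K, so the URL part is wrapped in a named group
# and read from the match object instead of being the whole match.
URL_PATTERN = re.compile(
    r'^(?! {4}|\t).*?(?P<url>(?:f|ht)tp.*?(?:.gif|.png|.jpg|.jpeg))',
    re.MULTILINE,
)


def extract_urls(text):
    """Return the URL from each line that is not indented as a code block."""
    return [m.group('url') for m in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    text = (
        "see http://example.com/a.png here\n"
        "    http://example.com/b.png\n"
        "ftp://example.com/d.jpeg\n"
    )
    print(extract_urls(text))
```

As with the \K version, the ^ anchor means at most one URL is found per line.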