108

For regex what is the syntax for search until but not including? Kinda like:

Haystack:
The quick red fox jumped over the lazy brown dog

Expression:
.*?quick -> and then everything until it hits the letter "z" but do not include z
Brad Turek
rolling_codes

3 Answers

204

The explicit way of saying "search until X but not including X" is:

(?:(?!X).)*

where X can be any regular expression.

In your case, though, this might be overkill - here the easiest way would be

[^z]*

This will match anything except z and therefore stop right before the next z.

So .*?quick[^z]* will match The quick red fox jumped over the la.
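As a quick check, here is a minimal Python sketch of the negated character class applied to the haystack from the question. The variable names are only illustrative.

```python
import re

haystack = "The quick red fox jumped over the lazy brown dog"

# [^z]* consumes every character that is not "z",
# so the match ends just before the first "z".
result = re.search(r".*?quick[^z]*", haystack).group(0)

print(result)  # The quick red fox jumped over the la
```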

However, as soon as you have more than one simple letter to look out for, (?:(?!X).)* comes into play, for example

(?:(?!lazy).)* - match anything until the start of the word lazy.

This is using a lookahead assertion, more specifically a negative lookahead.

.*?quick(?:(?!lazy).)* will match The quick red fox jumped over the (including the trailing space before lazy).

Explanation:

(?:        # Match the following but do not capture it:
 (?!lazy)  # first assert that it's not possible to match "lazy" here,
 .         # then match any character
)*         # end of group, zero or more repetitions.
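The commented pattern above can be tried out more or less verbatim. This is a sketch in Python, assuming the haystack from the question and using re.VERBOSE so the comments can stay inside the pattern. The second call shows that the group simply runs to the end of the text when "lazy" never occurs.

```python
import re

pattern = re.compile(r"""
    .*?quick      # lazily skip ahead to "quick"
    (?:           # match the following but do not capture it:
      (?!lazy)    #   first assert that "lazy" does not start here,
      .           #   then consume one character
    )*            # end of group, zero or more repetitions
""", re.VERBOSE)

haystack = "The quick red fox jumped over the lazy brown dog"
until_lazy = pattern.search(haystack).group(0)

# Without "lazy" in the text, the lookahead never fails and the match runs to the end.
no_stop_word = pattern.search("The quick red fox").group(0)

print(repr(until_lazy))    # 'The quick red fox jumped over the '
print(repr(no_stop_word))  # 'The quick red fox'
```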

Furthermore, when searching for keywords, you might want to surround them with word boundary anchors: \bfox\b will only match the complete word fox but not the fox in foxy.
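To see the effect of the word boundary anchors, here is a small Python sketch. The sample sentence is made up for illustration.

```python
import re

text = "A foxy fox met another fox."

# Without anchors, the "fox" inside "foxy" is found as well.
plain_starts = [m.start() for m in re.finditer(r"fox", text)]

# With \b on both sides, only the standalone word "fox" matches.
bounded_starts = [m.start() for m in re.finditer(r"\bfox\b", text)]

print(plain_starts)    # [2, 7, 23]
print(bounded_starts)  # [7, 23]
```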

Note

If the text to be matched can also include linebreaks, you will need to set the "dot matches all" option of your regex engine. Usually, you can achieve that by prepending (?s) to the regex, but not every engine supports this inline modifier (notably JavaScript, where you set the s flag instead, available since ES2018).
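A minimal sketch of the difference in Python, where both the re.DOTALL flag and the inline (?s) prefix are available. The two-line sample text is an assumption for illustration.

```python
import re

text = "The quick red fox\njumped over the lazy brown dog"
pattern = r"quick(?:(?!lazy).)*"

# By default "." does not match a newline, so the match stops at the line break.
default_mode = re.search(pattern, text).group(0)

# With "dot matches all", the match continues across the line break up to "lazy".
with_flag = re.search(pattern, text, re.DOTALL).group(0)
with_inline = re.search("(?s)" + pattern, text).group(0)

print(repr(default_mode))  # 'quick red fox'
print(repr(with_flag))     # 'quick red fox\njumped over the '
```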

Alternative solution:

In many cases, you can also use a simpler, more readable solution that uses a lazy quantifier. Adding a ? to the * quantifier makes it match as few characters as possible from the current position:

.*?(?=(?:X)|$)

will match any number of characters, stopping right before X (which can be any regex) or the end of the string (if X doesn't match). You may also need to set the "dot matches all" option for this to work. (Note: I added a non-capturing group around X in order to reliably isolate it from the alternation)
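Here is a sketch of that pattern wrapped in a small Python helper. The name text_before is hypothetical and not part of any library, and the stop pattern is interpolated as is, so it is treated as a regex.

```python
import re

def text_before(stop_pattern: str, text: str) -> str:
    """Return everything from the start of text up to, but not including, stop_pattern."""
    # The non-capturing group keeps stop_pattern isolated from the "|$" alternative.
    # Note: in Python, "$" also matches just before a newline at the very end of the string.
    pattern = rf".*?(?=(?:{stop_pattern})|$)"
    return re.match(pattern, text, re.DOTALL).group(0)

haystack = "The quick red fox jumped over the lazy brown dog"

before_z = text_before("z", haystack)
before_either = text_before("lazy|brown", haystack)
no_stop = text_before("cat", haystack)  # "cat" never occurs, so "$" ends the match

print(repr(before_z))       # 'The quick red fox jumped over the la'
print(repr(before_either))  # 'The quick red fox jumped over the '
```

The "|$" alternative is what keeps the match from failing when X is absent. Without it, the lookahead could never succeed and there would be no match at all.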

Tim Pietzcker
  • +1 Really nice answer, unfortunately it doesn't work with `grep`, but this [answer](http://stackoverflow.com/a/5979402/354831) does. – Alexandre Lavoie Jun 10 '13 at 02:58
  • @AlexandreLavoie: Interesting. Why should the other one work and not this one? Both use lookahead assertions. Perhaps it's just because of the `(?:...)` non-capturing group? Does it work with `((?!X).)*`? – Tim Pietzcker Jun 10 '13 at 07:40
  • I really don't know, I'm not an expert in regex or grep. I was using `grep` to filter the requests for a single database out of a MySQL binary log that had been converted to SQL. Here is the beast: `grep -Po "(?s)use database_to_keep(.*?)(?=^use)" mysql-bin.000045.sql > filtered.sql` – Alexandre Lavoie Jun 10 '13 at 07:49
  • Looks like a bash conflict since when I hit `Up` key, the last command is not the one I've used : `grep -Po "(?s)use database_to_keep(.*?)(?:(?!^use).)*" mysql-bin.000045.sql > filtered.sql` – Alexandre Lavoie Jun 10 '13 at 07:51
  • @AlexandreLavoie: `(?!^use)` doesn't do what you think. `^` means "match the start of the string". You need to remove it. Also, the `(.*?)` part is completely pointless because it will never match anything but the empty string. – Tim Pietzcker Jun 10 '13 at 07:55
  • The context is that I want to filter from (start of line)use database_name(multiple lines)(start of line)use another_database_name. But yes, you are right about the `(.*?)` part. – Alexandre Lavoie Jun 10 '13 at 07:57
  • Good edit, @Tim, just add a `$` alternative: replace `.*?(?=X)` with `.*?(?=X|$)` – Wiktor Stribiżew May 05 '17 at 10:31
  • Hi @TimPietzcker, thanks a lot for the help. I'm writing another regex using the pattern you gave, `(?:(?!X).)*`. In my case I need `(?:(?!R\$|\d).)*` to match up to `R$` or a digit without putting them in the result, but when I run the regex it ends up including both of them. This is the full regex I'm trying to run in JS: `/(?:(?!R$|\d).)* R?\$? ([0-9,.]+)/g`. Could you give me a hand? After your pattern I need to match a currency. – Fernando Zamperin Mar 13 '18 at 12:16
20

A lookahead assertion can do this. A regex for your example is

.*?quick.*?(?=z)

Note the lazy .*? before the (?=z) lookahead: it makes the expression match only up to the first occurrence of the letter z.

Here is a C# code sample:

using System;
using System.Text.RegularExpressions;

const string text = "The quick red fox jumped over the lazy brown dogz";

string lazy = new Regex(".*?quick.*?(?=z)").Match(text).Value;
Console.WriteLine(lazy); // The quick red fox jumped over the la

string greedy = new Regex(".*?quick.*(?=z)").Match(text).Value;
Console.WriteLine(greedy); // The quick red fox jumped over the lazy brown dog
Igor Kustov
0

Try this

(.*?quick.*?)z
Max
  • This includes the "z" in the match, which is precisely what the asker wants to avoid. Perhaps the regex is intended to be a term in a '|' alternative, and that alternative regex is used to perform multiple matches. If the "z" is the beginning of a string that would be matched by **another** term in the alternative, then this match will be forfeited because the "z" is already consumed by the current match. – Szczepan Hołyszewski Aug 27 '15 at 11:24
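
The point raised in that comment can be reproduced with a small Python sketch. The sample text and the second alternative z\w+ are assumptions chosen only to illustrate it.

```python
import re

text = "quick fox zebra"

# Consuming the "z": the first alternative swallows it,
# so nothing is left for the second alternative to match.
consuming = re.findall(r"quick.*?z|z\w+", text)

# Lookahead: the "z" is only inspected, not consumed,
# so the second alternative can still match "zebra".
lookahead = re.findall(r"quick.*?(?=z)|z\w+", text)

print(consuming)  # ['quick fox z']
print(lookahead)  # ['quick fox ', 'zebra']
```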