170

How do I tell RegEx (.NET version) to get the smallest valid match instead of the largest?

Guy Coder
Jonathan Allen

3 Answers

291

Quantifiers are greedy by default. For a regular expression like .* or .+, append a question mark (.*? or .+?) to match as few characters as possible. To make an optional section such as (?:blah)? lazy, so that it is skipped unless the rest of the pattern cannot match without it, use (?:blah)?? or the equivalent (?:blah){0,1}?. For a counted repetition (using either {n,} or {n,m} syntax), append a question mark to match as few repetitions as possible (e.g. {3,}? or {5,7}?).

The documentation on regular expression quantifiers may also be helpful.
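
To make the difference concrete, here is a minimal sketch of greedy and lazy quantifiers side by side. It is written in JavaScript rather than .NET, on the assumption that the lazy-quantifier syntax (`*?`, `{n,}?`, `{0,1}?`) behaves the same way in both engines for these simple patterns; the sample strings and variable names are made up for illustration.

```javascript
// Greedy vs. lazy star on the same input.
const markup = '<b>one</b> and <b>two</b>';
const greedyStar = markup.match(/<b>.*<\/b>/)[0];  // runs to the LAST </b>
const lazyStar = markup.match(/<b>.*?<\/b>/)[0];   // stops at the FIRST </b>

// Counted repetition: {3,} takes all it can, {3,}? stops at the minimum.
const digits = '1234567890';
const greedyCount = digits.match(/\d{3,}/)[0];
const lazyCount = digits.match(/\d{3,}?/)[0];

// Lazy optional group: skipped when the rest of the pattern can match without it...
const lazySkipped = 'blahblah'.match(/^(?:blah){0,1}?/)[0];
// ...but still used when the match would otherwise fail.
const lazyForced = 'blahX'.match(/^(?:blah){0,1}?X/)[0];

console.log(greedyStar);   // '<b>one</b> and <b>two</b>'
console.log(lazyStar);     // '<b>one</b>'
console.log(greedyCount);  // '1234567890'
console.log(lazyCount);    // '123'
console.log(lazySkipped);  // '' (empty string)
console.log(lazyForced);   // 'blahX'
```

The last two lines are what "without matching unless absolutely necessary" refers to: the lazy group contributes nothing when it is free to, and is only consumed when the `X` cannot be reached otherwise.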

Dave Jarvis
DMI
  • 2
    Line2 "but without matching unless absolutely necessary": What does this mean? – NeoZoom.lua Apr 28 '19 at 12:16
  • Won't this '{0,1}' match nothing because of the 0? Why not use '{1}' instead? – Kimi Chiu Oct 08 '22 at 15:50
  • Regular expressions are greedy by default, which means they try to match as much as possible. Adding the question mark right after the braces means that it will try to match the fewest possible times, but will still match if it can't avoid it. Just using '{1}' means that it must match exactly once. – DMI Oct 09 '22 at 17:30
  • This is not what lazy matching does. See https://stackoverflow.com/questions/35944441 – BlueRaja - Danny Pflughoeft Jul 28 '23 at 21:09
97

Use the non-greedy operator, ?, right after the quantifier. Like so:

.*?
David Hedlund
73

The non-greedy operator does not give you the shortest possible match:

abcabk

a.+?k will match the entire string (in this example) instead of only the last three characters.

I'd like to actually find the smallest possible match instead.

That is, the match that starts at the last possible 'a' from which the 'k' can still be reached.

I guess the only way to do that is to use an expression like:

a[^a]+?k

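const haystack = 'abcabkbk';
// Lazy: starts at the first 'a' and stops at the first 'k' after it.
const patternNonGreedy = /a.+?k/;
// Excluding 'a' between the ends makes the match start at the last 'a' before a 'k'.
const patternShortest = /a[^a]+?k/;

const matchesNonGreedy = haystack.match(patternNonGreedy);
const matchesShortest = haystack.match(patternShortest);

console.log('non greedy: ', matchesNonGreedy[0]); // 'abcabk'
console.log('shortest: ', matchesShortest[0]); // 'abk'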
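
If the excluded-character trick does not fit the pattern, a brute-force alternative is to try the lazy pattern at every start position and keep the shortest hit. The sketch below is only an illustration of that idea, not something the regex engine offers directly: `shortestMatch` is a hypothetical helper name, it relies on JavaScript's sticky (`y`) flag, and it assumes that a lazy match from a fixed start is the shortest one from that start, which holds for simple patterns like these but is not guaranteed for arbitrary ones.

```javascript
// Hypothetical helper: run the lazy pattern at each index, keep the shortest result.
function shortestMatch(text, source) {
  const sticky = new RegExp(source, 'y'); // 'y' only matches exactly at lastIndex
  let best = null;
  for (let i = 0; i < text.length; i++) {
    sticky.lastIndex = i;
    const m = sticky.exec(text);
    if (m && (best === null || m[0].length < best.length)) {
      best = m[0];
    }
  }
  return best; // null when the pattern matches nowhere
}

console.log(shortestMatch('abcabkbk', 'a.+?k')); // 'abk'
console.log(shortestMatch('caaacab', 'c.*?b'));  // 'cab'
```

This costs one match attempt per character, so it is best kept for short inputs.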
Jonathan
  • 2
    Or search in reverse order, starting at the end, when matches are nested: "(ab(abk)bk)". – LBogaardt Jan 22 '16 at 10:19
  • 7
    @LBogaardt how would one search in reverse order? don't get it – azerafati Jun 07 '16 at 16:06
  • 2
    @LBogaardt Still an open question: How would one search in reverse order? Let's say I want to get `cab`. If my input is `caaacab` and I search for `a.*?b`, it will return the full string instead of the short match inside. How would I search backwards from the `b`? – C4d Feb 27 '17 at 11:19
  • 3
    Reverse the string, then apply the regex. – Jonathan Allen Nov 19 '17 at 00:03
  • I don't really see how reversing the string would help at all. With the string in the example it would only work because there is only one k. Add another k at the end, and reversing the string will give you the exact same problem. – Jonathan Aug 16 '18 at 06:49
  • 7
    @C4u Try `c[^cb]*b`, it'll match the shortest path between `c` and `b` – allenyllee Aug 31 '18 at 05:30
  • 7
    This is super helpful. For people like me trying to understand what's going on here, the generic form is `START[^START]*?END` (where START and END are your start and end character regexes). It essentially means "match anything from START to END where the in-between characters do not include START again" – derekantrican Aug 21 '19 at 16:19
  • 3
    I suppose this would only work when START is a single character? – Stewart May 04 '20 at 15:15
  • @derekantrican Thanks for this. I was trying to grab the comma-bounded fragment of a sentence that talks about a leaf's venation, but kept getting the fragments before it despite using a lazy search. `,[^,]*?(venation|reticulat)[[:print:]]*?,` worked! – DuckPyjamas Jun 28 '20 at 01:07
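
On the question in the comments about START being longer than one character: a character class like `[^START]` only excludes single characters, so one common workaround is a negative lookahead that refuses to step over another START, i.e. `START(?:(?!START).)*?END`. The sketch below is a swapped-in technique, not something from the answers above; `escapeRegExp` and `innermost` are hypothetical helper names, and the `<<` / `>>` delimiters are made up for the example.

```javascript
// Escape a literal string so it can be embedded in a regex source.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build START(?:(?!START).)*?END for literal, possibly multi-character delimiters.
function innermost(start, end) {
  const s = escapeRegExp(start);
  const e = escapeRegExp(end);
  // (?!s). consumes one character only if START does not begin at that position.
  return new RegExp(`${s}(?:(?!${s}).)*?${e}`);
}

const nested = 'x << a << b >> c';
console.log(nested.match(/<<.*?>>/)[0]);              // '<< a << b >>'
console.log(nested.match(innermost('<<', '>>'))[0]);  // '<< b >>'
console.log('abcabkbk'.match(innermost('a', 'k'))[0]); // 'abk'
```

Note that `.` does not cross line breaks here; for multi-line input the `s` flag (or the engine's single-line option) would be needed.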