
Let's say I have the following string:

this is a test for the sake of testing. this is only a test. The end.

and I want to select "this is a test" and "this is only a test". What in the world do I need to do?

The following regex I tried gives a goofy result:

this(.*)test (I also want to capture what is between the two words)

returns this is a test for the sake of testing. this is only a test

It seems like this is probably something easy I'm forgetting.

Cœur
Ben Lesh

4 Answers


The * quantifier is greedy, meaning it will match as many characters as it can, so .* runs all the way to the last "test" in the string. To make it non-greedy, try:

this(.*?)test

The ? after * makes the quantifier lazy, so it captures as few characters as possible while still letting the whole pattern match.
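A quick way to see the difference, sketched in Python's re module (the answer is engine-agnostic; Python is just an assumed choice for illustration). re.finditer returns every non-overlapping match, so the greedy pattern yields one long match while the lazy one yields the two phrases from the question.

```python
import re

text = "this is a test for the sake of testing. this is only a test. The end."

# Greedy: .* grabs as much as possible, then backtracks only far enough
# to find the LAST "test" in the string.
greedy = [m.group(0) for m in re.finditer(r"this(.*)test", text)]

# Lazy: .*? grabs as little as possible, so each match ends at the
# FIRST "test" after its "this".
lazy = [m.group(0) for m in re.finditer(r"this(.*?)test", text)]

# With one capture group, re.findall returns the captured text (group 1).
between = re.findall(r"this(.*?)test", text)

print(greedy)   # ['this is a test for the sake of testing. this is only a test']
print(lazy)     # ['this is a test', 'this is only a test']
print(between)  # [' is a ', ' is only a ']
```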

Andy E
  • Thanks... that's what I thought. I tested that out on a regex tester and it works. So the app (EditPlus) I'm using to do some find-and-replace magic apparently doesn't recognize the lazy ? modifier. – Ben Lesh Jan 15 '10 at 20:17
  • As per my answer, you might not get perfect results if "this" and "test" are embedded in other words. Do consider looking into it, if that might be an issue. – Platinum Azure Jan 15 '10 at 20:19

Andy E and Ipsquiggle have the right idea, but I want to point out that you might want to add a word boundary assertion, meaning you don't want to deal with words that merely contain "this" or "test", only the words by themselves. In Perl and similar regex flavors, that's done with the \b assertion.

As it is, this(.*?)test would match "thistles are the greatest", which you probably don't want.

The pattern you want is something like this: \bthis\b(.*?)\btest\b
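To check the word-boundary point, here is a small sketch using Python's re module (an assumed choice; it treats \b the same way Perl does). Note the raw strings: in an ordinary Python string literal, "\b" is a backspace character, not a word boundary.

```python
import re

# Raw strings keep \b as the regex word-boundary assertion.
plain = r"this(.*?)test"
bounded = r"\bthis\b(.*?)\btest\b"

tricky = "thistles are the greatest"
text = "this is a test for the sake of testing. this is only a test. The end."

# "this" hides inside "thistles" and "test" inside "greatest".
plain_hit = re.search(plain, tricky)
# Neither is a whole word there, so the bounded pattern finds nothing.
bounded_hit = re.search(bounded, tricky)
# On the original sentence, the bounded pattern still finds both phrases.
bounded_matches = [m.group(0) for m in re.finditer(bounded, text)]

print(plain_hit is not None, bounded_hit is not None)  # True False
print(bounded_matches)  # ['this is a test', 'this is only a test']
```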

Platinum Azure

* is a greedy quantifier: it matches as much as possible, which is what you are seeing. Depending on which regex flavor your language or tool supports, you will need to find a non-greedy (lazy) quantifier. Usually this is written with a trailing question mark, like this: *?. A lazy quantifier stops consuming characters as soon as the rest of the regex can be satisfied.

There is a good explanation of greediness here.
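The "stop as soon as the rest of the regex can be satisfied" behavior is easiest to see on a tiny input. A minimal sketch in Python's re module (an assumed choice; Python also accepts the lazy forms +?, ?? and {m,n}?):

```python
import re

s = "aaab"

greedy_plus = re.match(r"a+", s).group(0)    # as many "a"s as possible
lazy_plus = re.match(r"a+?", s).group(0)     # one "a" is enough; nothing follows in the pattern
lazy_star = re.match(r"a*?", s).group(0)     # zero "a"s is enough for *?
lazy_then_b = re.match(r"a+?b", s).group(0)  # extends one "a" at a time until "b" can match

print(greedy_plus, lazy_plus, repr(lazy_star), lazy_then_b)
# aaa a '' aaab
```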

Ipsquiggle

For me, simply removing the /g (global) flag worked.

See https://regex101.com/r/EaIykZ/1

Meloman
  • My regex engine doesn't have a /g to remove. Is there another solution to this problem? – Calion Sep 04 '22 at 03:57
  • Have a look at this https://stackoverflow.com/questions/2503413/regular-expression-to-stop-at-first-match @Calion – Meloman Sep 05 '22 at 06:26