4

I first thought that this answer would totally solve my issue, but it did not.

I have a string url like this one:

http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76

I would like to extract some-other-text, so I came up with the following regex:

/0-(.*)\.htm/

Unfortunately, the group captures 1-0-some-other-text because regex quantifiers are greedy. I cannot make it non-greedy using .*?; it does not change anything, as you can see here.

I also tried the U modifier, but it did not help.

Why does the "non-greedy" tip not work?
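The behaviour described above can be reproduced with a minimal sketch. Python's `re` module is an assumption here (the question does not name a language), so the `/.../` delimiters are dropped. Both the greedy and the lazy pattern capture the same text:

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"

# Greedy and lazy versions of the pattern from the question.
greedy = re.search(r"0-(.*)\.htm", url)
lazy = re.search(r"0-(.*?)\.htm", url)

print(greedy.group(1))  # 1-0-some-other-text
print(lazy.group(1))    # 1-0-some-other-text
```

Both searches begin their match at the first `0-` in the string, which is why switching the quantifier makes no difference.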

Guy Coder
Delgan
  • Did you try `0-([^0]*)\.htm`? If you do not expect any 0s further in your input, it can solve the problem. – Wiktor Stribiżew Aug 02 '15 at 18:36
  • @stribizhev The regex will not work if my text is `s0me-text` for example. – Delgan Aug 02 '15 at 18:38
  • What about [`0-((?!.*0-).*)\.htm`](https://regex101.com/r/fA7aA1/2) then? It can work for individual strings. Else, you will need a tempered greedy token. – Wiktor Stribiżew Aug 02 '15 at 18:39
  • I think you are going to have to be more clear about what the values/format can be. So far you have basically shown that you want an arbitrary value nested inside an arbitrary value – CrayonViolent Aug 02 '15 at 18:46
  • @stribizhev @CrayonViolent But do you know why the `.*?` from the other answer does not work here? – Delgan Aug 02 '15 at 18:49
  • for example.. is the "1-0-1-0-" part static? If not, will it always be 4 single digit numbers? etc. Regular expressions only work if you have an actual pattern to match against. If the whole thing is arbitrary then regex isn't going to work – CrayonViolent Aug 02 '15 at 18:49
  • @Delgan because your "non-greedy" solution assumes that it will start on the ending `\.htm` and work its way backward in a non-greedy way (so stop on first `0-`). But that's just not how regex engines work. Regex engines match left to right, not right to left. – CrayonViolent Aug 02 '15 at 18:54
  • @CrayonViolent Yes I admit that my example was perhaps imperfect, what puzzled me was the nongreedy workaround which did not work, I better understand why now. Thank you. – Delgan Aug 02 '15 at 18:56
  • Could just put a [greedy](http://www.regular-expressions.info/repeat.html#greedy) dot `.*` that ᗧ eats up before: `.*0-(.*)\.htm` See [test at regex101](https://regex101.com/r/cM8kC6/1) – Jonny 5 Aug 03 '15 at 08:02
  • @Jonny5 Oh, that is cool, thanks for the tip! – Delgan Aug 04 '15 at 08:50
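
The greedy-prefix idea from the last comments can be checked with a short sketch. Python's `re` module is an assumption (the thread does not name a language):

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"

# The leading greedy .* grabs as much as it can, so the "0-" that follows
# is the last one from which the rest of the pattern can still match.
match = re.search(r".*0-(.*)\.htm", url)

print(match.group(1))  # some-other-text
```

This relies on there being a single value to extract per string, since the leading `.*` swallows everything before the last `0-`.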

2 Answers

3

In case you need to get the closest match, you can make use of a tempered greedy token.

0-((?:(?!0-).)*)\.htm

See demo

The lazy version of your regex does not work because the regex engine scans the string from left to right. It always starts at the leftmost position and checks whether it can match there. So, in your case, it found the first 0- and was happy with it. Laziness only affects where the match ends, that is, the rightmost position. In your case there is only one possible rightmost position (the single .htm), so lazy matching cannot help achieve the expected result.
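
As a rough illustration of the tempered greedy token, here is a minimal sketch in Python (the language is an assumption; the pattern itself is the one given above). The second URL is hypothetical, built from the `s0me-text` case raised in the comments:

```python
import re

# 0-, then any run of characters in which no position starts another "0-".
TEMPERED = re.compile(r"0-((?:(?!0-).)*)\.htm")

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"
# Hypothetical variant: the wanted text contains a 0 that is not followed by "-".
url_with_zero = "http://www.someurl.com/some-text-1-0-1-0-s0me-text.htm"

for candidate in (url, url_with_zero):
    match = TEMPERED.search(candidate)
    print(match.group(1))
# some-other-text
# s0me-text
```

The attempt at the first `0-` fails because the token stops before the next `0-` and `\.htm` cannot match there, so the engine moves on to the last `0-`.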

You also can use

0-((?!.*?0-).*)\.htm

It will work if you have individual strings to extract the values from.
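
The "individual strings" caveat can be seen in a small sketch (Python assumed; the `combined` string is a made-up input holding two values on one line). The lookahead rejects any `0-` that has another `0-` anywhere after it, so only the last value in the whole string is found, whereas the tempered greedy token finds each one:

```python
import re

LOOKAHEAD = re.compile(r"0-((?!.*?0-).*)\.htm")
TEMPERED = re.compile(r"0-((?:(?!0-).)*)\.htm")

single = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"
# Hypothetical input with two values to extract in the same string.
combined = "a-1-0-first.htm b-1-0-second.htm"

print(LOOKAHEAD.findall(single))    # ['some-other-text']
print(LOOKAHEAD.findall(combined))  # ['second']
print(TEMPERED.findall(combined))   # ['first', 'second']
```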

Wiktor Stribiżew
  • That makes sense, regex are left to right so the non-greedy tip is useful for the right-hand characters only, this is obvious now, thank you. – Delgan Aug 02 '15 at 18:54
0

You want to exclude the 1-0? If so, you can use a non-capturing group:

(?:1-0-)+(.*?)\.htm

Demo
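
A quick sketch of how this pattern behaves, assuming Python's `re` module. The second URL is hypothetical and only shows that the pattern depends on the prefix being a repeated `1-0-`:

```python
import re

PREFIXED = re.compile(r"(?:1-0-)+(.*?)\.htm")

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"
print(PREFIXED.search(url).group(1))  # some-other-text

# Hypothetical URL whose numeric prefix is not a repeated "1-0-".
other = "http://www.someurl.com/some-text-1-0-2-0-some-other-text.htm"
print(PREFIXED.search(other).group(1))  # 2-0-some-other-text
```

The greedy `+` consumes every consecutive `1-0-`, so the lazy group starts right after the last repetition.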

Martin Brandl