Regex start matching from last occurrence

Question

I have some HTML code:

<a href="localhost/1153,Test">  <img src="test.jpg">
(...)
<a href="localhost/1191,Test8">  <img src="test2.jpg">
(...)
<a href="localhost/9838,Test">  <img src="13.gif">
(...)

I want to match the value before the gif, in this example 9838. No matter what I try, I end up with the first match (1153). How can I match "backwards" from a known pattern?

try look ahead http://www.regular-expressions.info/lookaround.html — Udy, Jul 23 '13 at 10:47
possible duplicate of [Using regular expressions to parse HTML: why not?](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) — tripleee, Jul 23 '13 at 10:58
@tripleee Thanks for your contribution. Can you tell me how to get this value in C# from a totally malformed html? I don't want to parse HTML, I just want to extract this particular number... — Kassan, Jul 23 '13 at 11:05

score 0 · Answer 1 · answered Jul 23 '13 at 10:47

In C#, you can do

Match theMatch = Regex.Match(source, pattern, RegexOptions.RightToLeft);

which would start matching from the end.

Anton

score 0 · Answer 2 · answered Jul 23 '13 at 10:48

How about this?

<a href="localhost\/(\d+),Test\d*">  <img src="\w+\.gif">
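A minimal sketch of how this pattern behaves on the sample markup from the question. It is written in Python rather than C# purely for illustration; the `SAMPLE` string and the `extract_id` helper are assumptions of this sketch, not part of the answer. The number is read from capture group 1.

```python
import re

# Sample markup copied from the question (two spaces separate the tags).
SAMPLE = '''<a href="localhost/1153,Test">  <img src="test.jpg">
(...)
<a href="localhost/1191,Test8">  <img src="test2.jpg">
(...)
<a href="localhost/9838,Test">  <img src="13.gif">
(...)
'''

# The pattern from this answer; group 1 captures the digits before the comma.
PATTERN = re.compile(r'<a href="localhost\/(\d+),Test\d*">  <img src="\w+\.gif">')


def extract_id(html):
    """Return the number from the first link whose image is a .gif, or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(extract_id(SAMPLE))  # 9838
```

Because the anchor and image tags are spelled out literally, the .jpg links cannot match. The flip side is that the pattern depends on the exact whitespace and on the `Test` text staying the same.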

Lodewijk Bogaards

score 0 · Accepted Answer · answered Jul 23 '13 at 11:09

\d*(?=,.*\.gif)

This matches the first occurrence of digits before

[comma][other text][.gif]

You can test this here.
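The sketch below runs the lookahead pattern on the question's sample, in Python for illustration only (`SAMPLE` and the variable names are assumptions of this sketch). It also hints at why two testers can disagree: by default `.` stops at a line break, so only the line containing `.gif` can satisfy the lookahead, whereas in a dot-matches-newline mode (`re.DOTALL` here) the `.*` runs across lines and the first match becomes 1153.

```python
import re

# Sample markup copied from the question.
SAMPLE = '''<a href="localhost/1153,Test">  <img src="test.jpg">
(...)
<a href="localhost/1191,Test8">  <img src="test2.jpg">
(...)
<a href="localhost/9838,Test">  <img src="13.gif">
(...)
'''

# The pattern from this answer.
LOOKAHEAD = r'\d*(?=,.*\.gif)'

# Default mode: "." does not cross line breaks, so ".gif" must be on the
# same line as the digits.
same_line = re.search(LOOKAHEAD, SAMPLE).group(0)
print(same_line)  # 9838

# DOTALL: ".*" may span lines, so the first number in the text already
# "sees" the .gif further down.
across_lines = re.search(LOOKAHEAD, SAMPLE, re.DOTALL).group(0)
print(across_lines)  # 1153

# \d* can also match an empty string in front of a comma; using \d+ when
# collecting all matches keeps only real numbers.
numbers = re.findall(r'\d+(?=,.*\.gif)', SAMPLE)
print(numbers)  # ['9838']
```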

Srb1313711

Thank you, can you explain why I get 6 matches in RegexBuddy but only one in RegExr? – Kassan Jul 23 '13 at 11:16
It probably matches across newlines, so the first number is matched on the last occurrence. Use a nongreedy match or exclude newlines from the skip expression. – tripleee Jul 23 '13 at 11:19
I'm sorry, I'm not sure, but try these two links as well; both are regex testers and both give the same result: http://www.myregextester.com/index.php#sourcetab, http://regexpal.com/ – Srb1313711 Jul 23 '13 at 11:21
Tripleee, of course you're right; it's too hot to focus. Srb, thank you for an elegant solution: it looks great and works flawlessly. It's time to learn about lookaround, thank you! – Kassan Jul 23 '13 at 11:30
No worries, glad I could be of help – Srb1313711 Jul 23 '13 at 11:31

score 0 · Answer 4 · answered Jul 23 '13 at 11:26

If you insist on using regex for this, it would be good to have some boundary conditions; what more or less similar pieces of text should not be matched?

Having said that, I might go for something like

(\d+)[^"\d]+"[^"]+"[^"\d]+\d+\.gif

and extract the first parenthesized submatch (assuming [^\d] matches neither newlines nor digits in your language; I am not too familiar with it).
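A hedged Python sketch of trying this pattern on the question's sample (the thread is about C#; Python is used here only for illustration). On the sample exactly as shown, the pattern as written finds nothing: its last class, `[^"\d]+`, needs at least one non-digit character between the opening quote and the digits of the file name, and `13.gif` starts with a digit. The `RELAXED` variant, which changes that one `+` to `*`, is an assumption of this sketch and not part of the answer, as are `SAMPLE` and `first_group`. Note also that in Python a negated class such as `[^"\d]` does match newlines.

```python
import re

# Sample markup copied from the question.
SAMPLE = '''<a href="localhost/1153,Test">  <img src="test.jpg">
(...)
<a href="localhost/1191,Test8">  <img src="test2.jpg">
(...)
<a href="localhost/9838,Test">  <img src="13.gif">
(...)
'''

# The pattern exactly as given in this answer.
AS_WRITTEN = re.compile(r'(\d+)[^"\d]+"[^"]+"[^"\d]+\d+\.gif')

# Hypothetical variant: the last negated class may be empty, so file names
# that begin with a digit (like 13.gif) are accepted too.
RELAXED = re.compile(r'(\d+)[^"\d]+"[^"]+"[^"\d]*\d+\.gif')


def first_group(pattern, text):
    """Return the first parenthesized submatch, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_group(AS_WRITTEN, SAMPLE))  # None
print(first_group(RELAXED, SAMPLE))     # 9838
```

A file name with a non-digit prefix, for example a hypothetical `img13.gif`, would be matched by the pattern as written.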

edited Jul 23 '13 at 11:33

answered Jul 23 '13 at 11:26

tripleee

