
I have this HTML:

<a href="localhost/1153,Test">  <img src="test.jpg">
(...)
<a href="localhost/1191,Test8">  <img src="test2.jpg">
(...)
<a href="localhost/9838,Test">  <img src="13.gif">
(...)

I want to match the number that comes before the .gif image, in this example 9838. No matter what I try, I end up with the first match (1153). How can I match "backwards" from a known pattern?
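The patterns tried above aren't shown, so the following C# sketch only reproduces one likely cause. It assumes a pattern like `(\d+),.*\.gif` searched with `RegexOptions.Singleline`, where `.` also matches newlines. The regex engine reports the leftmost match, and starting at 1153 a greedy `.*` can run on to the only `.gif`, so the first number wins. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo: why a leftmost search can return 1153 instead of 9838.
public static class FirstMatchDemo
{
    // The sample HTML from the question, one tag pair per line.
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    // Returns the digits captured before the comma, or "" when nothing matches.
    public static string NumberBeforeGif(string html, RegexOptions options)
    {
        Match m = Regex.Match(html, @"(\d+),.*\.gif", options);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Singleline: "." crosses '\n', so the leftmost candidate (1153) succeeds.
        Console.WriteLine(NumberBeforeGif(SampleHtml, RegexOptions.Singleline)); // 1153
        // Default options: "." stops at '\n', so only the .gif line can match.
        Console.WriteLine(NumberBeforeGif(SampleHtml, RegexOptions.None));       // 9838
    }
}
```

Without `Singleline`, the same pattern only succeeds on the `13.gif` line, which is why the layout of the real HTML matters.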

Kassan
  • Try lookahead: http://www.regular-expressions.info/lookaround.html – Udy Jul 23 '13 at 10:47
  • possible duplicate of [Using regular expressions to parse HTML: why not?](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) – tripleee Jul 23 '13 at 10:58
  • @tripleee Thanks for your contribution. Can you tell me how to get this value in C# from totally malformed HTML? I don't want to parse HTML, I just want to extract this particular number... – Kassan Jul 23 '13 at 11:05

4 Answers


In C#, you can do

Match theMatch = Regex.Match(source, pattern, RegexOptions.RightToLeft);

which searches from the end of the input toward the start, so the first match it returns is the last one in the text.
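A minimal C# sketch of this option applied to the question's sample. The answer doesn't give a pattern, so `(\d+),.*?\.gif` is an assumption, as are the class and method names. `RegexOptions.Singleline` is added only to show that the result doesn't depend on the HTML being split into lines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of RegexOptions.RightToLeft.
public static class RightToLeftDemo
{
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    // The search starts at the end of the input, so the last ".gif" is found first.
    public static string LastNumberBeforeGif(string html)
    {
        // Assumed pattern: digits, a comma, then as little text as possible up to ".gif".
        Match m = Regex.Match(html, @"(\d+),.*?\.gif",
                              RegexOptions.RightToLeft | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(LastNumberBeforeGif(SampleHtml)); // 9838
    }
}
```

The lazy `.*?` matters here: scanning backwards from the last `.gif`, it stops at the nearest comma, while a greedy `.*` lets the backward scan run on to an earlier one.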

Anton

How about this?

<a href="localhost\/(\d+),Test\d*">  <img src="\w+\.gif">
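A short C# sketch that applies this pattern to the question's sample. In a verbatim string each `"` is written as `""`; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the fully literal pattern.
public static class LiteralPatternDemo
{
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    public static string NumberFromGifLink(string html)
    {
        // The answer's pattern; only the .gif line satisfies "\w+\.gif".
        const string pattern =
            @"<a href=""localhost\/(\d+),Test\d*"">  <img src=""\w+\.gif"">";
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(NumberFromGifLink(SampleHtml)); // 9838
    }
}
```

Because everything outside the groups is literal, a change as small as different spacing between the two tags breaks the match; `\s+` in place of the two spaces would be more tolerant.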
Lodewijk Bogaards
\d*(?=,.*\.gif)

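This matches the first occurrence of digits immediately before

[comma][other text][.gif]

where, with default options, [other text] stays on one line because . does not match a newline.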

You can test this here.
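A C# sketch of this lookahead pattern using `Regex.Match`, which returns only the first match (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the lookahead pattern.
public static class LookaheadDemo
{
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    public static string FirstDigitsBeforeGif(string html)
    {
        // \d* takes the digits; (?=,.*\.gif) only checks, without consuming,
        // that a comma and later ".gif" follow on the same line
        // ("." does not match '\n' unless RegexOptions.Singleline is set).
        return Regex.Match(html, @"\d*(?=,.*\.gif)").Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstDigitsBeforeGif(SampleHtml)); // 9838
    }
}
```

Since `\d*` can also match zero digits, `\d+` is the safer quantifier when iterating over all matches rather than taking the first one.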

Srb1313711
  • Thank you, can you explain why I get 6 matches in RegexBuddy but only one in RegExr? – Kassan Jul 23 '13 at 11:16
  • It probably matches across newlines, so the first number is matched on the last occurrence. Use a nongreedy match or exclude newlines from the skip expression. – tripleee Jul 23 '13 at 11:19
  • I'm sorry, I'm not sure, but try these two links as well; both are regex testers and both give the same result: http://www.myregextester.com/index.php#sourcetab, http://regexpal.com/ – Srb1313711 Jul 23 '13 at 11:21
  • tripleee, of course you're right; it's too hot to focus. Srb, thank you for an elegant solution: it looks great and works flawlessly. It's time to learn about lookaround, thank you! – Kassan Jul 23 '13 at 11:30
  • No worries, glad I could be of help. – Srb1313711 Jul 23 '13 at 11:31
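The 6-versus-1 discrepancy can be reproduced in C#. This sketch assumes RegexBuddy was run with "dot matches newline" enabled (the .NET equivalent is `RegexOptions.Singleline`) and that RegExr showed only the first match; neither setting is confirmed in the thread. Besides crossing lines, `\d*` also produces an empty match at each qualifying comma, which accounts for half of the six. Names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo: counting matches of the lookahead pattern.
public static class MatchCountDemo
{
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    // Every match value, in order, including empty ones.
    public static string[] AllMatches(string html, string pattern, RegexOptions options) =>
        Regex.Matches(html, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string lookahead = @"\d*(?=,.*\.gif)";

        // Default options: "9838" plus an empty match at the comma after it.
        Console.WriteLine(AllMatches(SampleHtml, lookahead, RegexOptions.None).Length);       // 2

        // "." crosses lines: 1153, 1191 and 9838, each followed by an empty match.
        Console.WriteLine(AllMatches(SampleHtml, lookahead, RegexOptions.Singleline).Length); // 6

        // \d+ rules out empty matches and [^\n]* keeps the check on one line.
        string[] fixedMatches = AllMatches(SampleHtml, @"\d+(?=,[^\n]*\.gif)", RegexOptions.Singleline);
        Console.WriteLine(string.Join(", ", fixedMatches)); // 9838
    }
}
```

This follows tripleee's suggestion: excluding newlines from the skipped text and requiring at least one digit leaves a single match whichever option is set.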

If you insist on using regex for this, it would be good to have some boundary conditions; what more or less similar pieces of text should not be matched?

Having said that, I might go for something like

(\d+)[^"\d]+"[^"]+"[^"\d]*\d+\.gif

and extract the first parenthesized submatch. (I am not too familiar with C#, but note that a negated class such as [^"\d] matches newlines too, so add \n to the classes if a match must not span lines.)
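A C# sketch of this approach. The last negated class has to allow zero characters (`[^"\d]*`): in the sample, the file name `13.gif` starts directly after `src="`, so with `[^"\d]+` in that position the pattern finds nothing. Names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the negated-class pattern.
public static class NegatedClassDemo
{
    public const string SampleHtml =
        "<a href=\"localhost/1153,Test\">  <img src=\"test.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/1191,Test8\">  <img src=\"test2.jpg\">\n" +
        "(...)\n" +
        "<a href=\"localhost/9838,Test\">  <img src=\"13.gif\">\n" +
        "(...)\n";

    // Group 1 holds the number; [^"\d]* may be empty before the file name's digits.
    public static string NumberBeforeGif(string html)
    {
        Match m = Regex.Match(html, @"(\d+)[^""\d]+""[^""]+""[^""\d]*\d+\.gif");
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(NumberBeforeGif(SampleHtml)); // 9838
    }
}
```

Like the original, this still requires the file name to end in digits right before `.gif`, which holds for `13.gif` but not for every image name.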

tripleee