0

I have the HTML content below, and I want to extract only the IDs, such as 31673, 31672, 3166 and 316, using a regular expression.

<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>

I created the regular expression below, but unfortunately it only returns 31673 and 31672. I would also like to remove the hard-coded parts such as href="/CaseMgrTesting/Pat/Summary/ and \d\d\d\d\d. If anybody can give me the correct regular expression, it would be greatly appreciated.

(?<=<a\shref="/CaseMgrTesting/Pat/Summary/\d\d\d\d\d">).*(?=</a>)
Brad Larson
Deep in Development
  • 1
    Simple: you don't. You would use an HTML parser. – PeeHaa Dec 19 '12 at 18:54
  • 4
    Are you trying to use regex to parse html? If so, you might want to read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – CBredlow Dec 19 '12 at 18:55
  • 4
    Every time you use regexes for HTML parsing, another web developer feels the sudden urge to weep silently in the corner for seven years straight. – John Dvorak Dec 19 '12 at 18:56
  • 1
    Also, isn't it trying to match exactly five digits? I'm still too rusty on regex to say much about it, but that might explain why you are only getting the five-digit ones. – CBredlow Dec 19 '12 at 19:05
  • Like @CBredlow said: `(?<=).*(?=)` – Dio F Dec 19 '12 at 20:28
  • Hi Dio, I tried your answer, but it is not working. It seems \d+ cannot replace \d\d\d\d\d inside a positive lookbehind; I do not know why. – Deep in Development Dec 19 '12 at 20:50
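
The two points raised in these comments can be checked with a small sketch. It uses Python's `re` module, which is an assumption: the thread does not say which regex engine is in use, and engines differ on lookbehind support (Python's `re` rejects variable-width lookbehind, while some other engines accept it). The capture-group pattern at the end is an illustrative alternative, not something proposed in the thread.

```python
import re

html = '''<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>'''

# The pattern from the question: the lookbehind demands exactly five digits,
# so the 4-digit and 3-digit links can never satisfy it.
original = r'(?<=<a\shref="/CaseMgrTesting/Pat/Summary/\d\d\d\d\d">).*(?=</a>)'
five_digit_ids = re.findall(original, html)

# Swapping in \d+ makes the lookbehind variable-width, which Python's re
# refuses to compile (other engines may behave differently).
try:
    re.compile(r'(?<=<a\shref="/CaseMgrTesting/Pat/Summary/\d+">).*(?=</a>)')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Illustrative alternative (assumption): drop the lookarounds and capture
# the digits between the tags in a group instead.
all_ids = re.findall(r'<a\s[^>]*>(\d+)</a>', html)

print(five_digit_ids, lookbehind_ok, all_ids)
```

Using a capture group also removes both hard-coded parts the question mentions, since neither the path nor the digit count appears in the pattern.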

3 Answers

1

Your one-stop answer is Html Agility Pack. This nifty must-have allows you to approach HTML by node. Learn it. Live it. Love it.
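
To illustrate what approaching HTML "by node" looks like, here is a minimal sketch. It does not use Html Agility Pack itself; it swaps in Python's standard-library `html.parser` as a stand-in, and the `LinkTextCollector` class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start a fresh text buffer for this link.
            self.in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only keep text that appears between <a> and </a>.
        if self.in_link:
            self.link_texts[-1] += data


html = '''<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a>
<a href="/CaseMgrTesting/Pat/Summary/31672">31672</a>
<a href="/CaseMgrTesting/Pat/Summary/3166">3166</a>
<a href="/CaseMgrTesting/Pat/Summary/316">316</a>'''

collector = LinkTextCollector()
collector.feed(html)
collector.close()

# Keep only link texts that are purely numeric IDs.
ids = [text.strip() for text in collector.link_texts if text.strip().isdigit()]
print(ids)
```

The parser handles attribute order, quoting and whitespace on its own, so nothing about the href path needs to be hard-coded.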

Wim Ombelets
  • 5,097
  • 3
  • 39
  • 55
  • Thanks Wimbo!!! I believe Html Agility Pack is a very good way to extract data from HTML, and I will learn it. Actually, my purpose in asking the question above is to learn regular expressions, which give me a headache but are very powerful. I hated regular expressions for many years, but recently I found my genius ex-manager's code using them: three lines of validation code covered very complex logic. – Deep in Development Dec 19 '12 at 19:27
1
<a .*?>(.*)</a>

Use this regex for this question. It's a simple one; try it.

Patrick Hofman
regex
0

Use this (an updated version of regex's answer):

<a .*?>(.*?)</a>

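The important piece of this is the ? after the second *. It makes that .* (match anything) non-greedy. Otherwise the greedy .* runs on to the last </a> on the same line, so you get at most one match per line, with the intervening tags swallowed into the captured group.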
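
The difference is easiest to see when several links sit on one line. The sketch below uses Python's `re` module as an assumption (the thread does not name an engine) and a shortened two-link sample built from the question's data.

```python
import re

# Two links on a single line, taken from the question's sample data.
html = ('<a href="/CaseMgrTesting/Pat/Summary/31673">31673</a> '
        '<a href="/CaseMgrTesting/Pat/Summary/316">316</a>')

# Greedy group: (.*) runs to the LAST </a> on the line.
greedy = re.findall(r'<a .*?>(.*)</a>', html)

# Non-greedy group: (.*?) stops at the FIRST </a> after each opening tag.
lazy = re.findall(r'<a .*?>(.*?)</a>', html)

print(greedy)
print(lazy)
```

With the greedy group, the single captured string contains the first ID, the closing tag, and the whole second opening tag; the non-greedy group yields one ID per link.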

Patrick Hofman