I have some long HTML text, something like this:

/*stuff*/
<a href="some/link.html">Link</a>
/*stuff*/

How can I crop this so that I get only the some/link.html text?

Matthew Strawbridge
Alex
  • Using a regular expression [might not be the best strategy](http://stackoverflow.com/q/1732348/464709) here. Maybe you could use an HTML parser instead? – Frédéric Hamidi Dec 09 '11 at 15:01
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags (I always dream of citing this one; my favorite is 'Even Jon Skeet cannot parse HTML using regular expressions'). Strangely enough, one of the examples on MSDN about Regex that I remember is about extracting links... – Felice Pollano Dec 09 '11 at 15:02

3 Answers

Consider taking a look at the Html Agility Pack, an HTML parser for .NET.

Felice Pollano

MatchCollection matches = Regex.Matches(html, @"(?<=<a\s+href="").*?(?="">)");

should do the trick.

Note that I am using the pattern (?<=prefix)find(?=suffix) with:
prefix = <a\s+href="
find = .*?
suffix = ">
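
As a minimal sketch of the prefix/find/suffix pattern above, the snippet below wraps it in a small helper. The names HrefExtractor and ExtractHrefs are hypothetical and exist only for illustration. Inside a C# verbatim string (@"...") a literal double quote is written as "", which is why the pattern in the code shows href="" where the breakdown shows href=". The sketch assumes href is the first attribute of the tag and is immediately followed by the closing >; anchors written any other way will not match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class HrefExtractor
{
    // (?<=prefix) is a lookbehind, .*? is a lazy match, (?=suffix) is a lookahead.
    // Only the lazy part ends up in the match value; prefix and suffix are
    // checked but not consumed.
    private static readonly Regex HrefPattern =
        new Regex(@"(?<=<a\s+href="").*?(?="">)");

    // Returns every href value matched by the pattern, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var result = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

For the sample in the question, HrefExtractor.ExtractHrefs(html) should return a single entry, some/link.html.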

Olivier Jacot-Descombes

Using jQuery, you can do the following:

var pageNum = $("a#specificLink").attr("href").match(/page=([0-9]+)/)[1];

And in .NET C#, this tutorial might guide you in the right direction.

Community
Andres
  • You are pointing in the wrong direction: the tutorial you propose will also match a link contained in comments, which is wrong. – Felice Pollano Dec 09 '11 at 15:07
  • This is not a jQuery question but a C# one – parapura rajkumar Dec 09 '11 at 15:07
  • @parapurarajkumar That doesn't mean he can't use jQuery to achieve the same thing. I've posted C# questions but accepted any solution as long as the end result works. And the line Match m2 = Regex.Match(value, @"href=\""(.*?)\""" in that tutorial wouldn't work? It's a matter of reading, not assuming. – Andres Dec 09 '11 at 15:12