I'm having a small problem with a VB.NET scraper. It is supposed to extract all links from an HTML string that I have already downloaded. The links are there (I have checked), so the problem must be in my regex string.

My regex string: <a.*?href=""(.*?)"".*?>(.*?)</a>

This works for some sites, but for others it does not.

Here are examples from the HTML source that match and don't match.

Working:

<a href="http://domain.com" rel="nofollow" onmousedown="return clk('25936','3')" target="_blank"></a>

Not working:

<a href='http://domain.com' target="_blank" ><font size=2><b>text</b></a>

Could it be because of the " and ' ?

jwpfox
Anders

1 Answer

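Try the following regex:

<a.*?href=["'](.*?)["'].*?>(.*?)</a>

Your pattern only accepts double quotes around the href value. Since an href attribute can be written with either single or double quotes, the pattern has to check for both. In a VB.NET string literal, write each double quote inside the pattern as "".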
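For illustration, here is a minimal sketch of how a pattern that accepts both quote styles could be used from VB.NET. The module name `LinkDemo` and the `html` test string are assumptions made for this example; the two anchors are trimmed versions of the samples in the question, and the first one is assumed to end in `</a>`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkDemo
    Sub Main()
        ' Hypothetical input: trimmed versions of the two anchors from the
        ' question, one with a double-quoted href and one with a single-quoted href.
        Dim html As String =
            "<a href=""http://domain.com"" rel=""nofollow"" target=""_blank""></a>" &
            "<a href='http://domain.com' target=""_blank"" ><font size=2><b>text</b></a>"

        ' ["'] accepts either quote character. Inside a VB.NET string literal
        ' each double quote is written as "".
        Dim pattern As String = "<a.*?href=[""'](.*?)[""'].*?>(.*?)</a>"

        ' Group 1 is the href value, group 2 is whatever sits between the tags.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        For Each m As Match In Regex.Matches(html, pattern, options)
            Console.WriteLine("{0} | {1}", m.Groups(1).Value, m.Groups(2).Value)
        Next
    End Sub
End Module
```

Note that `["']` on both sides does not require the opening and closing quote to be the same character, so this is a loose match rather than a strict one. `RegexOptions.Singleline` is included on the assumption that some anchors span several lines.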