
I am trying to find all of the links in a website's source code. Could anyone tell me the expression I would need to put in my Regex to find them?


Duplicate of (among others): Regular expression for parsing links from a webpage?

Google finds more: html links regex site:stackoverflow.com

Community
xoxo

1 Answer


I'm not certain how this would translate to C# (I haven't done any C# development myself yet), but here is how I might do it in JavaScript or ColdFusion, which may give you an idea of how to approach it in C#.

In JavaScript I think this would work:

const rex = /href="([^"]+)"/g;
const a = [...source.matchAll(rex)].map(m => m[1]);

after which a would be an array containing the links. The g (global) flag makes matchAll find every href in the source rather than only the first one, and m[1] is the text captured by the parentheses, i.e. the URL without the surrounding quotes. Note that the pattern only matches double-quoted href values.
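A small self-contained sketch of the same capture-group idea, wrapped in a hypothetical `extractLinks` function and run against a made-up HTML string. Like the regex above, it assumes double-quoted `href` values; single-quoted or unquoted attributes would need a different pattern.

```javascript
// Hypothetical helper: collect the value of every double-quoted href attribute.
function extractLinks(source) {
  // The g flag is required: matchAll throws a TypeError on a non-global regex.
  const rex = /href="([^"]+)"/g;
  const links = [];
  for (const match of source.matchAll(rex)) {
    links.push(match[1]); // match[1] is the captured URL, without quotes
  }
  return links;
}

// Example input (made up for illustration).
const source =
  '<p><a href="https://example.com/">Home</a> and ' +
  '<a class="x" href="/about.html">About</a></p>';

console.log(extractLinks(source));
// [ 'https://example.com/', '/about.html' ]
```

String.prototype.matchAll needs a reasonably modern runtime (ES2020); on older engines the same loop can be written with repeated rex.exec(source) calls.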

By comparison, in ColdFusion you would do something slightly different:

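// REMatch returns every substring matching the pattern, e.g. href="http://example.com"
a = REMatch('href="[^"]+"', source);
// CFML arrays are 1-based, so loop up to and including ArrayLen(a)
for (i = 1; i <= ArrayLen(a); i++) {
  // skip the 6-character href=" prefix (start at position 7) and drop the closing quote
  a[i] = mid(a[i], 7, len(a[i]) - 7);
}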

Again, I haven't tested it, but REMatch returns an array of every substring that matches the expression, and the for loop then strips the href="" wrapper (the href=" prefix and the closing quote) from around the actual URL.
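For comparison, the ColdFusion strategy (match the whole `href="..."` attribute, then trim it) might look like this in JavaScript. With a global regex, `String.prototype.match` returns every full match, or `null` when there are none; `slice(6, -1)` drops the six-character `href="` prefix and the closing quote. The function name and sample string are hypothetical.

```javascript
// Hypothetical helper mirroring the REMatch + mid() approach.
function extractLinksByTrimming(source) {
  // With the g flag, match() returns an array of full matches (or null if none).
  const matches = source.match(/href="[^"]+"/g) || [];
  // Each match looks like href="URL": drop the 6-char prefix and the final quote.
  return matches.map((m) => m.slice(6, -1));
}

// Example input (made up for illustration).
const html = '<a href="https://example.com/">Home</a> <a href="/about.html">About</a>';
console.log(extractLinksByTrimming(html));
// [ 'https://example.com/', '/about.html' ]
```

The capture-group version avoids the trimming step entirely, which is why it is usually the simpler choice when the regex engine supports it.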

Isaac Dealey