
I want to extract all the website URLs from HTML code. The problem is that my regex finds URLs only when the address contains www. What regex do I need to match URLs that do not have www in them?

Update: the regex I am using is:

string anchorPattern = 
  @"(?<Protocol>\w+)://(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*'";
Alan Moore
Laziale

2 Answers


Add a positive lookahead, (?=www), right after :// to match only URLs whose host starts with www:

@"(?<Protocol>\w+)://(?=www)(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*"

Or add a negative lookahead, (?!www), in the same place to match only URLs whose host does not start with www:

@"(?<Protocol>\w+)://(?!www)(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&=\-@/$,&amp;+]*"
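
A minimal sketch of how the two lookaheads behave. It assumes a simplified version of the pattern above (the character class is trimmed for readability); the sample HTML and the names LookaheadDemo, Domains and Tail are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Simplified tail of the pattern above (assumption: trimmed character class).
    const string Tail = @"(?<Domain>[\w@][\w.:@]+)/?[\w.?=%&\-@/$,+]*";

    // Returns the Domain group of every URL matched with the given lookahead
    // inserted right after "://". Pass "" to apply no www filter at all.
    public static string[] Domains(string html, string lookahead)
    {
        var pattern = @"(?<Protocol>\w+)://" + lookahead + Tail;
        return Regex.Matches(html, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["Domain"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input with one www URL and one non-www URL.
        var html = "<a href='http://www.example.com/a'>x</a> " +
                   "<a href='https://example.org/b?c=1'>y</a>";

        Console.WriteLine(string.Join(", ", Domains(html, "(?=www)"))); // www hosts only
        Console.WriteLine(string.Join(", ", Domains(html, "(?!www)"))); // non-www hosts only
        Console.WriteLine(string.Join(", ", Domains(html, "")));        // every host
    }
}
```

With no lookahead at all, the pattern matches both kinds of URL, which may be closer to what the question asks for.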
Peter

Use one like you have, but without the part of the regex that looks like www\. so that the prefix is no longer required.
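
A rough sketch of that idea, assuming the original pattern contained a literal www\. right after the protocol (the pattern posted in the question does not show one). Both patterns, the sample HTML, and the name WwwRemovalDemo are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WwwRemovalDemo
{
    // Hypothetical pattern that requires a literal "www." after the protocol.
    public const string WithWww = @"\w+://www\.[\w.:@]+";

    // The same pattern with the "www\." part removed.
    public const string WithoutWww = @"\w+://[\w.:@]+";

    // Returns the full text of every match of the pattern in the input.
    public static string[] Urls(string html, string pattern) =>
        Regex.Matches(html, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Hypothetical input with one www URL and one non-www URL.
        var html = "<a href='http://www.example.com'>x</a> " +
                   "<a href='http://example.org'>y</a>";

        Console.WriteLine(string.Join(", ", Urls(html, WithWww)));    // only the www URL
        Console.WriteLine(string.Join(", ", Urls(html, WithoutWww))); // both URLs
    }
}
```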

chaos