I am using this code, but it only matches URLs that are enclosed in double quotes (" "):

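' Requires: Imports System.Text.RegularExpressions
Dim html As String = txtSource.Text
' In VB, "" inside a string literal is an escaped quote, so the pattern is "(http://.+?)"
Dim mc As MatchCollection = Regex.Matches(html, """(http://.+?)""", RegexOptions.IgnoreCase)
For Each m As Match In mc
    ' Group 1 holds the URL without the surrounding quotes
    lstReapedLinks.Items.Add(m.Groups(1).Value)
Next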
Junaid Rehman
  • Removing the double quotes at the beginning and at the end of the regex pattern seems like the way to go – Andrew Savinykh Jun 06 '13 at 21:55
  • No, it's not. It just gets the single letter after //. For http://yahoo.com, it returns http://y – Junaid Rehman Jun 06 '13 at 22:00
  • Yep, that's probably because you are using a lazy quantifier instead of a greedy one. Try this: `Regex.Matches(html, "(http://.+)", RegexOptions.IgnoreCase)` – Andrew Savinykh Jun 06 '13 at 22:02
  • Also, it would help if you specified what kind of source you are parsing and what exactly you are trying to match in it. It looks like you might have several URLs in the source text and want to capture them all, but you have not said how the individual URLs are separated, etc. One needs to know that kind of thing to create a proper regular expression. – Andrew Savinykh Jun 06 '13 at 22:12
  • it did not extract anything from: Next>> – Junaid Rehman Jun 06 '13 at 22:13
  • Well, I am extracting all the http links from a webpage – Junaid Rehman Jun 06 '13 at 22:14
  • Ah, so you are trying to parse HTML with Regex. You might want to have a look here http://stackoverflow.com/a/1732454/284111 – Andrew Savinykh Jun 07 '13 at 00:29

1 Answer

If you expect multiple URLs in your string, you first need to define what separates them. The separator could be whitespace, as in `some text http://abc http://123 nonurltext`, or, as seems to be the case based on your regex, parentheses, as in `some text(http://abc) some other text (http://123) some more text`. Once you know the delimiter, you can use it to tell the regex how to zero in on the string you really want. The following matches http://... only when it is enclosed in parentheses, like (http://www.yahoo.com), and ignores everything else:

Regex.Matches(test, "(?<=\()http://.+?(?=\))", RegexOptions.IgnoreCase)

You should be able to adapt this to your needs. For example, if your delimiter were whitespace, replace \( and \) with \s (which matches any whitespace character).
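As a rough sketch of the whitespace-delimited case, something like the following might work. It is not the exact pattern from above: requiring \s on both sides would miss a URL at the very start or end of the string, so this variant accepts the start of the string as a left boundary and uses \S+ to consume everything up to the next whitespace. The sample string, the module name, and the variable names are assumptions made for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlExtractionSketch
    Sub Main()
        ' Hypothetical sample input; not taken from the original question.
        Dim test As String = "some text http://abc http://123 nonurltext"

        ' (?<=^|\s)  the URL must start the string or follow a whitespace character
        ' http://\S+ "http://" plus every following non-whitespace character
        Dim pattern As String = "(?<=^|\s)http://\S+"

        For Each m As Match In Regex.Matches(test, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

For the sample string this prints http://abc and http://123. On real HTML, where URLs usually sit inside attribute quotes, whitespace is unlikely to be the right delimiter, so the boundary characters would need adjusting.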

Jason