
Language: Visual Basic. I have a project that uses .NET Framework 4.

I have this code for Regex:

Private Shared RegPattern As New Regex("\<base+.+?href\s*\=\s*(""(?<HREF>[^""]*)""|'(?<HREF>[^']*)')(\s*\w*\s*\=\s*(""[^""]*""|'[^']*')|[^>])*(\/>|>\<\/base\>)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

I have this function to get links from an HTML page:

Private Sub GetAdress(ByVal HtmlPage As String)
    Base = ""
    Dim Matches As System.Text.RegularExpressions.MatchCollection = RegPattern.Matches(HtmlPage)

    ' Keep the HREF of the last <base> match found.
    For Each V_Found As System.Text.RegularExpressions.Match In Matches
        Base = V_Found.Groups("HREF").Value
    Next
End Sub

The function usually works, but for some pages it appears to hang in an infinite loop. The debugger reports "Evaluation Time out" at this line:

Dim Matches As System.Text.RegularExpressions.MatchCollection = RegPattern.Matches(HtmlPage)

and the executable neither continues nor exits, and no exception is raised that I could catch. How can I handle this problem? How can I exit from the GetAdress method? I know there is a timeout exception, but I can't use it in .NET 4.

MarioProject
  • It is a common thing when you [parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), just use HtmlAgilityPack (see [How do you parse an HTML in vb.net](http://stackoverflow.com/questions/516811/how-do-you-parse-an-html-in-vb-net)) or another HTML parser to get the values you need in a safe way. – Wiktor Stribiżew Feb 28 '17 at 08:46
  • There is no way to exit the evaluation? – MarioProject Feb 28 '17 at 08:58
  • Replace `+.+?` with `\b[^<]+?` and `\w*\s*` with `(?:\w+\s+)?` – Wiktor Stribiżew Feb 28 '17 at 09:02
  • I replaced +.+? with \b[^<]+? and it worked! Thank you. – MarioProject Feb 28 '17 at 09:48

1 Answer


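If you want to keep the code but swallow the exception, wrap the call in a Try...Catch. Note that Regex only throws a timeout exception (RegexMatchTimeoutException, which derives from TimeoutException) when a match timeout has been set, and that option was added in .NET Framework 4.5, so on .NET 4 this block will not interrupt a runaway match.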

Try
    Dim Matches As System.Text.RegularExpressions.MatchCollection = RegPattern.Matches(HtmlPage)
    Base = ""

    For Each V_Found As System.Text.RegularExpressions.Match In Matches
        Base = V_Found.Groups("HREF").Value
    Next
Catch ex As TimeoutException
    ' Ignore the timeout and leave Base as it is.
End Try
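
On .NET Framework 4 itself, one workaround that is sometimes used is to run the match on a separate thread and give up if it does not finish in time. The following is only a rough sketch of that idea, not a tested drop-in: the module name, `TryGetHref`, and the `timeoutMs` parameter are made up for illustration, and `Thread.Abort` is only supported on .NET Framework (it is not available on .NET Core or later).

```vb
Imports System.Text.RegularExpressions
Imports System.Threading

Module RegexTimeoutSketch
    ' Hypothetical helper: returns the HREF of the last match, or Nothing
    ' if the regex does not finish within timeoutMs milliseconds.
    Function TryGetHref(ByVal pattern As Regex, ByVal html As String, ByVal timeoutMs As Integer) As String
        Dim result As String = ""

        ' The match is evaluated entirely on the worker thread.
        Dim work As ThreadStart =
            Sub()
                For Each found As Match In pattern.Matches(html)
                    result = found.Groups("HREF").Value
                Next
            End Sub

        Dim worker As New Thread(work)
        worker.IsBackground = True
        worker.Start()

        ' Join returns False if the worker is still running after the timeout.
        If Not worker.Join(timeoutMs) Then
            worker.Abort() ' .NET Framework only; stops the runaway match
            Return Nothing
        End If

        Return result
    End Function
End Module
```

With this in place, GetAdress could call something like `Base = TryGetHref(RegPattern, HtmlPage, 2000)` and treat `Nothing` as "gave up on this page". Aborting a thread is a blunt tool, so fixing the pattern or using an HTML parser is still the safer route.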

Since it looks like you are just trying to parse links, you could try something like:

' Assumes htmlBrowser is a WebBrowser control whose document was loaded from HtmlPage.
Dim linkCollection As HtmlElementCollection = htmlBrowser.Document.GetElementsByTagName("a") ' Or another tag name, e.g. "base"

Base = ""
For Each elem As HtmlElement In linkCollection
    ' Read the attribute from the parsed element instead of running the regex again.
    Base = elem.GetAttribute("href")
    ' Code to run
Next
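
If you would rather stay with the regex, the change from the comments (replacing `+.+?` with `\b[^<]+?`) can be tried out in isolation. This is just a sketch: the module name and `GetBaseHref` are hypothetical, only the first of the two suggested replacements is applied, and the asker reported that it stopped the hang on their pages, which is not a guarantee for every input.

```vb
Imports System.Text.RegularExpressions

Module BasePatternSketch
    ' Original pattern with "+.+?" replaced by "\b[^<]+?", as suggested in the comments.
    Private ReadOnly FixedPattern As New Regex(
        "\<base\b[^<]+?href\s*\=\s*(""(?<HREF>[^""]*)""|'(?<HREF>[^']*)')" &
        "(\s*\w*\s*\=\s*(""[^""]*""|'[^']*')|[^>])*(\/>|>\<\/base\>)",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: returns the HREF of the last <base> match, or "" if there is none.
    Function GetBaseHref(ByVal html As String) As String
        Dim result As String = ""
        For Each found As Match In FixedPattern.Matches(html)
            result = found.Groups("HREF").Value
        Next
        Return result
    End Function
End Module
```

The `[^<]+?` part cannot run past the start of the next tag, so a `<base` that is not followed by `href` inside the same tag fails quickly instead of scanning the rest of the page.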
Tytus Strube