Hello, I'm currently building a scraper and have a few questions. I've already added threading so that the code runs faster, but it is still too slow for me.

Public Sub ScrapeProxyDo(address As String)
    Dim wc As New Net.WebClient
    Dim matchCollection As MatchCollection
    Try
        Dim input As String = wc.DownloadString(address)
        matchCollection = REGEX.Matches(input)
        'Nothing else to do here
        For Each obj As Object In matchCollection
            Dim match As Match = CType(obj, Match)
            Dim item As String = match.ToString()
            RichTextBox2.AppendText(item & Environment.NewLine)
        Next
    Catch ex As Exception
        'Nothing (the exception is ignored)
    End Try
End Sub

The code is relatively simple: it checks whether the page contains an IP address with a port. The regex for it is:

Dim REGEX As Regex = New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:[0-9]{1,5}\b")
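To make the pattern's behaviour concrete, here is a minimal console sketch that runs the same regex over a short sample string. The module name `ProxyRegexDemo` and the sample text are made up for illustration and are not part of the original project. Note that `[0-9]{1,5}` accepts any one to five digits, so a port such as 99999 would also match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ProxyRegexDemo
    ' Same pattern as in the question, created once and reused.
    Private ReadOnly ProxyRegex As New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:[0-9]{1,5}\b")

    Sub Main()
        ' Hypothetical sample text standing in for a downloaded page.
        Dim sample As String = "<td>192.168.0.1:8080</td> some text 10.0.0.5:3128 more text"

        ' Matches() returns every non-overlapping match in the input.
        For Each m As Match In ProxyRegex.Matches(sample)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

For this sample the loop should print the two `ip:port` pairs, one per line.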

But right now it takes each proxy out of the downloaded string individually, and of course that takes time. Can this be changed so that it filters out all proxies at once and inserts them directly into the RichTextBox? That would be much faster than working through them slowly one at a time.

Regards

Callum Watkins
Kashed
  • [Please don't swallow exceptions!](https://stackoverflow.com/q/2416316/87698) – Heinzi Jan 18 '21 at 15:00
  • About your question: Create a StringBuilder, add the results to your StringBuilder instead of the RichTextBox, and then, at the end, copy the contents of the StringBuilder to your RichTextBox. That way, you only have *one* slow UI update instead of *a lot*. – Heinzi Jan 18 '21 at 15:02
  • How are you accessing `RichTextBox2` from what appears to be code run in a worker Thread? Did you set `CheckForIllegalCrossThreadCalls = false` *somewhere*? If that's the case, remove it ASAP. Note that WebClient has async methods available, such as [DownloadStringTaskAsync()](https://learn.microsoft.com/en-us/dotnet/api/system.net.webclient.downloadstringtaskasync), that you can await. You don't need any Thread. A `List(Of Task)`, maybe. – Jimi Jan 18 '21 at 15:12

1 Answer

As suggested by @Heinzi in the comments, use a StringBuilder. Strings are immutable, whereas a StringBuilder can be modified in place, so appending to it avoids throwing away a String and creating a new one on each iteration, and the RichTextBox is updated only once at the end. Use the .Value property of the Match, and don't declare the loop variable As Object unless you absolutely must.

Public Sub ScrapeProxyDo(address As String)
    'Requires Imports System.Text and Imports System.Text.RegularExpressions
    Dim proxyRegex As New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:[0-9]{1,5}\b")
    Dim input As String
    'Dispose the WebClient as soon as the download has finished
    Using wc As New Net.WebClient
        input = wc.DownloadString(address)
    End Using
    'Collect all matches first so the UI is updated only once
    Dim sb As New StringBuilder
    For Each m As Match In proxyRegex.Matches(input)
        sb.AppendLine(m.Value)
    Next
    'Assuming this is on the UI thread
    RichTextBox2.Text = sb.ToString()
End Sub
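
If several pages need to be scraped, the approach @Jimi mentions in the comments (awaiting `DownloadStringTaskAsync` instead of using threads) could look roughly like the sketch below. The class name `ProxyScraper` and its three method names are hypothetical; only `WebClient.DownloadStringTaskAsync` and `Task.WhenAll` are existing framework calls. Error handling is left out on purpose, so a single failed download will make the awaited `Task.WhenAll` throw.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Public Class ProxyScraper
    ' Same pattern as above, created once and shared by all downloads.
    Private Shared ReadOnly ProxyRegex As New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:[0-9]{1,5}\b")

    ' Returns every ip:port match found in the given text.
    Public Shared Function ExtractProxies(input As String) As List(Of String)
        Dim found As New List(Of String)
        For Each m As Match In ProxyRegex.Matches(input)
            found.Add(m.Value)
        Next
        Return found
    End Function

    ' Downloads one page without blocking the caller and extracts its proxies.
    ' A separate WebClient per call is used because one instance
    ' cannot run several downloads at the same time.
    Public Shared Async Function ScrapeAsync(address As String) As Task(Of List(Of String))
        Using wc As New WebClient()
            Dim input As String = Await wc.DownloadStringTaskAsync(address)
            Return ExtractProxies(input)
        End Using
    End Function

    ' Starts all downloads, waits for them, and joins the results into one String.
    Public Shared Async Function ScrapeAllAsync(addresses As IEnumerable(Of String)) As Task(Of String)
        Dim tasks As New List(Of Task(Of List(Of String)))
        For Each address In addresses
            tasks.Add(ScrapeAsync(address))
        Next

        Dim results As List(Of String)() = Await Task.WhenAll(tasks)

        Dim sb As New StringBuilder
        For Each pageResult In results
            For Each proxy In pageResult
                sb.AppendLine(proxy)
            Next
        Next
        Return sb.ToString()
    End Function
End Class
```

From an `Async Sub` event handler on the form, the result could then be assigned in one step, for example `RichTextBox2.Text = Await ProxyScraper.ScrapeAllAsync(urls)`, where `urls` is whatever list of addresses the scraper already has. Because the handler resumes on the UI thread after the `Await`, no cross-thread access to the RichTextBox is needed.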
Mary