Visual Studio reports illegal characters in a string that, as far as I can tell, contains none. The exception is thrown at `Dim datastream As Stream = client.OpenRead(url)`.

First, `Dim url As String = GoogleSearch & MovieName` did not want to accept HTML format as string. OK. I removed `https://` from the string, so now it is just in `www.____` format, which should still work with the WebClient. Now it throws the error below instead. Why? When tested outside Visual Studio, the URL works.

My input URL string is `www.google.com/search?q=imdb+Orville`, which causes the WebClient to throw this error:

System.ArgumentException: 'Illegal characters in path.'

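Imports System.Collections
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load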

    End Sub

    Public Property status As Boolean
    Public Property Id As String
    Public Property ImdbURL As String

    Private GoogleSearch As String = "www.google.com/search?q=imdb+"
    Private BingSearch As String = "www.bing.com/search?q=imdb+"
    Private AskSearch As String = "www.ask.com/web?q=imdb+"

    Private Function match(ByVal regex As String, ByVal html As String, ByVal Optional i As Integer = 1) As String
        Return New Regex(regex, RegexOptions.Multiline).Match(html).Groups(i).Value.Trim()
    End Function

    Private Function matchAll(ByVal regex As String, ByVal html As String, ByVal Optional i As Integer = 1) As ArrayList
        Dim list As ArrayList = New ArrayList()

        For Each m As Match In New Regex(regex, RegexOptions.Multiline).Matches(html)
            list.Add(m.Groups(i).Value.Trim())
        Next

        Return list

    End Function

    Private Function getIMDbUrl(ByVal MovieName As String, ByVal Optional searchEngine As String = "google") As String
        Dim url As String = GoogleSearch & MovieName
        If searchEngine.ToLower().Equals("bing") Then url = BingSearch & MovieName
        If searchEngine.ToLower().Equals("ask") Then url = AskSearch & MovieName
        Dim html As String = getUrlData(url)
        Dim imdbUrls As ArrayList = matchAll("<a href=""(http://www.imdb.com/title/tt\d{7}/)"".*?>.*?</a>", html)

        If imdbUrls.Count > 0 Then
            Return CStr(imdbUrls(0))
        ElseIf searchEngine.ToLower().Equals("google") Then
            Return getIMDbUrl(MovieName, "bing")
        ElseIf searchEngine.ToLower().Equals("bing") Then
            Return getIMDbUrl(MovieName, "ask")
        Else
            Return String.Empty
        End If

    End Function

    Private Function getUrlData(ByVal url As String) As String
        Dim client As WebClient = New WebClient()
        Dim r As Random = New Random()
        client.Headers("X-Forwarded-For") = r.[Next](0, 255) & "." & r.[Next](0, 255) & "." & r.[Next](0, 255) & "." & r.[Next](0, 255)
        client.Headers("User-Agent") = "Mozilla/" & r.[Next](3, 5) & ".0 (Windows NT " & r.[Next](3, 5) & "." & r.[Next](0, 2) & "; rv:2.0.1) Gecko/20100101 Firefox/" & r.[Next](3, 5) & "." & r.[Next](0, 5) & "." & r.[Next](0, 5)
        Dim datastream As Stream = client.OpenRead(url)
        Dim reader As StreamReader = New StreamReader(datastream)
        Dim sb As StringBuilder = New StringBuilder()

        While Not reader.EndOfStream
            sb.Append(reader.ReadLine())
        End While

        Return sb.ToString()

    End Function

    Private Sub parseIMDbPage(ByVal imdbUrl As String)
        Dim html As String = getUrlData(imdbUrl)
        Id = match("<link rel=""canonical"" href=""http://www.imdb.com/title/(tt\d{7})/"" />", html)

        If Not String.IsNullOrEmpty(Id) Then
            status = True
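            ' Assign to the property explicitly; the ByVal parameter imdbUrl shadows it (VB is case-insensitive).
            Me.ImdbURL = "http://www.imdb.com/title/" & Id & "/"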
        End If

    End Sub

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim TextFromBox As String = RichTextBox1.Text
        Dim imdbUrl As String = getIMDbUrl(TextFromBox)
        parseIMDbPage(imdbUrl)
        MessageBox.Show(Id)

    End Sub

End Class
Andrew Morton
Sci00213
  • Are you sure you copied that error right? Your title has an error string, but your question shows what looks like another code snippet, not an error message. – Zachary Craig May 31 '19 at 13:59
  • Ok. Added picture. – Sci00213 May 31 '19 at 14:02
  • I executed getUrlData("http" & "s://www.google.com/search?q=imdb+Orville") and it worked well. What happens if you execute this exact line? (I concatenated the string just for this comment; you don't need to. SO would otherwise format it as a URL.) – the_lotus May 31 '19 at 14:08
  • Some good info here https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid – Mary May 31 '19 at 14:11
  • Same thing. Should I reinstall Visual Studio? Maybe updating Windows deleted something. The update was done with a USB installer, not through Windows Update. – Sci00213 May 31 '19 at 14:11
  • So, to be clear, you made sure to re-add the protocol to the URLs and it still errors out? – Zachary Craig May 31 '19 at 14:14
  • "Re-add the protocol" is where you lost me. I added the full string manually in the code (not from the TextBox) and it errored out again. – Sci00213 May 31 '19 at 14:15
  • I tried adding it as `"http://www.google.com/search?q=imdb+orville"`. It worked with http. But if I add this, my code breaks where url is defined: `Dim url As String = GoogleSearch & MovieName`. The URL is empty because of the http:// format. – Sci00213 May 31 '19 at 14:20
  • I am updating the whole of Visual Studio at the moment. Maybe it's because of the new .NET Framework 4.8. – Sci00213 May 31 '19 at 14:30
  • Perhaps some invisible characters have crept into the strings. If you use `imdbUrl = Regex.Replace(imdbUrl, "[^A-Za-z0-9:/?%+&=.]", "")` before using it, does it work? (I may have missed some characters out.) – Andrew Morton May 31 '19 at 14:37
  • Instead of removing characters, you could just [encode them](https://learn.microsoft.com/en-us/dotnet/api/system.web.httpserverutility.urlencode?view=netframework-4.8). – the_lotus May 31 '19 at 15:11
  • Nope, my bad. Nothing changed. I tried the regex and it made no difference, so there are no invisible characters. Will try encoding. – Sci00213 May 31 '19 at 15:22
  • @the_lotus I was thinking that maybe something like a zero-width joiner had got into the strings in the source code. Apparently not, but that sort of problem has been found on SO before. – Andrew Morton May 31 '19 at 15:31
  • "First Dim url As String = GoogleSearch & MovieName did not want to accept HTML format as string. Ok. I remove https:// from the string" It only works for me (using your exact function) when I add `http://` or `https://` at the beginning of the URL. What error were you getting when this was at the beginning of your string? – Idle_Mind May 31 '19 at 15:34
  • I removed http:// from the strings and then prepended it when building the URL: `Dim url As String = "http://" & GoogleSearch & MovieName If searchEngine.ToLower().Equals("bing") Then url = "http://" & BingSearch & MovieName If searchEngine.ToLower().Equals("ask") Then url = "http://" & AskSearch & MovieName` It is still not defining url. – Sci00213 May 31 '19 at 15:42
  • When I add `Dim urlhttp As String = "http://" & url` in the function `getUrlData`, url is again not defined and the new URL is only `"http://"`. How does a function get affected when there is nothing there to affect it? – Sci00213 May 31 '19 at 15:46
  • If I remove the `"http://"` part, it gets defined as `www.google.com/search?q=imdb+Orville`. Mind-blowing. – Sci00213 May 31 '19 at 15:52
  • I'm an idiot. What happens is that the URL is already empty when `parseIMDbPage(imdbUrl)` starts, so it runs with an empty URL. Since I am using the same string name over and over again, it was hard to notice. – Sci00213 May 31 '19 at 16:04
  • Turns out it's because HTTPS is forced over HTTP! LOL! – Sci00213 May 31 '19 at 16:10
  • So basically, the URL was empty because nothing was found, all because of a single letter. – Sci00213 May 31 '19 at 16:13
  • Glad you figured it out. I was almost there, too. Noticed you had recursive calls and the last one was "return string.empty". I was in the process of tracing calls... – Idle_Mind May 31 '19 at 16:16
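
The comments above suggest two fixes on the request side: keep the scheme on the URL and encode the search text rather than stripping characters. A minimal sketch of both follows, assuming the same scheme-less base strings as in the question. `UrlSketch` and `BuildSearchUrl` are hypothetical names that do not appear in the original code, and defaulting to `https://` is an assumption. A likely reason for the original exception, not confirmed in the thread, is that `WebClient` falls back to treating a scheme-less string as a local file path, where `?` is not allowed.

```vbnet
Imports System

Module UrlSketch
    ' Hypothetical helper (not part of the original code): builds an absolute
    ' search URL from a base string such as "www.google.com/search?q=imdb+"
    ' and a raw movie title typed by the user.
    Public Function BuildSearchUrl(baseUrl As String, movieName As String) As String
        Dim candidate As String = baseUrl

        ' Assumption: default to https:// when the base string carries no scheme.
        If Not candidate.StartsWith("http://", StringComparison.OrdinalIgnoreCase) AndAlso
           Not candidate.StartsWith("https://", StringComparison.OrdinalIgnoreCase) Then
            candidate = "https://" & candidate
        End If

        ' Percent-encode the title so spaces and symbols cannot break the query string.
        candidate &= Uri.EscapeDataString(movieName.Trim())

        ' Fail early with a clear message instead of letting the request throw later.
        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(candidate, UriKind.Absolute, parsed) Then
            Throw New ArgumentException("Not an absolute URL: " & candidate)
        End If

        Return parsed.AbsoluteUri
    End Function
End Module
```

With this in place, `getUrlData(UrlSketch.BuildSearchUrl(GoogleSearch, MovieName))` would always receive a string that starts with a scheme, whatever is typed into the text box.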

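The root cause identified in the thread was an empty URL: the pattern only looked for `http://www.imdb.com/...`, nothing matched, and the empty result was passed on to the next request. A hedged sketch of a more tolerant match is shown here. `ImdbLinkSketch` and `FindFirstImdbUrl` are hypothetical names, the pattern accepting both schemes and seven or more digits is an assumption, and whether a given search engine still exposes such links in its raw HTML is not something this sketch can guarantee.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ImdbLinkSketch
    ' Assumption: accept http or https, and seven or more digits in the title ID.
    Private ReadOnly TitleLink As New Regex(
        "https?://www\.imdb\.com/title/(tt\d{7,})/?",
        RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns a normalized title URL, or String.Empty
    ' when the page contains no IMDb title link.
    Public Function FindFirstImdbUrl(html As String) As String
        If String.IsNullOrEmpty(html) Then Return String.Empty

        Dim m As Match = TitleLink.Match(html)
        If Not m.Success Then Return String.Empty

        Return "https://www.imdb.com/title/" & m.Groups(1).Value & "/"
    End Function
End Module
```

The caller would then check the result with `String.IsNullOrEmpty` before requesting it, so a failed search surfaces as "nothing found" rather than as an exception from an empty URL.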