I have some simple VB.NET code that fetches a remote web page. When I use it to fetch msn.com, for example, it works flawlessly every time. When I use it to fetch amazon.com, roughly every other request works flawlessly. The remaining requests, especially when there is some wait time between them, render the page like this:

��Yms�8���_!��` Ӓwwj�4�r�f2�$�3{U����o&@Ɏ��~�I�N���(�� ��4��o�f���~�����ty2��Eg"��G������ȟ_�w&r�C���N�(4���������_b���L*���J=?�P��j=����^�}�[n����7� AND MORE CHARACTERS JUST LIKE THIS.

It might be a timeout issue, or it might be a page encoding issue with the response. I can't figure it out.

Here is the code (saved on my localhost as test.aspx):

<%@ Import Namespace="System" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>

<script language="VB" runat="server">
Sub Page_Load(Sender as Object, E as EventArgs)
Dim q As String 
Randomize
q= (Rnd)
Dim oRequest As WebRequest = WebRequest.Create("https://www.amazon.com/dp/0132350882/?" & q)
Dim oResponse As WebResponse = oRequest.GetResponse()
Dim oStream As Stream = oResponse.GetResponseStream()
Dim oStreamReader As New StreamReader(oStream, Encoding.UTF8)
Response.Write(oStreamReader.ReadToEnd())
oResponse.Close()
oStreamReader.Close()
End Sub
</script>

Also, the first time I load the page the encoding is wrong, but if I quickly reload it a second time the encoding is correct. If I keep reloading the page very quickly after that, it is always correct.

Thanks in advance,

Jim

OK, I modified the code above as follows (it did nothing to fix the issue):

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<%@ Import Namespace="System" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>

<script language="VB" runat="server">
Sub Page_Load(Sender as Object, E as EventArgs)
Dim q As String 
Randomize
q= (Rnd)
Dim oRequest As WebRequest = WebRequest.Create("https://www.amazon.com/dp/0132350882/?" & q)
Dim oResponse As WebResponse = oRequest.GetResponse()
Dim oStream As Stream = oResponse.GetResponseStream()
Dim oStreamReader As New StreamReader(oStream, Encoding.UTF8)
Response.Write(oStreamReader.ReadToEnd())
oResponse.Close()
oStreamReader.Close()
End Sub
</script>
</body>
</html>

But it is very strange: when the remote page renders correctly in the browser, everything inside the HEAD is still there. However, even with the updated code, when it displays the wrong character set the content is stripped out of the head of the document. I wonder if Amazon is doing this?

What I am trying to say is that, now and then, the content inside the HEAD is being stripped out.

JimF
  • Does your `Response` specify that its encoding is also UTF-8? – GSerg Aug 23 '21 at 19:12
  • here is the response---> Cache-Control: private Content-Length: 1357528 Content-Type: text/html; charset=utf-8 Date: Mon, 23 Aug 2021 21:01:32 GMT Server: Microsoft-IIS/8.5 X-AspNet-Version: 4.0.30319 X-Powered-By: ASP.NET – JimF Aug 23 '21 at 21:09
  • Your `Response` to which you `Write`. Not the amazon response. So that the browser can immediately know the encoding, instead of figuring it out later. – GSerg Aug 23 '21 at 21:12
  • OK I just tested with Opera and the problem still exists and the response headers are ---> Cache-Control: private Content-Length: 235866 Content-Type: text/html; charset=utf-8 Date: Mon, 23 Aug 2021 21:15:14 GMT Server: Microsoft-IIS/8.5 X-AspNet-Version: 4.0.30319 X-Powered-By: ASP.NET – JimF Aug 23 '21 at 21:16
  • Please see https://stackoverflow.com/q/4583201/11683. – GSerg Aug 23 '21 at 21:51

1 Answer

You have to use HttpWebRequest to “simulate” a browser and then read the source. The code below could be a good start.

    ' Random query string so each request uses a distinct URL
    Randomize()
    Dim q As String = CStr((Rnd()))
    Dim request As HttpWebRequest = CType(WebRequest.Create("https://www.amazon.com/dp/0132350882/?" & q), HttpWebRequest)
    request.Method = "GET"
    ' Present a browser-like User-Agent to the remote server
    request.UserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0"
    ' Using blocks dispose the response, stream, and reader even if an exception is thrown
    Using response As WebResponse = request.GetResponse()
        If response IsNot Nothing Then
            Using ioStream As IO.Stream = response.GetResponseStream()
                Using sr As New System.IO.StreamReader(ioStream)
                    ' Read the whole page source into a string
                    Dim s As String = sr.ReadToEnd()
                    Debug.WriteLine(s)
                End Using
            End Using
        End If
    End Using
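
One thing worth ruling out, as an assumption the thread does not confirm: the unreadable bytes in the question may be a compressed (gzip or deflate) response body that is being decoded as UTF-8 text. `HttpWebRequest` has an `AutomaticDecompression` property that handles such bodies transparently. The sketch below is a minimal console variant; the `FetchSketch` module and `FetchPage` helper are hypothetical names used only for illustration.

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module FetchSketch
    ' Hypothetical helper (not from the original post): downloads a page as text.
    Function FetchPage(url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        ' Assumption: the server sometimes answers with a gzip/deflate body.
        ' With this flag set, the framework decompresses the body before it
        ' reaches the response stream.
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Using response As WebResponse = request.GetResponse()
            Using stream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(stream, Encoding.UTF8)
                    Return reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        Dim html As String = FetchPage("https://www.amazon.com/dp/0132350882/")
        ' Print only the length to avoid flooding the console
        Console.WriteLine(html.Length)
    End Sub
End Module
```

If the output is still unreadable with this flag set, compression is probably not the cause, and the response headers returned by the remote server (rather than those of test.aspx) would be the next thing to inspect.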
G3nt_M3caj
  • A downvote is fine, BUT please explain why. That way future users know why this method isn’t appropriate. – G3nt_M3caj Aug 26 '21 at 07:25
  • Not the downvoter, but your use of Randomize makes me think you are still using the VB6 random methods instead of the new (well, new many years ago) .NET Random class, which is much easier to use. – Mary Aug 27 '21 at 23:17
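
As a rough illustration of the last comment, the random query string could be built with the .NET `Random` class instead of `Randomize`/`Rnd`. This is only a sketch: the `CacheBusting` module and `BuildCacheBustedUrl` helper are hypothetical names, and formatting with the invariant culture is an added choice so the value always uses "." as the decimal separator.

```vb
Imports System
Imports System.Globalization

Module CacheBusting
    ' Hypothetical helper: appends a random value to a URL that already ends in "?".
    Function BuildCacheBustedUrl(baseUrl As String, rng As Random) As String
        ' NextDouble returns a value greater than or equal to 0.0 and less than 1.0
        Dim q As String = rng.NextDouble().ToString(CultureInfo.InvariantCulture)
        Return baseUrl & q
    End Function

    Sub Main()
        ' Create one Random instance and reuse it for every request
        Dim rng As New Random()
        Console.WriteLine(BuildCacheBustedUrl("https://www.amazon.com/dp/0132350882/?", rng))
    End Sub
End Module
```

Reusing one `Random` instance avoids creating several generators in quick succession, which on older .NET Framework versions can yield identical seeds and therefore identical values.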