iam trying to crawl some sites. It works like a charm. But there is a major problem. On some pages (not mutch) I'm getting some weird characters instead of html code.
It looks like this:
;�<cS���u�/�qYa$�4l7�.�Q�7&��O����� Z�D}z��/���� ��u����V���lWY|�n5�1�We����GB�U��g{�� �|Ϸ����*�Q��0���nb�o�߯�����[b��/����@CƑ����D{{/n��X�!� �Et�X"����?��˩����8\y��&
If I'll open it in my browser, there is no Problem at all. I dont understand why.
My HTTP Header says:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8 Accept-Encoding:gzip,deflate,sdch Accept-Language:de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4 Cache-Control:max-age=0 Connection:keep-alive User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36
I think it has something to do with the Accept
request.Accept = "*/*"
Thats my webrequest:
Public Class Http
Dim cookieCon As New CookieContainer
Dim request As HttpWebRequest
Dim response As HttpWebResponse
Public Function GetRequest(ByVal Params() As Object)
Dim url As String = Params(0)
Dim mycookie As String = Params(1)
'request.AllowAutoRedirect = True
request = CType(HttpWebRequest.Create(url), HttpWebRequest)
request.CookieContainer = New CookieContainer()
request.Method = "GET"
request.Timeout = 20000
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
'request.ContentType = "application/x-www-form-urlencoded"
request.Accept = "*/*"
If Not mycookie Like "nocookie" Then
request.Headers("Cookie") = mycookie
End If
response = CType(request.GetResponse(), HttpWebResponse)
Dim html(1) As String
html(0) = request.Address.ToString()
html(1) = New StreamReader(response.GetResponseStream()).ReadToEnd()
Return html
End Function
Thanks.