I am reading a web page into my program using VB.NET's HttpWebRequest. My problem is figuring out the encoding of the page before I read it into a string. When I read it in, some characters appear as diamonds with question marks inside them (the U+FFFD replacement character). The page itself looks fine in a browser; it's only when I read it in that some characters aren't decoded correctly.
The original code I used was:
Dim myWebRequest As HttpWebRequest = DirectCast(WebRequest.Create(ourUri), HttpWebRequest)
Dim myWebResponse As HttpWebResponse = DirectCast(myWebRequest.GetResponse(), HttpWebResponse)
Then, to try to get the encoding, I used:
Dim contentType As String = myWebResponse.ContentType
Dim charset As String = myWebResponse.CharacterSet
But both 'contentType' and 'charset' end up with the value 'text/html'. What I want to know is whether the web page is encoded in 'utf-8' or some other character set, so that I can later read it like this:
Dim receiveStream As Stream = myWebResponse.GetResponseStream()
Dim encode As Encoding = System.Text.Encoding.GetEncoding(charset)
Using reader As StreamReader = New StreamReader(receiveStream, encode)
    downloadedcontent = reader.ReadToEnd()
End Using
So, to sum up: there seems to be no way to ask what the encoding is and then use the answer to read the web page correctly.
Is that true? Any help is appreciated.
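For reference, a minimal sketch of separating the charset from a Content-Type value, assuming the header value is available as a string (for example from myWebResponse.ContentType). System.Net.Mime.ContentType splits off the charset parameter, which is Nothing when the server sends a bare text/html. ParseCharset and ResolveEncoding are hypothetical helper names; ResolveEncoding falls back to a default instead of letting Encoding.GetEncoding throw on a missing or unrecognized name. On .NET Core / .NET 5+, legacy code pages such as windows-1252 are only recognized after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance), so without that call the fallback would be used for them.

```vb
Imports System
Imports System.Net.Mime
Imports System.Text

Module CharsetFromHeader

    ' Hypothetical helper: returns the charset parameter of a Content-Type value
    ' such as "text/html; charset=utf-8", or Nothing when there is none
    ' (for example a bare "text/html") or the value cannot be parsed.
    Function ParseCharset(contentTypeHeader As String) As String
        If String.IsNullOrWhiteSpace(contentTypeHeader) Then Return Nothing
        Try
            Dim ct As New ContentType(contentTypeHeader)
            Return ct.CharSet
        Catch ex As FormatException
            Return Nothing
        End Try
    End Function

    ' Hypothetical helper: maps a charset name to an Encoding. GetEncoding throws
    ' ArgumentException for unknown names, so fall back instead of failing.
    Function ResolveEncoding(charset As String, fallback As Encoding) As Encoding
        If String.IsNullOrWhiteSpace(charset) Then Return fallback
        Try
            Return Encoding.GetEncoding(charset.Trim().Trim(""""c))
        Catch ex As ArgumentException
            Return fallback
        End Try
    End Function

    Sub Main()
        Console.WriteLine(ParseCharset("text/html; charset=utf-8"))            ' utf-8
        Console.WriteLine(ParseCharset("text/html") Is Nothing)                 ' True
        Console.WriteLine(ResolveEncoding("text/html", Encoding.UTF8).WebName)  ' utf-8
    End Sub
End Module
```

Choosing UTF-8 as the fallback is an assumption; it is the most common encoding on the web, but a page that declares its charset only inside its HTML still needs a look at the body.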
The entire routine (requested by a commenter) follows:
Public Shared Function DownloadFileUsingURL(ByVal URLstring As String, ByVal descFilePathAndName As String, ByRef errorMessage As String, ByRef hadRedirect As Boolean,
                                            ByRef newURL As String, ByRef isSecureConnection As Boolean, ByVal didSupplyfullfilename As Boolean, ByRef downloadedcontent As String,
                                            ByRef httpTohttps As Boolean, ByRef httpsTohttp As Boolean, ByRef BytesRead As Integer) As Boolean
    Dim ourUri As New Uri(URLstring)
    Dim csh As New ClassSyncHttp
    Dim expectSecureConnection As Boolean
    Dim domainchanged As Boolean
    newURL = ""
    errorMessage = ""
    hadRedirect = False
    httpTohttps = False
    httpsTohttp = False
    isSecureConnection = False
    If URLstring.ToLower.StartsWith("https:") Then
        ServicePointManager.Expect100Continue = True
        ServicePointManager.SecurityProtocol = SecurityProtocolType.SystemDefault
        expectSecureConnection = True
    Else
        ServicePointManager.SecurityProtocol = SecurityProtocolType.SystemDefault
        expectSecureConnection = False
    End If
    Try
        Dim myWebRequest As HttpWebRequest = DirectCast(WebRequest.Create(ourUri), HttpWebRequest) ' I changed webrequest to httpwebrequest
        Dim cookieContainer As CookieContainer = New CookieContainer ' needs httpwebrequest to work
        myWebRequest.CookieContainer = cookieContainer
        myWebRequest.Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials
        myWebRequest.UserAgent = "BrowseNet"
        myWebRequest.Timeout = ClassGlobalVariables.downloadTimeoutMilliseconds
        myWebRequest.Credentials = CredentialCache.DefaultCredentials
        ' Using disposes (closes) the response even if an exception is thrown below
        Using myWebResponse As HttpWebResponse = DirectCast(myWebRequest.GetResponse(), HttpWebResponse)
            Dim contentType As String = myWebResponse.ContentType
            Dim charset As String = myWebResponse.CharacterSet
            If Not ourUri.Equals(myWebResponse.ResponseUri) Then
                newURL = myWebResponse.ResponseUri.ToString
                hadRedirect = True
                If newURL.ToLower.StartsWith("https") Then
                    isSecureConnection = True
                End If
                compareURLs(URLstring, newURL, httpTohttps, domainchanged)
            End If
            Dim receiveStream As Stream = myWebResponse.GetResponseStream()
            If didSupplyfullfilename Then
                Using fs As FileStream = File.Create(descFilePathAndName)
                    receiveStream.CopyTo(fs)
                    BytesRead = CInt(fs.Length)
                End Using
            Else
                ' GetEncoding throws an ArgumentException if charset is missing or is not
                ' a recognized encoding name (such as 'text/html')
                Dim encode As Encoding = System.Text.Encoding.GetEncoding(charset)
                Using reader As StreamReader = New StreamReader(receiveStream, encode)
                    ' receiveStream.Seek(0, SeekOrigin.Begin)
                    downloadedcontent = reader.ReadToEnd()
                    BytesRead = downloadedcontent.Length ' number of characters, not bytes
                End Using
            End If
        End Using
        If expectSecureConnection Then
            isSecureConnection = True
        Else
            isSecureConnection = False
        End If
        Return True
    Catch ex As WebException
        If expectSecureConnection Then
            ' Guessing that the problem was that I was wrong about the secure connection. (The problem could be elsewhere, of course.)
            isSecureConnection = False
            httpsTohttp = True
            ' Status (not HResult) holds the WebExceptionStatus value
            If ex.Status = WebExceptionStatus.SecureChannelFailure Then
                ' not sure what to do
            End If
        'Else
        '    isSecureConnection = True
        End If
        errorMessage = ex.Message
        Return False
    End Try
End Function
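When the header names no charset, the page may still declare one itself, either with a byte order mark or with a meta charset (or meta http-equiv="Content-Type") tag near the top. The sketch below is a simplified take on that idea, not the full HTML encoding-sniffing algorithm: it buffers the response into a MemoryStream, checks for a BOM, scans the first 1024 bytes (the window the HTML specification's prescan uses) for a charset declaration, and otherwise uses a caller-supplied fallback. DetectHtmlEncoding and ReadAllAndDecode are hypothetical names, and the regular expression is an assumption that will miss unusual markup. In the routine above, ReadAllAndDecode(receiveStream, someFallback) could take the place of the GetEncoding/StreamReader pair in the Else branch.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module EncodingSniffer

    ' Hypothetical helper: picks an encoding from the raw bytes of an HTML page.
    ' Order: byte order mark, then a meta charset declaration in the first
    ' 1024 bytes, then the caller's fallback. Simplified; not the full HTML algorithm.
    Function DetectHtmlEncoding(body As Byte(), fallback As Encoding) As Encoding
        ' 1. Byte order marks (UTF-8, UTF-16 LE, UTF-16 BE).
        If body.Length >= 3 AndAlso body(0) = &HEF AndAlso body(1) = &HBB AndAlso body(2) = &HBF Then
            Return Encoding.UTF8
        End If
        If body.Length >= 2 AndAlso body(0) = &HFF AndAlso body(1) = &HFE Then
            Return Encoding.Unicode
        End If
        If body.Length >= 2 AndAlso body(0) = &HFE AndAlso body(1) = &HFF Then
            Return Encoding.BigEndianUnicode
        End If

        ' 2. Decode the start as ASCII only to find the declaration; the tag itself
        '    is plain ASCII in UTF-8 and in the single-byte encodings pages commonly use.
        Dim head As String = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 1024))
        Dim m As Match = Regex.Match(head, "<meta[^>]+charset\s*=\s*[""']?([A-Za-z0-9_\-:.]+)", RegexOptions.IgnoreCase)
        If m.Success Then
            Try
                Return Encoding.GetEncoding(m.Groups(1).Value)
            Catch ex As ArgumentException
                ' Unrecognized name: fall through to the fallback.
            End Try
        End If

        ' 3. No BOM and no usable declaration.
        Return fallback
    End Function

    ' Hypothetical replacement for the GetEncoding/StreamReader pair: buffers the
    ' whole response so it can be inspected before it is decoded.
    Function ReadAllAndDecode(responseStream As Stream, fallback As Encoding) As String
        Using buffer As New MemoryStream()
            responseStream.CopyTo(buffer)
            Dim bytes As Byte() = buffer.ToArray()
            Dim enc As Encoding = DetectHtmlEncoding(bytes, fallback)
            ' detectEncodingFromByteOrderMarks:=True makes the reader skip a BOM
            ' instead of returning it as the first character.
            Using reader As New StreamReader(New MemoryStream(bytes), enc, True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Dim page As String = "<html><head><meta charset=""utf-8""></head><body>caf" & ChrW(&HE9) & "</body></html>"
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(page)
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Console.WriteLine(DetectHtmlEncoding(bytes, latin1).WebName)                ' utf-8
        Console.WriteLine(ReadAllAndDecode(New MemoryStream(bytes), latin1) = page) ' True
    End Sub
End Module
```

Buffering the whole body is the simplest way to look at the bytes before decoding them, since a network response stream generally cannot seek back to the start (which is presumably why the Seek call in the routine is commented out).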