
I am using Microsoft.XMLHTTP via VBA to pull in the body of a web page. In doing so, characters such as é get replaced with "?" or something equally not useful.

Here's the basic code:

Set objHTTP = CreateObject("Microsoft.XMLHTTP")

objHTTP.Open "GET", ThisWebPage, False
objHTTP.setRequestHeader "Content-Type", _
      "application/x-www-form-urlencoded; charset=UTF-8"
objHTTP.Send ("")

strResponse = objHTTP.responseText

Is there any way to retrieve the page with the special characters intact?

Note: I have also tried using this request header with no success:
objHTTP.setRequestHeader "Content-Type", "content=text/html; charset=iso-8859-1"

Thanks in advance.

Solution
Thanks to Ben.Vineyard (and some cursory Googling), I'm able to pull accented characters with the following code:

 ' ADODB StreamTypeEnum values; needed when no ADO library reference is set,
 ' because late binding via CreateObject does not define them
 Const adTypeBinary = 1
 Const adTypeText = 2

 ' Create the XMLHTTP object
 Set objHTTP = CreateObject("Microsoft.XMLHTTP")

 ' Send the request
 objHTTP.Open "GET", WhatWebPage, False
 objHTTP.Send ("")

 Dim BinaryStream
 Set BinaryStream = CreateObject("ADODB.Stream")

 With BinaryStream
    ' Write the raw response bytes into a binary stream
    .Type = adTypeBinary
    .Open
    .Write objHTTP.ResponseBody

    ' Rewind and change the stream type to text
    .Position = 0
    .Type = adTypeText

    ' Specify the charset of the source bytes
    .Charset = "iso-8859-1"

    ' Read the stream back as decoded text
    strResponse = .ReadText
 End With
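
The charset above is hard-coded to match this particular page. As a rough sketch only (not part of the original solution), the charset could instead be taken from the response's Content-Type header. CharsetFromContentType is a hypothetical helper name, the fallback value is an assumption, and the parsing is deliberately simple; it only handles a plain `charset=` parameter.

```vba
Option Explicit

' Hypothetical helper: extract the charset parameter from a Content-Type
' header value such as "text/html; charset=iso-8859-1".
' Returns the fallback when no charset parameter is present.
Public Function CharsetFromContentType(ByVal headerValue As String, _
        Optional ByVal fallback As String = "iso-8859-1") As String
    Dim pos As Long
    Dim rest As String
    Dim cut As Long

    CharsetFromContentType = fallback

    ' Case-insensitive search for the charset parameter
    pos = InStr(1, headerValue, "charset=", vbTextCompare)
    If pos = 0 Then Exit Function

    ' Keep everything after "charset=" up to the next ";" (if any)
    rest = Mid$(headerValue, pos + Len("charset="))
    cut = InStr(rest, ";")
    If cut > 0 Then rest = Left$(rest, cut - 1)

    ' Strip optional double quotes and surrounding spaces
    rest = Trim$(Replace(rest, """", ""))
    If Len(rest) > 0 Then CharsetFromContentType = rest
End Function
```

With that in place, the charset line in the solution could become `.Charset = CharsetFromContentType(objHTTP.getResponseHeader("Content-Type"))`. This assumes the server reports the encoding accurately and that the name is one ADODB.Stream recognizes on the machine; if the page declares its encoding only in a meta tag, the fallback is used.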
variant
  • If you have (eg) Fiddler, try taking a look at the *response* header(s) and see what's there. Fiddler will also show you the response, so see how that compares to the value from xmlhttp. – Tim Williams Aug 17 '11 at 23:01
  • @variant: Can you please take a look at my code and see if you can help me with it? [link](https://stackoverflow.com/questions/23786031/vba-convert-string-to-unicode) – Trenera May 21 '14 at 15:47

1 Answer


The problem could be that the data is not actually sent encoded as UTF-8. It might be in ANSI or whatever string/file encoding is in use, in which case characters higher than 127 (outside the ASCII range) will not come through correctly. Are you sure that the original text stream is UTF-8? Have you tried another encoding, such as one of the ISO-8859-* formats?
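
To illustrate the kind of mismatch described above, here is a minimal sketch that decodes the same two bytes (the UTF-8 encoding of "é") with two different charsets. DecodeBytes and ShowCharsetMismatch are hypothetical names introduced only for this example, and the constants are declared locally on the assumption that no ADO library reference is set.

```vba
Option Explicit

' ADODB StreamTypeEnum values (late binding does not define them).
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2

' Hypothetical helper: decode raw bytes using the named charset.
Private Function DecodeBytes(ByRef rawBytes() As Byte, _
        ByVal charsetName As String) As String
    With CreateObject("ADODB.Stream")
        .Type = adTypeBinary
        .Open
        .Write rawBytes
        ' Rewind, then reinterpret the same bytes as text
        .Position = 0
        .Type = adTypeText
        .Charset = charsetName
        DecodeBytes = .ReadText
        .Close
    End With
End Function

Public Sub ShowCharsetMismatch()
    ' The two bytes that encode "é" (U+00E9) in UTF-8
    Dim utf8Bytes(0 To 1) As Byte
    utf8Bytes(0) = &HC3
    utf8Bytes(1) = &HA9

    Dim asUtf8 As String
    Dim asLatin1 As String
    asUtf8 = DecodeBytes(utf8Bytes, "utf-8")
    asLatin1 = DecodeBytes(utf8Bytes, "iso-8859-1")

    ' Print lengths and code points rather than the characters themselves,
    ' since the Immediate window may not render them reliably.
    Debug.Print "utf-8:      Len=" & Len(asUtf8) & " first=U+" & Hex$(AscW(asUtf8))
    Debug.Print "iso-8859-1: Len=" & Len(asLatin1) & " first=U+" & Hex$(AscW(asLatin1))
End Sub
```

Decoded as UTF-8, the bytes should come back as a single character; decoded as ISO-8859-1, each byte should become its own character, which is the sort of junk that shows up when the assumed charset does not match the bytes actually sent.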

Ben.Vineyard
  • Thanks, Ben. I've also tried objHTTP.setRequestHeader "Content-Type", "content=text/html; charset=iso-8859-1", which matches the header of the page, with no success. – variant Aug 17 '11 at 22:26
  • Are you seeing this translation of characters in VBA or another system you are possibly sending to? – Ben.Vineyard Aug 17 '11 at 22:37
  • I'm seeing these special characters being converted to meaningless junk in VBA when I examine the value of responseText – variant Aug 17 '11 at 22:40
  • You could try treating the text as a binary stream: objHTTP.Send With CreateObject("ADODB.Stream"). – Ben.Vineyard Aug 17 '11 at 22:45
  • Ben - can you update your answer with the ADODB.Stream answer - that solved it for me. Thanks!! – variant Aug 18 '11 at 13:46
  • Sweet! I'm glad you were able to solve it. Should I just pull your complete solution into my answer or leave it as is? I think it is fine. – Ben.Vineyard Aug 18 '11 at 18:33