
I would like to extract some data from a webpage containing iso-8859-1 characters with a short VBA Sub. I almost get the expected result. The problem is with some specific characters, for example: é, è, ç, à ... Instead of being displayed correctly, each of these characters, together with the 2 characters that follow it, is replaced by a black box with a question mark. What would you suggest?

Here is how I got there :

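Sub GetDivText()
    Dim my_url As String, html_doc As Object, xml_obj As Object
    my_url = "http://www.website.fr/"
    Set html_doc = CreateObject("htmlfile")
    Set xml_obj = CreateObject("MSXML2.XMLHTTP")
    ' Synchronous GET request
    xml_obj.Open "GET", my_url, False
    xml_obj.send
    ' Load the response text into the HTML document
    html_doc.body.innerHTML = xml_obj.responseText
    Set xml_obj = Nothing
    ' Write the text of the fourth div (index 3) to cell B1
    Sheets(1).Cells(1, 2).Value = html_doc.body.getElementsByTagName("div")(3).innerText
End Sub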
arnaud1000
  • https://stackoverflow.com/questions/2524703/save-text-file-utf-8-encoded-with-vba should be the right direction – Uke Sep 19 '17 at 13:33
  • @Zoba : My computer does not like "Set fst = CreateObject("ADODB.Stream")"... Object Required – arnaud1000 Sep 19 '17 at 13:44
  • What exactly do you mean by an "ANSI character"? I'm aware that this term was used by Microsoft (mid 1980s?) when they first moved beyond 7-bit character sets, and that it has nothing to do with the American National Standards Institute, but I wasn't aware the term was still in use, and if it is used, I have no idea what it means in today's Unicode world. – Michael Kay Sep 19 '17 at 16:06
  • I'm not sure ANSI is correct in fact. I just checked the web page info, it is rather iso-8859-1. – arnaud1000 Sep 19 '17 at 17:45
  • https://stackoverflow.com/questions/7100229/xmlhttp-and-special-characters-eg-accents looks like the problem I have. But I still have a basic issue with ADODB despite activating it. – arnaud1000 Sep 21 '17 at 07:37
  • I solved it by using ADODB to keep the special characters, and then worked on the result with regexp. – arnaud1000 Sep 21 '17 at 12:50
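
The ADODB approach mentioned in the last comment was not posted, so the following is only a rough sketch of what it might look like, not the actual code that was used. The likely cause is that responseText decodes the bytes as UTF-8 when the server declares no charset, which would explain why each accented character takes the next 2 characters with it. The sketch reads the raw bytes from responseBody and decodes them explicitly as iso-8859-1 through an ADODB.Stream. The names BytesToText and ExtractDivText are hypothetical, and it keeps the htmlfile parsing from the question instead of the regexp step.

```vba
' Hypothetical helper: decode a byte array with an explicit charset.
Function BytesToText(ByVal raw_bytes As Variant, ByVal charset_name As String) As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 1                  ' 1 = adTypeBinary
    stm.Open
    stm.Write raw_bytes
    stm.Position = 0              ' Type can only be changed at position 0
    stm.Type = 2                  ' 2 = adTypeText
    stm.Charset = charset_name
    BytesToText = stm.ReadText
    stm.Close
End Function

' Hypothetical Sub: same steps as in the question, but decoding responseBody.
Sub ExtractDivText()
    Dim xml_obj As Object, html_doc As Object
    Set xml_obj = CreateObject("MSXML2.XMLHTTP")
    xml_obj.Open "GET", "http://www.website.fr/", False
    xml_obj.send
    Set html_doc = CreateObject("htmlfile")
    ' Assumes the page really is iso-8859-1, as reported in the comments
    html_doc.body.innerHTML = BytesToText(xml_obj.responseBody, "iso-8859-1")
    Set xml_obj = Nothing
    Sheets(1).Cells(1, 2).Value = html_doc.body.getElementsByTagName("div")(3).innerText
End Sub
```

Late binding through CreateObject is used here so that no reference to the ADO library has to be set in the VBA editor. If the page uses a different encoding, only the charset name passed to BytesToText should need to change.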
