The snippet below will not make much sense on its own, because it is part of a bigger script. It loads the contents of an HTML file and then overwrites the source HTML file with the same contents:

Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Set IE = New InternetExplorer
With IE
    .Navigate filePath
    While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
    Set HTMLdoc = .Document
End With

Set FSO = CreateObject("Scripting.FileSystemObject")
Set FileToCreate = FSO.CreateTextFile(filePath)
FileToCreate.Write HTMLdoc.DocumentElement.outerHTML

I expected the output to be similar to the source HTML, but for some reason special characters are replaced with replacement characters.

Examples of replaced characters are: ë, °

WHAT I AM LOOKING FOR
Is there a way to prevent special characters from being replaced with replacement characters?

MK01111000
  • You could try to use the optional parameter _unicode_ of the [CreateTextFile-Method](https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/createtextfile-method) – Shrotter Nov 23 '22 at 14:52
  • Are these characters really replaced, or is your editor/viewer not capable of showing them? – Shrotter Nov 23 '22 at 15:01
  • @Shrotter: Setting the Unicode Boolean to True does indeed keep the special characters, but it seems to do more than that: the formatting (CSS) is no longer applied and the file doubles in size. – MK01111000 Nov 23 '22 at 15:14
  • @Shrotter: The viewer is capable of showing them. Before I run the code from my question, the file looks fine. – MK01111000 Nov 23 '22 at 15:15
  • https://stackoverflow.com/questions/2524703/save-text-file-utf-8-encoded-with-vba ? – QHarr Nov 23 '22 at 18:11

1 Answer

I found 2 solutions to this problem:

SOLUTION 1
Thanks to the link provided by @QHarr I was able to solve this. This solution handles the special characters when saving, by writing the file as UTF-8.

Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Set IE = New InternetExplorer
With IE
    .Navigate filePath
    While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
    Set HTMLdoc = .Document
End With


Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'adTypeText: the stream holds text/string data
fsT.Charset = "utf-8" 'Encoding used when the text is written to disk
fsT.Open 'Open the stream so text can be written to it
fsT.WriteText HTMLdoc.DocumentElement.outerHTML
fsT.SaveToFile filePath, 2 'adSaveCreateOverWrite: replace the existing file
fsT.Close

Source: Save text file UTF-8 encoded with VBA (https://stackoverflow.com/questions/2524703/save-text-file-utf-8-encoded-with-vba)
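
The save step above can be wrapped in a small reusable routine. This is only a minimal sketch: the name SaveTextAsUtf8 and its parameters are hypothetical and not part of the original script, and it uses late binding so no ADO reference is needed.

```vba
' Hypothetical helper (name and parameters are illustrative only).
' Writes a VBA string to disk as UTF-8 through a late-bound ADODB.Stream.
Public Sub SaveTextAsUtf8(ByVal targetPath As String, ByVal content As String)
    Const adTypeText As Long = 2            ' stream holds text, not bytes
    Const adSaveCreateOverWrite As Long = 2 ' replace the file if it exists

    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")
    With stream
        .Type = adTypeText
        .Charset = "utf-8"   ' encoding applied when the text is written out
        .Open
        .WriteText content
        .SaveToFile targetPath, adSaveCreateOverWrite
        .Close
    End With
End Sub
```

With this in place, the last lines of Solution 1 would reduce to SaveTextAsUtf8 filePath, HTMLdoc.DocumentElement.outerHTML. Note that ADODB.Stream puts a UTF-8 byte order mark at the start of the file, which most browsers and editors accept.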

SOLUTION 2
This solution uses MSXML2.XMLHTTP60 instead of the InternetExplorer.Application object. It needs references to "Microsoft XML, v6.0" and, for the MSHTML types, "Microsoft HTML Object Library" in the VBA editor. This seems to handle special characters in a better way.

Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
IE.Open "GET", filePath, False 'False = synchronous request
IE.send 'Returns only after the response has arrived, so no wait loop is needed
Dim HTMLdoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.HTMLBody
Set HTMLdoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLdoc.body
HTMLBody.innerHTML = IE.responseText

Set FSO = CreateObject("Scripting.FileSystemObject")
Set FileToCreate = FSO.CreateTextFile(filePath)
FileToCreate.Write HTMLdoc.DocumentElement.outerHTML
FileToCreate.Close
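
Both solutions load the file through a browser or HTTP object. If only the raw text of the file is needed, the same ADODB.Stream object from Solution 1 can also read it with an explicit charset. The function below is a hedged sketch rather than part of the original script: the name ReadTextAsUtf8 is hypothetical, and it assumes the source file really is UTF-8 encoded.

```vba
' Hypothetical helper (not from the original post).
' Assumption: the file at sourcePath is UTF-8 encoded.
Public Function ReadTextAsUtf8(ByVal sourcePath As String) As String
    Const adTypeText As Long = 2   ' stream holds text, not bytes

    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")
    With stream
        .Type = adTypeText
        .Charset = "utf-8"          ' decode the file's bytes as UTF-8
        .Open
        .LoadFromFile sourcePath
        ReadTextAsUtf8 = .ReadText  ' no argument: read to the end of the stream
        .Close
    End With
End Function
```

This returns the file's text as stored on disk, not the markup re-serialised by outerHTML, so it only fits cases where the document does not have to pass through the HTML parser.
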
MK01111000