67

how can I write UTF-8 encoded strings to a textfile from vba, like

Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, "special characters: äöüß" 'latin-1 or something by default
Close fnum

Is there some setting on Application level?

Karsten W.
  • 17,826
  • 11
  • 69
  • 103

7 Answers7

98

I found the answer on the web:

Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'Specify stream type - we want To save text/string data.
fsT.Charset = "utf-8" 'Specify charset For the source text data.
fsT.Open 'Open the stream And write binary data To the object
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'Save binary data To disk

Certainly not as I expected...

jobrad
  • 1,801
  • 2
  • 11
  • 7
Karsten W.
  • 17,826
  • 11
  • 69
  • 103
  • hello, what if i wanted to save in utf-16, i just have to change 8 to 16 right? – Smith Jan 07 '11 at 13:06
  • 33
    i don't know, did you try it? – Karsten W. Jan 09 '11 at 18:51
  • Tried your code. Content of saved file: special characters: äöüß – Johnny Apr 28 '21 at 13:14
  • @Johnny It may be a issue with your text editor. Make sure that it opens the file with the correct encoding (Notepad is especially bad at this) **_____________________________________________________________________________________________** Related issue: depends on the locale you might not be able to put the special characters in a string literal like this. See [vba - How to type currency symbols in Visual Basic Editor - Stack Overflow](https://stackoverflow.com/questions/24384952/how-to-type-currency-symbols-in-visual-basic-editor?noredirect=1&lq=1) – user202729 Oct 05 '21 at 12:01
  • @Johnny what you posted is *exactly* what UTF8 would look like if you loaded it using Latin1. Each character is saved using 1 or more bytes. For characters above the US-ASCII range the first byte looks like `Ã` – Panagiotis Kanavos Jan 12 '22 at 12:28
29

You can use CreateTextFile or OpenTextFile method, both have an attribute "unicode" useful for encoding settings.

object.CreateTextFile(filename[, overwrite[, unicode]])        
object.OpenTextFile(filename[, iomode[, create[, format]]])

Example: Overwrite:

CreateTextFile:
 fileName = "filename"
 Set fso = CreateObject("Scripting.FileSystemObject")
 Set out = fso.CreateTextFile(fileName, True, True)
 out.WriteLine ("Hello world!")
 ...
 out.close

Example: Append:

 OpenTextFile Set fso = CreateObject("Scripting.FileSystemObject")
 Set out = fso.OpenTextFile("filename", ForAppending, True, 1)
 out.Write "Hello world!"
 ...
 out.Close

See more on MSDN docs

user202729
  • 3,358
  • 3
  • 25
  • 36
  • Interesting. Object is of class `FileSystemObject`, right? How would I write to this file? `.Write`? – Karsten W. Jul 13 '12 at 13:52
  • 1
    If you want to work with MAC and Windows, search for [Boost FileSystem for C++](http://www.boost.org/doc/libs/1_53_0/libs/filesystem/doc/reference.html) or another libraries. – danieltakeshi Oct 30 '17 at 18:47
  • 1
    After trying several settings, when i use : fso.CreateTextFile(fileName, True, True) the format is UTF-16 LE, when i change the code to ; fso.CreateTextFile(fileName, True, False) the format is UTF-8 – Mathias Z Jun 16 '20 at 07:24
10

This writes a Byte Order Mark at the start of the file, which is unnecessary in a UTF-8 file and some applications (in my case, SAP) don't like it. Solution here: Can I export excel data with UTF-8 without BOM?

Community
  • 1
  • 1
PhilHibbs
  • 859
  • 1
  • 13
  • 30
  • 1
    This is the best solution for standard UTF-8. Android Studio could not correctly read the .xml files I created using other methods. – Mahmut K. Apr 22 '23 at 17:06
9

Here is another way to do this - using the API function WideCharToMultiByte:

Option Explicit

Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _
  ByVal CodePage As Long, _
  ByVal dwFlags As Long, _
  ByVal lpWideCharStr As Long, _
  ByVal cchWideChar As Long, _
  ByVal lpMultiByteStr As Long, _
  ByVal cbMultiByte As Long, _
  ByVal lpDefaultChar As Long, _
  ByVal lpUsedDefaultChar As Long) As Long

Private Sub getUtf8(ByRef s As String, ByRef b() As Byte)
Const CP_UTF8 As Long = 65001
Dim len_s As Long
Dim ptr_s As Long
Dim size As Long
  Erase b
  len_s = Len(s)
  If len_s = 0 Then _
    Err.Raise 30030, , "Len(WideChars) = 0"
  ptr_s = StrPtr(s)
  size = WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, 0, 0, 0, 0)
  If size = 0 Then _
    Err.Raise 30030, , "WideCharToMultiByte() = 0"
  ReDim b(0 To size - 1)
  If WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, VarPtr(b(0)), size, 0, 0) = 0 Then _
    Err.Raise 30030, , "WideCharToMultiByte(" & Format$(size) & ") = 0"
End Sub

Public Sub writeUtf()
Dim file As Integer
Dim s As String
Dim b() As Byte
  s = "äöüßµ@€|~{}[]²³\ .." & _
    " OMEGA" & ChrW$(937) & ", SIGMA" & ChrW$(931) & _
    ", alpha" & ChrW$(945) & ", beta" & ChrW$(946) & ", pi" & ChrW$(960) & vbCrLf
  file = FreeFile
  Open "C:\Temp\TestUtf8.txt" For Binary Access Write Lock Read Write As #file
  getUtf8 s, b
  Put #file, , b
  Close #file
End Sub
Falo
  • 331
  • 3
  • 5
3

I looked into the answer from Máťa whose name hints at encoding qualifications and experience. The VBA docs say CreateTextFile(filename, [overwrite [, unicode]]) creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." It's fine that a file stores unicode characters, but in what encoding? Unencoded unicode can't be represented in a file.

The VBA doc page for OpenTextFile(filename[, iomode[, create[, format]]]) offers a third option for the format:

  • TriStateDefault 2 "opens the file using the system default."
  • TriStateTrue 1 "opens the file as Unicode."
  • TriStateFalse 0 "opens the file as ASCII."

Máťa passes -1 for this argument.

Judging from VB.NET documentation (not VBA but I think reflects realities about how underlying Windows OS represents unicode strings and echoes up into MS Office, I don't know) the system default is an encoding using 1 byte/unicode character using an ANSI code page for the locale. UnicodeEncoding is UTF-16. The docs also describe UTF-8 is also a "Unicode encoding," which makes sense to me. But I don't yet know how to specify UTF-8 for VBA output nor be confident that the data I write to disk with the OpenTextFile(,,,1) is UTF-16 encoded. Tamalek's post is helpful.

Community
  • 1
  • 1
Bennett Brown
  • 5,234
  • 1
  • 27
  • 35
2

I didn't want to change all my code just to support several UTF8 strings so i let my code do it's thing, and after the file was saved (in ANSI code as it is the default of excel) i then convert the file to UTF-8 using this code:

Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objFS  As Object
    Dim iFile       As Double
    Dim sFileData   As String
    
    'Init
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
        sFileData = Input$(LOF(iFile), iFile)
        sFileData = sFileData & vbCrLf
    Close iFile
    
    'Open & Write
    Set objFS = CreateObject("ADODB.Stream")
    objFS.Charset = "utf-8"
    objFS.Open
    objFS.WriteText sFileData
    
    'Save & Close
    objFS.SaveToFile sOutFilePath, 2   '2: Create Or Update
    objFS.Close
    
    'Completed
    Application.StatusBar = "Completed"
End Sub

and i use this sub like this (this is an example):

Call convertTxttoUTF("c:\my.json", "c:\my-UTF8.json")

i found this code here: VBA to Change File Encoding ANSI to UTF8 – Text to Unicode

and since this is written with BOM marker, in order to remove the bom i changed the Sub to this:

Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objStreamUTF8  As Object
    Dim objStreamUTF8NoBOM  As Object
    Dim iFile       As Double
    Dim sFileData   As String
    
    Const adSaveCreateOverWrite = 2
    Const adTypeBinary = 1
    Const adTypeText = 2
    
    'Init
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
        sFileData = Input(LOF(iFile), iFile)
    Close iFile
    
    'Open files
    Set objStreamUTF8 = CreateObject("ADODB.Stream")
    Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
           
    ' wrute the fules       
    With objStreamUTF8
      .Charset = "UTF-8"
      .Open
      .WriteText sFileData
      .Position = 0
      .SaveToFile sOutFilePath, adSaveCreateOverWrite
      .Type = adTypeText
      .Position = 3
    End With
    
    With objStreamUTF8NoBOM
      .Type = adTypeBinary
      .Open
      objStreamUTF8.CopyTo objStreamUTF8NoBOM
      .SaveToFile sOutFilePath, 2
    End With
    
    ' close the files
    objStreamUTF8.Close
    objStreamUTF8NoBOM.Close
End Sub

i used this answer to solve the BOM unknown character at the beginning of the file

Shaybc
  • 2,628
  • 29
  • 43
0

The traditional way to transform a string to a UTF-8 string is as follows:

StrConv("hello world",vbFromUnicode)

So put simply:

Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, StrConv("special characters: äöüß", vbFromUnicode)
Close fnum

No special COM objects required

Sancarn
  • 2,575
  • 20
  • 45
  • 2
    `StrConv(vbFromUnicode)` returns a *byte array* that contains the result of converting the given Unicode string (all strings are Unicode in VBA) into the current system codepage for non-Unicode programs. The "special characters" from the string that were not on that codepage are lost at that point (not that you could have them in the original literal in the first place, because the VBA code editor is [not Unicode](https://stackoverflow.com/a/25260658/11683)). – GSerg Nov 16 '21 at 18:54
  • 2
    This garbage byte array is then passed to `Print` that is designed for string and thus thinks that the passed data is a regular Unicode string, so it converts it ["from Unicode" *again*](https://stackoverflow.com/a/23980044/11683), thus removing half of the characters from it. The resulting decimated garbage appears in the file. The code shown above creates a text file that is 14 bytes long, given that the original string literal contains 24 characters. In all of the above, "Unicode" means "UTF-16". UTF-8 does not enter the scene in any shape or form. – GSerg Nov 16 '21 at 18:57
  • @GSerg `TypeName(StrConv("hello world",vbFromUnicode))` returns `String` in my version of Excel. Not sure why it's returning `Byte()` for you? Additionally in my version of Excel the result of `StrConv(...,vbFromUnicode)` is half the number of bytes of the string itself I.E. It is performing a `UTF-16` --> `UTF-8` conversion. This can be confirmed by setting the result to a byte array `Dim b() as byte: b = strconv(...,vbFromUnicode)` – Sancarn Nov 16 '21 at 19:53
  • @GSerg you can also see that the StrConv is meant to return a string [from the docs](https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/strconv-function) – Sancarn Nov 16 '21 at 19:59
  • 3
    It is not performing UTF-16 to UTF-8 conversion, it is performing UTF-16 to ASCII conversion, using your current global codepage. UTF-8 is not a synonym to "encoding that uses one byte per character", as 1) there are hundreds of different encodings that use one byte per character, and 2) UTF-8 uses [up to 4 bytes per character](https://en.wikipedia.org/wiki/UTF-8#Encoding), depending on the character. The ability to assign the string bytes to a byte array is also not an inherent property of UTF-8 and thus is not a sign of UTF-8 being used. – GSerg Nov 16 '21 at 21:34
  • The `StrConv` documentation is indeed poor, but you disprove it yourself by assigning its result to a byte array. The [Remarks section](https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/strconv-function#remarks) is more on point. Please see https://www.joelonsoftware.com/articles/Unicode.html otherwise. You can also run the code in your answer, as is, and observe that it's not possible to recover the string `"special characters: äöüß"` from the resulting file. – GSerg Nov 16 '21 at 21:35
  • @GSerg I thought UTF-8 == ASCII. Thanks for the clarification that my initial thoughts were wrong. – Sancarn Nov 17 '21 at 12:29