This is a VB6 program. I have a UTF-8 encoded file (not created by me) that I read values from, using FileSystemObject.ReadLine(). If I read a line into a String or Variant and look at the value in the debugger, it is shown as ANSI, with two ugly characters where the UTF-8 Spanish "ñ" should be. I can write that very string back out using FSO.WriteLine(), and when I open the resulting file in Notepad++, it detects UTF-8 encoding and shows the character correctly. If I put the value in a TextBox, it again shows the ugly ANSI characters where the "ñ" is supposed to be.

If I read that same value by ID from my MS Access database, where it is stored with UTF-8 encoding, and put it in a String, it displays correctly in the debugger. If I then assign that String to TextBox.Text, it also displays correctly in the TextBox.

So the problem appears to be in what gets assigned to the String and in how that String recognizes the encoding of the data that was just handed to it.

What am I missing? Why does the String variable recognize the UTF-8 encoding when the data comes from a DAO recordset object, but not when the same value is read from a UTF-8 encoded file? If I open that file in Notepad++, it seems to know the encoding and displays the characters correctly.

Thanks much for any assistance.

JJJones_3860
  • NotePad++ is "the ruin of many a poor boy" because it can thinly mask many ills and just cause more confusion when the veil is lifted. String variables do not "recognize" anything. They normally hold UTF-16LE characters though it is possible to stuff arbitrary bytes into them too, but it is almost certain that whatever you are doing via DAO is performing transcoding for you. The FSO only supports UTF-16LE and ANSI text I/O. – Bob77 Nov 13 '15 at 02:06
  • You might consider reading through this http://www.joelonsoftware.com/articles/Unicode.html – Bob77 Nov 13 '15 at 02:08
  • same question solved here http://stackoverflow.com/questions/29980993/how-to-decode-utf8-in-vb6 – milevyo Nov 13 '15 at 10:51
  • Thanks for that link Bob77! I needed that. – JJJones_3860 Nov 16 '15 at 18:15

1 Answer

Thanks for the assistance, all. The issue is that FileSystemObject cannot read UTF-8 files; as Bob77 notes in the comments, it only supports ANSI and UTF-16LE text I/O. It is answered in another post here: Read utf-8 text file in vbscript

I was unaware of that limitation, and my understanding of encodings overall was quite weak. It is a bit better now.

The solution offered in that post was to use an ADODB.Stream object to read UTF-8 files.
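For anyone who just needs the file contents in a String, the ADODB.Stream approach might look roughly like the sketch below. The function name ReadUtf8File is my own, not something from the linked post, and it uses late binding so no project reference is needed. The numeric literals stand in for the ADO constants adTypeText (2) and adReadAll (-1).

```vb
' Minimal sketch: read an entire UTF-8 text file into a VB6 String.
' ReadUtf8File is a hypothetical helper name chosen for illustration.
Public Function ReadUtf8File(ByVal sPath As String) As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")   ' late-bound ADO stream

    stm.Type = 2                 ' adTypeText: treat the content as text
    stm.Charset = "utf-8"        ' decode the file's bytes as UTF-8
    stm.Open
    stm.LoadFromFile sPath

    ' The stream transcodes UTF-8 to the UTF-16 that VB6 Strings hold.
    ReadUtf8File = stm.ReadText(-1)          ' adReadAll

    stm.Close
    Set stm = Nothing
End Function
```

Calling it as `Debug.Print ReadUtf8File("C:\Test\utf8-test.csv")` should show the "ñ" intact. To read one line at a time instead, ReadText(-2) (adReadLine) can be called in a loop until stm.EOS is True.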

But what I actually want is to import the CSV file into my Access database. After hours of searching, here is the code that does it. The CharacterSet=65001 setting is the Windows code page number for UTF-8.

db.Execute "Select * Into Test1 From [Text;CharacterSet=65001;FMT=CSVDelimited;HDR=YES;DATABASE=C:\Test\].[utf8-test.csv]"

Hope this helps others.

JJJones_3860