
I would like to get the UTF-8 code of a character. I have attempted to use streams, but it doesn't seem to work:

Example: פ should give 16#D7A4, according to https://en.wikipedia.org/wiki/Pe_(Semitic_letter)#Character_encodings

Const adTypeBinary = 1
Dim adoStr, bytesthroughado
Set adoStr = CreateObject("Adodb.Stream")
    adoStr.Charset = "utf-8"
    adoStr.Open
    adoStr.WriteText labelString
    adoStr.Position = 0 
    adoStr.Type = adTypeBinary
    adoStr.Position = 3 
    bytesthroughado = adoStr.Read
    Msgbox(LenB(bytesthroughado)) 'gives 2
    adoStr.Close
Set adoStr = Nothing
MsgBox(bytesthroughado) ' gives K

Note: AscW gives the Unicode code point, not the UTF-8 encoding.

Sarima

1 Answer


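bytesthroughado holds a Byte() array (VarType 8209 = vbArray + vbByte, see the first output line), not a String, so you need to process it byte by byte with LenB, MidB and AscB. Passing it straight to MsgBox makes VBScript read the two bytes D7 A4 as a single UTF-16 character (U+A4D7), which is why you see a "K"-like glyph: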

Option Explicit

Dim ss, xx, ii, jj, char, labelString

labelString = "ařЖפ€"
ss = ""
For ii=1 To Len( labelString)
  char = Mid( labelString, ii, 1)
  xx = BytesThroughAdo( char)
  If ss = "" Then ss = VarType(xx) & " " & TypeName( xx) & vbNewLine
  ss = ss & char & vbTab
  For jj=1 To LenB( xx)
      ss = ss & Hex( AscB( MidB( xx, jj, 1))) & " "
  Next
  ss = ss & vbNewLine
Next   

Wscript.Echo ss

Function BytesThroughAdo( labelChar)
    Const adTypeBinary = 1  'Indicates binary data.
    Const adTypeText   = 2  'Default. Indicates text data.
    Dim adoStream
    Set adoStream = CreateObject( "Adodb.Stream")
    adoStream.Charset = "utf-8"
    adoStream.Open
    adoStream.WriteText labelChar
    adoStream.Position = 0        'Type can only be changed at position 0
    adoStream.Type = adTypeBinary
    adoStream.Position = 3        'Skip the UTF-8 BOM (EF BB BF) written by the stream
    BytesThroughAdo = adoStream.Read
    adoStream.Close
    Set adoStream = Nothing
End Function

Output:

cscript D:\bat\SO\61368074q.vbs
8209 Byte()
a       61
ř       C5 99
Ж       D0 96
פ       D7 A4
€       E2 82 AC

I used the characters ařЖפ€ to demonstrate that your UTF-8 encoder works. For comparison, here is the output of alts8.ps1, a PowerShell script from another project:

alts8.ps1 "ařЖפ€"
Ch Unicode     Dec    CP    IME     UTF-8   ?  IME 0405/cs-CZ; CP852; ANSI 1250

 a  U+0061      97         …97…      0x61   a  Latin Small Letter A
 ř  U+0159     345         …89…    0xC599  Å�  Latin Small Letter R With Caron
 Ж  U+0416    1046         …22…    0xD096  Ð�  Cyrillic Capital Letter Zhe
 פ  U+05E4    1508        …228…    0xD7A4  פ  Hebrew Letter Pe
 €  U+20AC    8364        …172…  0xE282AC â�¬  Euro Sign
JosefZ