
I have an exe in Go which prints UTF-8 encoded strings with special characters in them.
Since that exe is meant to be used from a console window, its output is mangled, because the Windows console uses the ibm850 encoding (aka code page 850).

How would you make sure the Go exe prints strings correctly encoded for a console window, i.e. prints, for instance:

éèïöîôùòèìë

instead of this (without any translation to the right charset):

├®├¿├»├Â├«├┤├╣├▓├¿├¼├½
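That garbage is exactly what you get when the UTF-8 bytes of each accented letter are shown one byte at a time as code page 850 characters: é is the two UTF-8 bytes 0xC3 0xA9, which CP850 displays as ├ and ®. The sketch below reproduces the mangled line; `cp850Subset` and `mojibake` are hypothetical names, and the table is deliberately partial (only the bytes needed for this example), not a full CP850 table.

```go
package main

import (
	"fmt"
	"strings"
)

// cp850Subset is a hypothetical, partial CP850 byte-to-rune table covering
// only the bytes that appear in the UTF-8 encoding of "éèïöîôùòèìë".
var cp850Subset = map[byte]rune{
	0xC3: '├',
	0xA8: '¿', 0xA9: '®', 0xAB: '½', 0xAC: '¼', 0xAE: '«', 0xAF: '»',
	0xB2: '▓', 0xB4: '┤', 0xB6: 'Â', 0xB9: '╣',
}

// mojibake shows how a CP850 console renders the raw UTF-8 bytes of s:
// each byte is looked up on its own, so multi-byte characters fall apart.
// ASCII bytes are identical in CP850; bytes missing from the table become '?'.
func mojibake(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ { // iterate over bytes, not runes
		switch r, ok := cp850Subset[s[i]]; {
		case s[i] < 0x80:
			b.WriteByte(s[i])
		case ok:
			b.WriteRune(r)
		default:
			b.WriteRune('?')
		}
	}
	return b.String()
}

func main() {
	s := "éèïöîôùòèìë"
	fmt.Printf("UTF-8 bytes: % x\n", s)
	fmt.Println("seen as CP850:", mojibake(s))
}
```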
Ross Ridge
VonC

4 Answers

// Alert: This is Windows-specific, uses undocumented methods, does not
// handle stdout redirection, does not check for errors, etc.
// Use at your own risk.
// Tested with Go 1.0.2-windows-amd64.

package main

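import (
    "syscall"
    "unicode/utf16"
    "unsafe"
)

var modkernel32 = syscall.NewLazyDLL("kernel32.dll")
var procWriteConsoleW = modkernel32.NewProc("WriteConsoleW")

// consolePrintString converts a UTF-8 string to UTF-16 and writes it with
// WriteConsoleW, which bypasses the console code page entirely.
func consolePrintString(strUtf8 string) {
    strUtf16 := utf16.Encode([]rune(strUtf8))
    if len(strUtf16) < 1 {
        return
    }

    // Receives the number of UTF-16 units written (not checked here).
    var charsWritten uint32

    syscall.Syscall6(procWriteConsoleW.Addr(), 5,
        uintptr(syscall.Stdout),                // hConsoleOutput
        uintptr(unsafe.Pointer(&strUtf16[0])),  // lpBuffer
        uintptr(len(strUtf16)),                 // nNumberOfCharsToWrite, in UTF-16 units
        uintptr(unsafe.Pointer(&charsWritten)), // lpNumberOfCharsWritten
        0,                                      // lpReserved
        0)                                      // unused sixth argument
}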

func main() {
    consolePrintString("Hello ☺\n")
    consolePrintString("éèïöîôùòèìë\n")
}
jason_s
    Interesting (+1), but I might keep my method for now, which seems a bit more robust than using unsafe methods. – VonC Aug 21 '12 at 16:57
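The only portable step in `consolePrintString` is the UTF-8 to UTF-16 conversion. WriteConsoleW counts its length argument in WCHARs, i.e. UTF-16 code units, which is why the code passes `len(strUtf16)` rather than a rune count. The sketch below isolates that step (`toUTF16` is a hypothetical helper name) so it can run on any OS; a character outside the Basic Multilingual Plane, such as 😀 (U+1F600), becomes a surrogate pair of two units.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// toUTF16 converts a Go (UTF-8) string into the UTF-16 code units that a
// wide-character Windows API such as WriteConsoleW expects.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	for _, s := range []string{"é", "Hello ☺", "😀"} {
		u := toUTF16(s)
		fmt.Printf("%q: %d runes, %d UTF-16 units: %04X\n",
			s, utf8.RuneCountInString(s), len(u), u)
	}
}
```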

The online book "Network programming with Go" (CC BY-NC-SA 3.0) by Jan Newmarch has a chapter on charsets ("Managing character sets and encodings"), in which he details how to convert from one charset to another. But that approach seems cumbersome.

Here is a solution (I might have missed a much simpler one) using the go-charset library (from Roger Peppe).
I translate a UTF-8 string into an ibm850-encoded one, which allows me to print in a DOS window:

éèïöîôùòèìë

The translation function is detailed below:

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "strings"

    // Originally code.google.com/p/go-charset; the project now lives on GitHub.
    "github.com/rogpeppe/go-charset/charset"
    _ "github.com/rogpeppe/go-charset/data" // makes the charset tables available without external files
)

// translate runs in through the given translator and returns the result.
func translate(tr charset.Translator, in string) (string, error) {
    var buf bytes.Buffer
    r := charset.NewTranslatingReader(strings.NewReader(in), tr)
    _, err := io.Copy(&buf, r)
    if err != nil {
        return "", err
    }
    return buf.String(), nil
}

// Utf2dos converts a UTF-8 string into its ibm850 (code page 850) encoding.
func Utf2dos(in string) string {
    dosCharset := "ibm850"
    if charset.Info(dosCharset) == nil {
        log.Fatalf("no info found for %q", dosCharset)
    }
    totr, err := charset.TranslatorTo(dosCharset)
    if err != nil {
        log.Fatalf("error making translator to %q: %v", dosCharset, err)
    }
    out, err := translate(totr, in)
    if err != nil {
        log.Fatalf("error translating to %q: %v", dosCharset, err)
    }
    return out
}

func main() {
    test := "éèïöîôùòèìë"
    fmt.Println("utf-8:\n", test)
    fmt.Println("ibm850:\n", Utf2dos(test))
}
VonC
  • @ScottStensland And my other answer below? (https://stackoverflow.com/a/44990465/6309) – VonC Oct 23 '18 at 04:30
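Conceptually, `Utf2dos` maps each Unicode code point to its single CP850 byte. The stdlib-only sketch below shows that idea with a hypothetical partial table (`toCP850`, covering just the accented letters from the question), whereas go-charset ships the full table. Replacing unmappable characters with `?` is an assumption of this sketch, not necessarily what go-charset does.

```go
package main

import "os"

// toCP850 is a hypothetical, partial Unicode-to-CP850 table covering only
// the letters used in the question; ASCII is handled separately below.
var toCP850 = map[rune]byte{
	'é': 0x82, 'è': 0x8A, 'ë': 0x89, 'ï': 0x8B, 'î': 0x8C,
	'ì': 0x8D, 'ô': 0x93, 'ö': 0x94, 'ò': 0x95, 'ù': 0x97,
}

// utf8ToCP850 encodes s rune by rune. ASCII (U+0000 to U+007F) is identical
// in CP850; runes missing from the table become '?'.
func utf8ToCP850(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r < 0x80 {
			out = append(out, byte(r))
			continue
		}
		if b, ok := toCP850[r]; ok {
			out = append(out, b)
		} else {
			out = append(out, '?')
		}
	}
	return out
}

func main() {
	// On a console using code page 850 this shows the accented letters correctly.
	os.Stdout.Write(utf8ToCP850("éèïöîôùòèìë\n"))
}
```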

You can now (2017) consider golang.org/x/text, available since 2016, whose encoding/charmap package covers the ISO-8859 family and the Windows-1252 character set, as well as IBM code pages such as code page 850 (charmap.CodePage850).

See "Go Quickly - Converting Character Encodings In Golang"

r := charmap.ISO8859_1.NewDecoder().Reader(f)
io.Copy(out, r)

That is an extract of an example that opens an ISO-8859-1 source text (my_isotext.txt), creates a destination file (my_utf.txt), and copies the first into the second.
To decode from ISO-8859-1 to UTF-8 along the way, the original file reader (f) is wrapped with a decoder.
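For ISO-8859-1 specifically, decoding is simple enough to show without any library: each byte value is the Unicode code point of the character (0xE9 is U+00E9, é), so a decoder only has to widen bytes to runes and store them as UTF-8. `latin1ToUTF8` is a hypothetical name; this sketch illustrates the idea behind `charmap.ISO8859_1.NewDecoder()`, not its actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// latin1ToUTF8 decodes ISO-8859-1 bytes: every byte maps directly to the
// code point with the same value, which Go then stores as UTF-8.
func latin1ToUTF8(b []byte) string {
	var sb strings.Builder
	sb.Grow(len(b))
	for _, c := range b {
		sb.WriteRune(rune(c))
	}
	return sb.String()
}

func main() {
	raw := []byte{'c', 'a', 'f', 0xE9} // "café" in ISO-8859-1 (4 bytes)
	s := latin1ToUTF8(raw)
	fmt.Println(s, len(raw), len(s)) // é takes 2 bytes in UTF-8
}
```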

I just tested (pseudo-code for illustration):

package main

import (
    "fmt"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    // "éèïöîôùòèìë" as code page 850 bytes (0x82 is é, 0x8A is è, ...).
    t := "\x82\x8a\x8b\x94\x8c\x93\x97\x95\x8a\x8d\x89"
    // The decoder turns CP850 bytes into a UTF-8 Go string.
    d := charmap.CodePage850.NewDecoder()
    st, err := d.String(t)
    if err != nil {
        panic(err)
    }
    fmt.Println(st)
}

The result is a string readable in a Windows CMD.
See more in this Nov. 2018 reddit thread.

VonC

This is something Go still can't do out of the box: see http://code.google.com/p/go/issues/detail?id=3376#c6.

Alex

alex
  • I think http://code.google.com/p/go/issues/detail?id=3346 is closer to the topic of my question, isn't it? – VonC Aug 22 '12 at 06:29
  • Sure. But it is marked as a duplicate of 3376. Either way, it is something that still needs to be done. – alex Aug 22 '12 at 11:46
  • I mean, 3346 much more closely illustrates the problem at hand, while its parent (3376) is a much broader one. So, you are saying that I was right using an external library to achieve the right output? – VonC Aug 22 '12 at 11:49
    If your console uses the ibm850 code page, then your solution is good. But, as far as I know, you can only have so many different characters there (a single-byte code page has at most 256), so it will not include most Unicode characters. To be most inclusive, you should use UTF-16 and WriteConsoleW to output your text. – alex Aug 22 '12 at 12:03
  • Good point. In my specific case, the subset present in the conversion library is enough, but I agree with your comment, illustrated by jason_s's code above – VonC Aug 22 '12 at 12:05