Decoding quoted-printable email in Golang

Question

When you type two spaces in a row in an HTML email in Gmail, it encodes them in the quoted-printable body as "=C2=A0 " (visible if you look at the source of the email).

According to this Stack Overflow answer, because of the UTF-8 encoding this should be decoded to U+00A0 (NBSP): https://stackoverflow.com/a/2774507
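Quoted-printable only escapes bytes; it knows nothing about characters. The following is a minimal sketch using the standard mime/quotedprintable package. Encoding the rune U+00A0 produces its two UTF-8 bytes escaped as =C2=A0, and decoding gives the same two bytes back unchanged. The helper names encodeQP and decodeQP are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
)

// encodeQP is a hypothetical helper that quoted-printable encodes s.
func encodeQP(s string) (string, error) {
	var buf bytes.Buffer
	w := quotedprintable.NewWriter(&buf)
	if _, err := w.Write([]byte(s)); err != nil {
		return "", err
	}
	// Close flushes the buffered line to buf.
	if err := w.Close(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodeQP is a hypothetical helper that decodes a quoted-printable string.
func decodeQP(s string) (string, error) {
	b, err := io.ReadAll(quotedprintable.NewReader(strings.NewReader(s)))
	return string(b), err
}

func main() {
	// "\u00a0" is U+00A0 NO-BREAK SPACE; Go stores it as the bytes C2 A0.
	enc, err := encodeQP("Text.\u00a0 Two spaces")
	if err != nil {
		panic(err)
	}
	fmt.Println(enc) // Text.=C2=A0 Two spaces

	dec, err := decodeQP(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dec == "Text.\u00a0 Two spaces") // true
}
```

The round trip is lossless because neither side interprets the bytes as text.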

However, in Golang, this isn't how it works:

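package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
)

func main() {
	s := `Text Text Text.=C2=A0 That's just two spaces`
	r := strings.NewReader(s)
	qpReader := quotedprintable.NewReader(r)
	all, err := io.ReadAll(qpReader) // ioutil.ReadAll before Go 1.16
	if err != nil {
		panic(err)
	}
	str := string(all)
	fmt.Println(strings.Index(str, "\xC2\xA0"))
}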

This outputs 15. Here's the Playground link: https://play.golang.org/p/8n6L7dlZPt

Instead of it using an NBSP there, it will keep the \xC2 and result in "Text Text TextÂ That's just two spaces".
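The "Â" is what the UTF-8 bytes C2 A0 look like when something downstream reads them as ISO-8859-1 (Latin-1). In that encoding every byte is its own character, so C2 becomes "Â" and A0 becomes a no-break space. The sketch below reproduces the effect, assuming that is what the viewer did. latin1ToUTF8 is a hypothetical helper, and its byte-to-code-point mapping is exact only for ISO-8859-1.

```go
package main

import (
	"fmt"
	"strings"
)

// latin1ToUTF8 is a hypothetical helper that decodes s as ISO-8859-1:
// every byte 0x00-0xFF maps to the code point with the same value.
func latin1ToUTF8(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		b.WriteRune(rune(s[i]))
	}
	return b.String()
}

func main() {
	decoded := "Text.\xC2\xA0 two spaces" // correct UTF-8, as the QP reader returns it
	misread := latin1ToUTF8(decoded)
	fmt.Printf("%q\n", misread) // "Text.Â\u00a0 two spaces"
}
```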

What's the best way to correctly render this as \x00A0?

All is fine. You should read more about Unicode and its representation in UTF-8. You want a non-breaking space, U+00A0, and you got one: U+00A0 encoded as UTF-8 is the byte sequence 0xC2 0xA0 (not "translated"). All is good; quotedprintable works fine and you got your nbsp. Add a `fmt.Println(str)` and inspect the output in the Playground: it will render an HTML entity because you got an nbsp. Read https://blog.golang.org/strings and google for "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets". — Volker, Dec 01 '16 at 08:35
BTW: "What's the best way to correctly render this as \x00A0?" is most probably the _wrong_ _question_. U+00A0 is the Unicode code point (in Go-speak a "rune", an abstract character), and you want this rune encoded as 0xC2 0xA0 in any UTF-8 encoded string. Package unicode/utf8 helps convert runes to and from their UTF-8 byte sequences if you actually need that (you don't). — Volker, Dec 01 '16 at 08:39
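To make the rune/byte distinction concrete, this sketch uses the standard unicode/utf8 package. It encodes the rune U+00A0 into its UTF-8 bytes and decodes those bytes back into the rune. As the comment says, ordinary string handling already does this for you, so the explicit calls are only for illustration. utf8Bytes is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Bytes is a hypothetical helper returning the UTF-8 encoding of r.
func utf8Bytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	const nbsp rune = '\u00A0' // the code point U+00A0

	fmt.Printf("% X\n", utf8Bytes(nbsp)) // C2 A0

	// Decode the first rune from the bytes C2 A0.
	r, size := utf8.DecodeRuneInString("\xC2\xA0")
	fmt.Printf("%U, %d bytes\n", r, size) // U+00A0, 2 bytes

	// A string literal holding the rune already contains those two bytes.
	fmt.Println("\u00A0" == "\xC2\xA0") // true
}
```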
Thanks everyone, very helpful. It turned out to be a lack of UTF-8 support in the service at the next step of my app; this was just a red herring. — Zach Hobbs, Dec 03 '16 at 18:26

score 0 · Accepted Answer · answered Dec 01 '16 at 10:47

As Volker explained in his comment, a Go string is simply a read-only sequence of bytes. In your case, it's already encoded as UTF-8, which is the encoding Go uses for source code and string literals. To access the actual Unicode code points (runes in Go lingo), use something like:

// Prints 15: IndexRune returns a byte offset.
fmt.Println(strings.IndexRune(str, '\u00A0'))

// Prints A0: converting to []rune indexes by code point.
fmt.Printf("%X\n", []rune(str)[15])
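The two 15s agree only because every character before the no-break space is ASCII. strings.IndexRune returns a byte offset, while []rune(str)[15] indexes by rune. The self-contained sketch below uses a made-up input, "héllo\u00A0world", where the two differ. nbspIndexes is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// nbspIndexes is a hypothetical helper returning the byte offset and the
// rune index of the first U+00A0 in s, or -1, -1 if there is none.
func nbspIndexes(s string) (byteIdx, runeIdx int) {
	byteIdx = strings.IndexRune(s, '\u00A0')
	if byteIdx < 0 {
		return -1, -1
	}
	// Count the runes that precede the byte offset.
	return byteIdx, utf8.RuneCountInString(s[:byteIdx])
}

func main() {
	s := "héllo\u00A0world" // é takes two bytes in UTF-8

	b, r := nbspIndexes(s)
	fmt.Println(b, r)                // 6 5
	fmt.Printf("%X\n", []rune(s)[r]) // A0

	// Ranging over a string yields byte offsets and decoded runes.
	for i, c := range s {
		if c == '\u00A0' {
			fmt.Printf("%U at byte %d\n", c, i) // U+00A0 at byte 6
		}
	}
}
```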

How to correctly render the string depends on where you want to render it. But in most cases, you can pass it as is since it's already in UTF-8.
