Converting between rune and byte (slice)

Question

Go allows conversion from rune to byte. But the underlying type for rune is int32 (because Go uses UTF-8) and for byte it is uint8; the conversion therefore results in a loss of information. However, it is not possible to convert from a rune to []byte.

var b byte = '©'
bs := []byte(string('©'))
fmt.Println(b)
fmt.Println(bs)

// Output
169
[194 169]


Why does Go allow conversion from rune to byte instead of rune to []byte?

The reason one conversion works and the other not is because those are the rules of the language spec. For more details on conversion you can read this: https://golang.org/ref/spec#Conversions. If you're asking *why* the authors chose those specific rules then you'll have to ask the authors themselves, and you can try the go-nuts mailing list to do that, the authors do interact with the users there. — mkopriva, Aug 01 '20 at 17:37
+1 for an interesting question. Does this partly answer it? See https://stackoverflow.com/a/62739051/12817546 — , Aug 02 '20 at 03:55

score 7 · Accepted Answer · 2020-08-01T20:23:55.333

Go supports conversion from rune to byte as it does for all pairs of numeric types. It would be a surprising special case if int32 to byte conversion was not allowed.

But the underlying type for rune is int32 (because Go uses UTF-8)

This misses an important detail: rune is an alias for int32. They are the same type.

It's true that the underlying type of rune is int32, but that's because rune and int32 are the same type and the underlying type of a builtin type is the type itself.

The representation of Unicode code points as int32 values is unrelated to UTF-8 encoding.
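A minimal sketch of the alias relationship (the helper names typeName and asInt32 are made up for this illustration): a rune can be returned where an int32 is expected without any conversion, and the stored value is the code point number, not a UTF-8 byte sequence.

```go
package main

import "fmt"

// typeName reports the type of v as fmt's %T verb prints it.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

// asInt32 returns a rune as an int32 without a conversion, which compiles
// only because the two names denote the same type.
func asInt32(r rune) int32 {
	return r
}

func main() {
	r := '©'
	fmt.Println(typeName(r)) // int32
	fmt.Println(asInt32(r))  // 169, the code point U+00A9
}
```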

the conversion therefore results in a loss of information

Yes, conversions between numeric types can result in loss of information. This is one reason why conversions in Go must be explicit.

Note that the statement var b byte = '©' does not do any conversions. The expression '©' is an untyped constant.

The compiler reports an error if the assignment of an untyped constant results in a loss of information. For example, the statement var b byte = '世' causes a compilation error.
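The difference between a constant assignment and an explicit conversion can be sketched as follows (truncate is a hypothetical helper name; the failing line is left commented out so the program still compiles):

```go
package main

import "fmt"

// truncate converts a rune variable to a byte. The conversion is explicit
// and keeps only the low 8 bits of the value.
func truncate(r rune) byte {
	return byte(r)
}

func main() {
	// An untyped constant that fits in a byte: no conversion takes place.
	var b byte = '©' // U+00A9 = 169
	fmt.Println(b)   // 169

	// var c byte = '世' // does not compile: the constant 19990 overflows byte

	// A typed rune value must be converted explicitly; the high bits are dropped.
	fmt.Println(truncate('世')) // 22, which is 19990 mod 256
}
```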

All UTF-8 encoding functionality in the language is related to the string type. The UTF-8 aware conversions are all to or from the string type. The []byte(numericType) conversion could be supported, but that would bring UTF-8 encoding outside of the string type.
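As a rough illustration of routing everything through string (encode and decode are names invented for this sketch), the UTF-8 work happens in the rune-to-string and string-to-[]rune conversions, while []byte(s) and string(b) only copy bytes:

```go
package main

import "fmt"

// encode returns the UTF-8 bytes of r. The encoding happens in the
// rune-to-string conversion; []byte(...) merely copies the bytes.
func encode(r rune) []byte {
	return []byte(string(r))
}

// decode returns the code points held in UTF-8 encoded bytes. The decoding
// happens in the string-to-[]rune conversion.
func decode(b []byte) []rune {
	return []rune(string(b))
}

func main() {
	b := encode('世')
	fmt.Println(b)         // [228 184 150]
	fmt.Println(decode(b)) // [19990]
}
```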

The Go authors regret including the string(numericType) conversion because it's not very useful in practice and the conversion is not what some people expect. A library function is a better place for the functionality.
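To show why the conversion surprises people, here is a small sketch (asCodePoint and asDecimal are hypothetical helper names): converting an integer to a string yields the character with that code point, not its decimal digits, which is what strconv.Itoa produces.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint interprets n as a Unicode code point, which is what the
// integer-to-string conversion does.
func asCodePoint(n int) string {
	return string(rune(n))
}

// asDecimal formats n as decimal digits, which is what people often expect.
func asDecimal(n int) string {
	return strconv.Itoa(n)
}

func main() {
	fmt.Println(asCodePoint(65)) // A
	fmt.Println(asDecimal(65))   // 65
}
```

The sketch goes through rune(n) first because go vet flags a direct string(n) on an int for exactly this reason.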

Use `b2 := '©'` to get "int32 169" and `var b byte = '©'` to get "uint8 169". Likewise use `r := '世'` to get "int32 19990" and `byte(r)` to get "uint8 22". See https://play.golang.org/p/IcufRCuzFk. — , Aug 02 '20 at 04:17

score 0 · Answer 2 · 2020-08-02T06:28:07.483

Yes, it is possible to convert from a rune to []byte (for example via a byte) and back again, provided the rune's value fits in a byte.

package main

import "fmt"

func main() {
    var b byte = '©'
    bs := []byte{b}
    fmt.Printf("%T %v\n", b, b)   // uint8 169
    fmt.Printf("%T %v\n", bs, bs) // []uint8 [169]

    s := string(bs[0]) // s := string(b) works too.
    r2 := []rune(s)[0] // r2 := rune(b) works too; rune(s[0]) would give 194, the first UTF-8 byte of s.
    fmt.Printf("%T %v\n", s, s)   // string ©
    fmt.Printf("%T %v\n", r2, r2) // int32 169
}

score -1 · Answer 3 · answered Aug 01 '20 at 18:54

The reason for this behaviour is the same reason why it's legal to do

var b int32
b = 1000000
fmt.Printf("%b\n", b)
fmt.Printf("%b", uint8(b))

// Output:
// 11110100001001000000
// 1000000

You should expect the conversion to lose data when you put a value of a type with a larger memory footprint into one with a smaller memory footprint.

Also, for encoding a rune you can use utf8.EncodeRune from the unicode/utf8 package, which writes the UTF-8 encoding into a []byte.
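A minimal sketch of that approach (runeToBytes is a name made up for this example; utf8.EncodeRune and utf8.UTFMax come from the standard unicode/utf8 package):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeToBytes encodes r as UTF-8 using utf8.EncodeRune.
func runeToBytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest possible encoding
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes written
	return buf[:n]
}

func main() {
	fmt.Println(runeToBytes('©')) // [194 169]
}
```

This produces the same bytes as []byte(string('©')) in the question, without going through a string.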
