Why do range and indexing of a string produce different types?

Question

package main

import (
    "fmt"
    "reflect"
)

func main() {
    s := "hello" // Same results with s := "世界"
    for _, x := range s {
        kx := reflect.ValueOf(x).Kind()
        fmt.Printf("Type of x is %v\n", kx)
        break
    }
    y := s[0]
    ky := reflect.ValueOf(y).Kind()
    fmt.Printf("Type of y is %v\n", ky)
}
// Type of x is int32
// Type of y is uint8

I was surprised to learn that indexing into a string gives me a different type than iterating over it with range.

Edit: I just realized that even if s contains non-ASCII characters, the type of y is always byte. This also means that indexing into a string does not reliably give you a character unless the string is ASCII-only.

`range` produces `rune`s, indexing into the string just gives you the raw byte value (which fits into 1 byte) — Mathias R. Jessen, Feb 07 '21 at 15:09
See here for something close to an answer (not a duplicate though I think): https://stackoverflow.com/questions/18130859/how-can-i-iterate-over-a-string-by-runes-in-go — BartoszKP, Feb 07 '21 at 15:09
See also https://stackoverflow.com/questions/49062100/is-there-any-difference-between-range-str-and-range-runestr-in-golang and https://stackoverflow.com/questions/41779147/what-determines-the-position-of-a-character-when-looping-through-utf-8-strings — JimB, Feb 07 '21 at 15:21

score 0 · Accepted Answer · answered Feb 07 '21 at 15:49

From the Go specification, section "For statements with range clause":

For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.

Now let's look at how Go's builtin package declares these types:

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32

This explains the output: rune and byte are only aliases, so reflection reports their underlying types, int32 for a rune and uint8 for a byte.
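The alias relationship can be checked directly. The sketch below assumes nothing beyond the standard fmt package; typeName is a hypothetical helper added only for illustration.

```go
package main

import "fmt"

// typeName is a hypothetical helper: it returns the dynamic type of v
// as formatted by the %T verb.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var r rune = '日'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	var b byte = "日"[0]
	var u uint8 = b // likewise for byte and uint8

	fmt.Println(typeName(r), typeName(i), typeName(b), typeName(u))
	// int32 int32 uint8 uint8
}
```

Because these are aliases rather than distinct named types, neither %T nor the reflect package ever reports "rune" or "byte".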

Here's an extended version of your code to make the point clear. I added a third case and changed the string to one with multi-byte characters. I hope the comments are self-explanatory. I'd also recommend reading https://blog.golang.org/strings.

package main

import (
    "fmt"
    "reflect"
)

func main() {
    // Changed the string for better understanding:
    // each character here takes more than one byte
    s := "日本語"

    // Range over the string, where x is a rune
    for _, x := range s {
        kx := reflect.ValueOf(x).Kind()
        fmt.Printf(
            "Type of x is %v (%c)\n",
            kx,
            x, // Expected (rune)
        )
        break
    }

    // Indexing (First byte of the string)
    y := s[0]
    ky := reflect.ValueOf(y).Kind()
    fmt.Printf(
        "Type of y is %v (%c)\n",
        ky,
        y,
        /*
            Uh-oh, not what we wanted. We get only the first byte
            of the string, not the full multi-byte character.
            We wanted '日' (a 3-byte character).
        */
    )

    // Indexing (First rune of the string)
    z := []rune(s)[0]
    kz := reflect.ValueOf(z).Kind()
    fmt.Printf(
        "Type of z is %v (%c)\n",
        kz,
        z, // Expected (rune)
    )
}

Sample output:

Type of x is int32 (日)
Type of y is uint8 (æ)
Type of z is int32 (日)

Note: If your terminal does not show the same output, its character encoding settings may be the cause. Switching it to UTF-8 might help.
