How capacity of []rune is determined when converting from a string

Question

Can someone explain why I get a different capacity when converting the same string to []rune?

Take a look at this code

package main

import (
    "fmt"
)

func main() {
    input := "你好"
            
    runes := []rune(input)

    fmt.Printf("len %d\n", len(input))
    fmt.Printf("len %d\n", len(runes))
    fmt.Printf("cap %d\n", cap(runes))

    fmt.Println(runes[:3])

}

Which returns

len 6
len 2
cap 2
panic: runtime error: slice bounds out of range [:3] with capacity 2

But when I comment out the fmt.Println(runes[:3]) line, it returns:

len 6
len 2
cap 32

See how the []rune capacity in main changed from 2 to 32. How? Why?

If you want to test => Go playground

The value of cap is not specified by the Spec, any value big enough is okay. — Volker, Jul 08 '20 at 16:42

icza · Answer 1 · 2020-07-08T18:04:35.357

The capacity can be anything, as long as the slice resulting from the conversion contains the runes of the input string. This is the only thing the spec requires and guarantees. The compiler may decide to use a lower capacity if you pass the slice to fmt.Println(), since this signals that the slice may escape. Again, the decision made by the compiler is out of your hands.
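If code needs a particular capacity, one option is to allocate it explicitly instead of relying on what the conversion happens to produce. The following is only a minimal sketch: runesWithCap is a hypothetical helper introduced for illustration, not something from the standard library.

```go
package main

import "fmt"

// runesWithCap is a hypothetical helper (not part of the standard library).
// It converts s to runes and copies them into a slice whose capacity is
// requested explicitly, so the result does not depend on compiler decisions.
func runesWithCap(s string, minCap int) []rune {
	src := []rune(s)
	if minCap < len(src) {
		minCap = len(src) // capacity can never be smaller than the length
	}
	dst := make([]rune, len(src), minCap)
	copy(dst, src)
	return dst
}

func main() {
	r := runesWithCap("你好", 8)
	fmt.Println(len(r), cap(r)) // prints "2 8"

	// Reslicing up to the requested capacity is now safe; the extra
	// element is the zero value left by make.
	fmt.Println(r[:3])
}
```

The cost is one extra allocation and copy, which seems a reasonable price when reslicing beyond len is actually required.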

Escape means the value may escape from the function, and as such, it must be allocated on the heap (and not on the stack), because the stack may get destroyed / overwritten once the function returns, and if the value "escapes" from the function, its memory area must be retained as long as there is a reference to the value. The Go compiler performs escape analysis, and if it can't prove a value does not escape the function it's declared in, the value will be allocated on the heap.
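A rough way to observe this is to compare two functions that differ only in whether the slice is passed to fmt.Println. The function names below are made up for the illustration, and the capacities printed are deliberately not stated, since they depend on the compiler version and are not guaranteed by the spec. Building with go build -gcflags=-m prints the compiler's escape analysis decisions.

```go
package main

import "fmt"

// localCap converts s to runes and only inspects len and cap.
// Nothing here forces the slice onto the heap, so the compiler is free
// to back it with a temporary buffer (an assumption about the gc
// toolchain, not a guarantee).
func localCap(s string) (int, int) {
	r := []rune(s)
	return len(r), cap(r)
}

// escapingCap additionally passes the slice to fmt.Println. The argument
// is converted to an interface value, which the compiler typically
// treats as escaping, so the slice is likely heap-allocated.
func escapingCap(s string) (int, int) {
	r := []rune(s)
	fmt.Println(r)
	return len(r), cap(r)
}

func main() {
	l1, c1 := localCap("你好")
	l2, c2 := escapingCap("你好")

	// The lengths are always 2; the capacities may or may not differ.
	fmt.Println("local:   ", l1, c1)
	fmt.Println("escaping:", l2, c2)
}
```

The only portable guarantee is that cap is at least len in both cases.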

See related question: Calculating sha256 gives different results after appending slices depending on if I print out the slice before or not

Mmm, I had not thought about escaping. It makes a bit more sense that the allocation differs between the heap and the stack. Thanks — GeorgesTimoun, Jul 10 '20 at 07:02

score -1 · Answer 2 · answered Jul 08 '20 at 13:28

The reason the string and []rune return different results from len is that it's counting different things: len(string) returns the length in bytes (which may be more than the number of characters, for multi-byte characters), while len([]rune) returns the length of the rune slice, which in turn is the number of runes, i.e. Unicode code points (generally the number of characters).
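As a small illustration of the byte versus rune distinction, using the string from the question (counts is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the length of s in bytes and in runes.
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("你好")
	fmt.Println(b) // 6: each of the two characters takes 3 bytes in UTF-8
	fmt.Println(r) // 2: two Unicode code points

	// Converting to []rune yields the same count as RuneCountInString.
	fmt.Println(len([]rune("你好"))) // 2
}
```

utf8.RuneCountInString gives the rune count without allocating a slice.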

This blog post goes into detail how exactly Go treats text in various forms: https://blog.golang.org/strings

I know that. My question was about capacity. — GeorgesTimoun, Jul 10 '20 at 06:57
