I have this function to convert string to slice of bytes without copying
func StringToByteUnsafe(s string) []byte {
strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
var sh reflect.SliceHeader
sh.Data = strh.Data
sh.Len = strh.Len
sh.Cap = strh.Len
return *(*[]byte)(unsafe.Pointer(&sh))
}
That works fine, but with very specific setup gives very strange behavior:
The setup is here: https://github.com/leviska/go-unsafe-gc/blob/main/pkg/pkg_test.go
What happens:
- Create a byte slice
- Convert it into temporary (rvalue) string and with unsafe convert it into byte slice again
- Then, copy this slice (by reference)
- Then, do something with the second slice inside goroutine
- Print the pointers before and after
And I have this output on my linux mint laptop with go 1.16:
go test ./pkg -v -count=1
=== RUN TestSomething
0xc000046720 123 0xc000046720 123
0xc000076f20 123 0xc000046721 z
--- PASS: TestSomething (0.84s)
PASS
ok github.com/leviska/go-unsafe-gc/pkg 0.847s
So, the first slice magically changes its address, while the second isn't
If we remove the goroutine with runtime.GC()
(and may be play with the code a little bit), we can get the both pointers to change the value (to the same one).
If we change the unsafe cast to just []byte()
everything works without changing the addresses. Also, if we change it to the unsafe cast from here https://stackoverflow.com/a/66218124/5516391 everything works the same.
func StringToByteUnsafe(str string) []byte { // this works fine
var buf = *(*[]byte)(unsafe.Pointer(&str))
(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
return buf
}
I run it with GOGC=off
and got the same result. I run it with -race
and got no errors.
If you run this as main package with main function, it seems to work correctly. Also if you remove the Convert
function. My guess is that compiler optimizes stuff in this cases.
So, I have several questions about this:
- What the hell is happening? Looks like a weird UB
- Why and how go runtime magically changes the address of the variable?
- Why in concurentless case it can change both addresses, while in concurrent can't?
- What's the difference between this unsafe cast and the cast from stackoverflow answer? Why it does work?
Or is this just a compiler bug?
A copy of the full code from github, you need to put it in some package and run as test:
import (
"fmt"
"reflect"
"sync"
"testing"
"unsafe"
)
func StringToByteUnsafe(s string) []byte {
strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
var sh reflect.SliceHeader
sh.Data = strh.Data
sh.Len = strh.Len
sh.Cap = strh.Len
return *(*[]byte)(unsafe.Pointer(&sh))
}
func Convert(s []byte) []byte {
return StringToByteUnsafe(string(s))
}
type T struct {
S []byte
}
func Copy(s []byte) T {
return T{S: s}
}
func Mid(a []byte, b []byte) []byte {
fmt.Printf("%p %s %p %s\n", a, a, b, b)
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
b = b[1:2]
wg.Done()
}()
wg.Wait()
fmt.Printf("%p %s %p %s\n", a, a, b, b)
return b
}
func TestSomething(t *testing.T) {
str := "123"
a := Convert([]byte(str))
b := Copy(a)
Mid(a, b.S)
}