I am not a Go `unsafe` package expert, nor am I a seasoned C programmer. I am trying to read a huge file (> 1 GB) using the mmap syscall in Go. There are a number of reasons I use mmap and munmap as opposed to read/write I/O, but that is beside the point. I can write to the file in a test, and when I read from the file I can confirm that the byte length matches, but I cannot read the string contents of the file :( Can someone suggest some reading I need to do to go a little further? Here's some code I cooked up as a sample test:

filename := "/tmp/dd_file.db"
f, err := os.OpenFile(filename, os.O_RDWR, 0666)
if err != nil {
    fmt.Printf("error opening file: %v\n", err)
    os.Exit(1)
}
// defer the close only after the open is known to have succeeded
defer f.Close()
stat, _ := f.Stat()
size := stat.Size()
fmt.Printf("[READ-ONLY] : size was : %+v\n", size)
got := make([]byte, size)
if _, err := f.ReadAt(got, 0); err != nil && err != io.EOF {
    panic(err)
}
want, err := ioutil.ReadFile(filename)
if err != nil {
    fmt.Printf("[READ-ONLY] : ioutil.ReadFile: %v", err)
}
// going to change the file size now, punch in a few things
// note: t is the size of a string header (16 bytes on 64-bit platforms), not of any string's contents
t := unsafe.Sizeof("")
_, err = f.Seek(int64(t-1), 0)
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}
_, err = f.Write([]byte(" "))
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}
mmap, err := syscall.Mmap(int(f.Fd()), 0, int(t), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}
// not too sure about reading the data as strings - doesn't work as expected.
map_array := (*[10000]string)(unsafe.Pointer(&mmap[0]))
map_array[0] = "yellow!"
err = syscall.Munmap(mmap)
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}
newStat, _ := f.Stat()
newSize := newStat.Size()
fmt.Printf("[mmap( ) RW] : size was : %+v\n", newSize)
got = make([]byte, newSize)
if _, err := f.ReadAt(got, 0); err != nil && err != io.EOF {
    panic(err)
}
if len(got) == len(want) {
    fmt.Println("well, the lengths are equal at least??!")
}
if !bytes.Equal(got, want) {
    fmt.Printf("\n [mmap( ) RW] : works! got  %d \n want %d", len(got), len(want))
}

This obviously works as expected, but what if I wanted to read via mmap() from an mmapped file? How do I read a string out of these bytes? (I have a sense there is an encoding package somewhere that I might have to use, but StringHeader in the unsafe documentation confused me.)

Any suggestions?

anirudh.vyas
  • Probably I don't get what you mean. If you're using `mmap`, isn't the content of the file available as a `[]byte`? To get the string, you can do `string(mmap[:100])`, which converts the first 100 bytes to a string. Or you can use [`bytes.Buffer`](https://golang.org/pkg/bytes/#Buffer) (but that's probably not what you want, since you're avoiding the io.Reader/io.Writer pattern). – putu Jun 13 '17 at 08:05
  • I want to avoid a reader, but my problem is that I suspect that when I write a string to the bytes via mmap, it's not flushing legitimate characters, just some control characters. I suspect my encoding might be wrong. Any suggestions? – anirudh.vyas Jun 14 '17 at 06:02
  • Take a look at [`https://github.com/riobard/go-mmap`](https://github.com/riobard/go-mmap). That package can be used as a reference on `mmap` usage. For writing string to `mmap`, you can do `copy(mmap, []byte("Your string"))`. – putu Jun 14 '17 at 07:26
  • *There are a number of reasons I do mmap and munmap as opposed to read, write I/O.* And if those reasons are performance related, did you bother to actually benchmark `mmap()` vs. `read()`/`write()`? Because if you're using `mmap()` like this for performance reasons, you're almost certainly wrong. [Read this answer](https://stackoverflow.com/a/9818473/4756299), and especially the two links to one Linus Torvalds explaining how `mmap()` can be ***SLOW***. And note that as good as that answer is, the author blows it at the end by comparing `mmap()` to `fread()` - buffered, `stdio`-based and slow. – Andrew Henle Jun 17 '17 at 13:52

1 Answer

As @putu noted in a comment, a byte slice can be converted to a string by a simple type conversion:

asStr := string(byteSlice)         // entire slice as a string
partStr := string(byteSlice[:100]) // first 100 bytes as a string
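
To make this concrete for the mmap case, here is a minimal sketch of a full round trip: it copies the characters into a shared mapping with `copy` (as putu suggested in the comments on the question) and reads them back with a plain `string(...)` conversion. The helper name `roundTrip` and the temp-file pattern are made up for illustration, the file is sized with `Truncate` before mapping, and `syscall.Mmap` is only available on Unix-like systems.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// roundTrip (hypothetical helper) writes msg into a temp file through a
// shared mapping, unmaps it, then maps the file again and reads it back.
func roundTrip(msg string) (string, error) {
	f, err := os.CreateTemp("", "mmap-demo-*.db")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// The mapping must not extend past the end of the file, so size it first.
	if err := f.Truncate(int64(len(msg))); err != nil {
		return "", err
	}

	w, err := syscall.Mmap(int(f.Fd()), 0, len(msg),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	copy(w, msg) // copies the characters themselves into the mapping
	if err := syscall.Munmap(w); err != nil {
		return "", err
	}

	r, err := syscall.Mmap(int(f.Fd()), 0, len(msg),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(r)

	// string(r) copies the bytes out, so the result stays valid after Munmap.
	return string(r), nil
}

func main() {
	got, err := roundTrip("yellow!")
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("read back: %q\n", got)
}
```

No encoding package is involved here: Go strings are already sequences of bytes, so `copy` in and `string(...)` out is enough for this simple case.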
Adrian
  • That doesn't work. I am doing an unsafe pointer conversion on the byte array obtained from mmap to get a string array pointer, but when I read it back I see some control characters. Maybe I'm not encoding the string to bytes the right way when I flush it to the file via mmap? – anirudh.vyas Jun 14 '17 at 06:01
  • It's possible the control chars are in the data, it's possible the encoding coming out doesn't match the encoding going in, and it's possible you're doing a string conversion on a byte slice that starts in the middle of a multi-byte UTF-8 code point. – Adrian Jun 14 '17 at 13:21