
I have several files of packed int64s. I need them in memory as int64 slices. The problem is that the files are all together over half the size of the memory of the machine, so space is limited. The standard option in Go would be something like:

a := make([]int64, f.Size()/8)
binary.Read(f, binary.LittleEndian, a)

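Unfortunately, binary.Read first allocates a temporary []byte as large as the data being decoded (8 bytes per element, so roughly f.Size() bytes). Together with the destination slice, that doubles the memory needed, and the program runs out of memory.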

It does work if I read each byte one at a time and copy it into the slice, but this is prohibitively slow.

The ideal case would be something like casting the []byte directly to []int64, just telling the compiler "ok, these are ints now", but obviously that doesn't work. Is there some way to accomplish something similar? Possibly using the unsafe package or dropping into C if absolutely needed?

Jonathan Hall
  • It is not possible to cast a `[]byte` directly to a `[]int64`, because the two types have different memory layouts and sizes. However, you can convert a `[]byte` to a `[]int64` by first converting the `[]byte` to a `[]uint64`, and then converting each element of the `[]uint64` to an `int64`. – Vishwa Ratna Jun 20 '23 at 05:19
  • "It does work if I read each byte one at a time and copy it into the slice, but this is prohibitively slow." Why do you say that? Of course you would buffer the actual _reads_ from the file via package bufio, but you have to write each byte of your slice, and if you do that in order it won't get any faster. I think your claim that "this is prohibitively slow" is simply wrong. – Volker Jun 20 '23 at 06:00
  • Does this answer your question? [Convert between slices of different types](https://stackoverflow.com/questions/11924196/convert-between-slices-of-different-types) – Erwin Bolwidt Jun 20 '23 at 06:49
  • If you control the file format, you might look at something like Apache Arrow, which uses a format that is identical over the wire and in memory, to make such operations very efficient. – Jonathan Hall Jun 20 '23 at 09:00

1 Answer


I have several files of packed int64s. I need them in memory as int64 slices. The problem is that the files are all together over half the size of the memory of the machine, so space is limited.

The standard option in Go would be something like:

a := make([]int64, f.Size()/8)
binary.Read(f, binary.LittleEndian, a)

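Unfortunately, binary.Read first allocates a temporary []byte as large as the data being decoded (8 bytes per element, so roughly f.Size() bytes). Together with the destination slice, that doubles the memory needed, and the program runs out of memory.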


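All of the functions below use minimal memory: each reads the file into a single []byte and then reinterprets that same memory as []int64, so no second buffer is allocated.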


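// Same endian architecture and data.
// Most efficient (no data conversion): the bytes read from the file are
// reinterpreted in place as []int64.
// Needs imports "os" and "unsafe"; unsafe.SliceData requires Go 1.20 or later.
// Trailing bytes beyond a multiple of 8 are ignored.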
func readFileInt64SE(filename string) ([]int64, error) {
    b, err := os.ReadFile(filename)
    if err != nil {
        return nil, err
    }

    const i64Size = int(unsafe.Sizeof(int64(0)))
    i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
    i64Len := len(b) / i64Size
    i64 := unsafe.Slice(i64Ptr, i64Len)

    return i64, nil
}

For example, on an amd64 (little-endian) machine reading little-endian data, use readFileInt64SE for maximum efficiency: no data conversion is necessary.


The byte order fallacy - Rob Pike
https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html


// LittleEndian in-place data conversion for any architecture
func readFileInt64LE(filename string) ([]int64, error) {
    b, err := os.ReadFile(filename)
    if err != nil {
        return nil, err
    }

    const i64Size = int(unsafe.Sizeof(int64(0)))
    i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
    i64Len := len(b) / i64Size
    i64 := unsafe.Slice(i64Ptr, i64Len)

    for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {
        i64[j] = int64(binary.LittleEndian.Uint64(b[i-i64Size : i]))
    }

    return i64, nil
}

// BigEndian in-place data conversion for any architecture
func readFileInt64BE(filename string) ([]int64, error) {
    b, err := os.ReadFile(filename)
    if err != nil {
        return nil, err
    }

    const i64Size = int(unsafe.Sizeof(int64(0)))
    i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
    i64Len := len(b) / i64Size
    i64 := unsafe.Slice(i64Ptr, i64Len)

    for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {
        i64[j] = int64(binary.BigEndian.Uint64(b[i-i64Size : i]))
    }

    return i64, nil
}

rocka2q
  • This is perfect, but I should mention it only works because I've stored the ints with `binary.LittleEndian`, which is the same format as the architecture I'm reading them back on (Intel x86, AMD, ARM). – James Pettit Jun 20 '23 at 21:19
  • @JamesPettit: For any architecture and any `int64` data, see my revised answer. – rocka2q Jun 21 '23 at 17:32