8

I'm experimenting with FlatBuffers to store OpenStreetMap data, which would come to about 50GB.

Since usually everything is kept in memory, how is it possible (if at all) to sequentially write data into a file?

I have a feeling this is not quite what FlatBuffers is good for.

benjist
  • 1
    Have a look at the memory-mapped file APIs for your platform. There are ways to map regions of a file into memory, so you don't need the whole 50GB in memory at once, only sections of it. – hookenz Feb 10 '16 at 19:45

2 Answers

5

There currently is no way to create a single FlatBuffer without having it all in memory at once. The only way to do it is to instead write out a chain of (length-prefixed) smaller independent FlatBuffers.
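A minimal sketch of such a chain, in Python. It assumes a 32-bit little-endian length prefix in front of each buffer, and it treats every payload as opaque bytes standing in for a finished FlatBuffer (for example the output of a builder). The names `write_chunk`, `read_chunks`, and `roundtrip` are made up for illustration and are not part of the FlatBuffers library.

```python
import io
import struct


def write_chunk(stream, payload: bytes) -> None:
    """Append one length-prefixed buffer to a binary stream."""
    # Assumed framing: unsigned 32-bit little-endian length, then the bytes.
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_chunks(stream):
    """Yield the buffers back one at a time; only one is in memory at once."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated chunk")
        yield payload


def roundtrip(payloads):
    """Write all payloads to an in-memory stream and read them back."""
    buf = io.BytesIO()
    for p in payloads:
        write_chunk(buf, p)
    buf.seek(0)
    return list(read_chunks(buf))
```

With a real file opened in `"wb"` or `"rb"` mode in place of `io.BytesIO`, the writer only ever holds the buffer it is currently building, so the total file size is not limited by available memory.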

Aardappel
  • Is there a common way to do this? – benjist Feb 10 '16 at 19:37
  • 2
    No, there isn't, but your question makes me think that we should add one to the FlatBuffers library. – Aardappel Feb 10 '16 at 19:41
  • It is not hard, though. What I'd suggest is a file that contains: 32-bit length of FlatBuffer 1 - buffer contents of FlatBuffer 1 - 32-bit length of FlatBuffer 2 - buffer contents of FlatBuffer 2 - etc. – Aardappel Feb 10 '16 at 19:43
  • It would be nice if there were a convenience API for this. I'm researching this to suggest an alternative to the official OSM protobuf format. One thing that makes PBF awkward for OSM data is the need for various client-side implementation details. So it would be great if this were possible in a way that is almost independent of language and implementation. – benjist Feb 10 '16 at 20:41
  • One more thing: Is it theoretically possible for FlatBuffers to indirect pointers into fixed-size (say 2MB) segments of compressed data that are uncompressed when accessed? Compression makes a tremendous difference for the kind of data I have, since delta coding can't be applied. – benjist Feb 10 '16 at 21:16
  • Agreed, it would be good to solve this use case generally. The way FlatBuffers accesses memory directly would make it hard to do compression or any kind of indirection transparently, since that would entail a constant overhead that currently isn't there. – Aardappel Feb 11 '16 at 00:29
  • 2
    btw, added some functionality to make size pre-fixing part of FlatBuffers: https://github.com/google/flatbuffers/commit/486c048a0d5963f83f3a0d6957e4dde41602e2e7 – Aardappel Oct 12 '16 at 21:48
  • Has any progress been made since this post with regards to supporting 64 bit offsets? Just checking because the multi-flatbuffer workaround isn't quite workable for my use-case. – alteredinstance Feb 14 '20 at 22:39
  • 1
    @alteredinstance no, there is no 64-bit version yet. Notes on 64-bit: https://github.com/google/flatbuffers/projects/10#card-14545298 Contributions on github welcome. – Aardappel Feb 15 '20 at 03:20
-3

I think you could use a memory-mapped file, so that the OS manages the reads and writes (you will probably have to implement some allocators).
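A rough sketch of the read side of that idea, in Python. It assumes the file is a sequence of buffers that each start with a 32-bit little-endian length (that layout is an assumption of this sketch), and it uses plain bytes as stand-ins for real FlatBuffers. `write_chunks` and `iter_chunks_mmap` are hypothetical helper names. It does not show the allocator side, and it does not make it possible to build one single FlatBuffer larger than memory.

```python
import mmap
import os
import struct


def write_chunks(path, payloads):
    """Write each payload with an assumed 32-bit little-endian length prefix."""
    with open(path, "wb") as f:
        for p in payloads:
            f.write(struct.pack("<I", len(p)))
            f.write(p)


def iter_chunks_mmap(path):
    """Walk the length-prefixed buffers of a file through a read-only mapping."""
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be mapped
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in as it is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = 0
            end = len(mm)
            while offset < end:
                (size,) = struct.unpack_from("<I", mm, offset)
                offset += 4
                if offset + size > end:
                    raise ValueError("truncated chunk")
                # Slicing an mmap copies just this one buffer out as bytes.
                yield mm[offset:offset + size]
                offset += size
```

Only the pages that are actually read get loaded, so iterating over a very large file does not need the whole file in RAM. On a 32-bit process the mapping itself is still limited by the address space.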