I have a large data set, possibly 30 GB. It seems that I need to partition it into many smaller pieces so that each piece can be stored in its own FlatBuffer.
I have already read this post: FlatBuffers: How to write giant files
However, I'm still not sure how to do it. I have two questions below.
I have a schema like this.
table A {
number: int;
}
table B {
a: [A];
}
root_type B;
Suppose I have objects a0, a1, a2, and a3, and I partition them into two FlatBuffers stored on disk: the first contains a0 and a1, the second contains a2 and a3. If I need a2, how do I know which FlatBuffer contains it? Does the FlatBuffers API support this?
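To make the first question concrete, here is a minimal sketch of the kind of lookup I would otherwise have to maintain by hand. It does not use any FlatBuffers API; `ChunkIndex`, `add_chunk`, and `find` are hypothetical names I made up, and the sketch assumes objects are written to chunk files in creation order.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical side index kept outside FlatBuffers (not a library type).
// chunk_start[i] is the global index of the first A object in chunk file i.
struct ChunkIndex {
    std::vector<std::size_t> chunk_start;  // ascending order
    std::size_t total = 0;                 // number of objects recorded so far

    struct Location {
        std::size_t chunk;  // which FlatBuffer file holds the object
        std::size_t local;  // position inside that buffer's `a` vector
    };

    // Record that a new chunk holding `count` objects was written.
    void add_chunk(std::size_t count) {
        chunk_start.push_back(total);
        total += count;
    }

    // Map a global object index to (chunk, local index), if it exists.
    std::optional<Location> find(std::size_t global) const {
        if (global >= total) return std::nullopt;
        // Find the first chunk starting after `global`, then step back one.
        auto it = std::upper_bound(chunk_start.begin(), chunk_start.end(), global);
        std::size_t chunk = static_cast<std::size_t>(it - chunk_start.begin()) - 1;
        return Location{chunk, global - chunk_start[chunk]};
    }
};
```

With two chunks of two objects each, `find(2)` would report chunk 1, local index 0, which corresponds to a2 in my example. I am asking whether FlatBuffers offers something like this, or whether I have to store such an index myself.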
I create a0, a1, a2, a3, ... sequentially, and I want to start a new partition once the FlatBuffer grows larger than 10 MB. I know I can get the size of the FlatBuffer via int size = builder.GetSize(). However, since I create these objects sequentially, how do I know the size of the FlatBuffer without calling builder.Finish(orc)?
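The partitioning loop I have in mind could be sketched roughly as follows. This is plain C++ with no FlatBuffers calls: `partition_by_size` is a hypothetical helper, and `item_sizes` stands in for however many bytes each object adds to the builder (in real code I assume the running total would come from builder.GetSize() after each CreateA call).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: groups item indices into consecutive chunks, closing a
// chunk as soon as its running size becomes larger than `limit` bytes.
std::vector<std::vector<std::size_t>> partition_by_size(
    const std::vector<std::size_t>& item_sizes, std::size_t limit) {
    std::vector<std::vector<std::size_t>> chunks;
    std::vector<std::size_t> current;  // indices of items in the open chunk
    std::size_t running = 0;           // bytes accumulated in the open chunk

    for (std::size_t i = 0; i < item_sizes.size(); ++i) {
        current.push_back(i);
        running += item_sizes[i];
        if (running > limit) {
            // In real code: build the vector and root table, Finish, write to disk.
            chunks.push_back(current);
            current.clear();
            running = 0;
        }
    }
    // Keep whatever is left over as a final, smaller chunk.
    if (!current.empty()) chunks.push_back(current);
    return chunks;
}
```

The limit would only be approximate, because the vector and the root table add more bytes when the buffer is finished (in my test below, 48 bytes before versus 80 after).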
Thanks for your help.
Update: I wrote some code like this:
flatbuffers::FlatBufferBuilder builder;
int num0 = 3;
int num1 = 1;
int num2 = 5;
int num3 = 7;
auto a0 = CreateA(builder, num0);
cout << "size of a0 = " << builder.GetSize() << endl;
auto a1 = CreateA(builder, num1);
cout << "size of a0 and a1 = " << builder.GetSize() << endl;
auto a2 = CreateA(builder, num2);
cout << "size a0, a1, and a2 = " << builder.GetSize() << endl;
auto a3 = CreateA(builder, num3);
cout << "size a0, a1, a2, and a3 = " << builder.GetSize() << endl;
std::vector<flatbuffers::Offset<A>> A_vector;
A_vector.push_back(a0);
A_vector.push_back(a1);
A_vector.push_back(a2);
A_vector.push_back(a3);
auto B = builder.CreateVector(A_vector);
auto orc = CreateB(builder, B);
builder.Finish(orc);
cout << "size all = " << builder.GetSize() << endl;
// size of a0 = 14
// size of a0 and a1 = 30
// size a0, a1, and a2 = 40
// size a0, a1, a2, and a3 = 48
// size all = 80
Could you kindly explain how these sizes are calculated? Why is the size of a0 and a1 together not twice the size of a0, that is, 14 * 2 = 28 instead of 30? The same question applies to a2 and a3. Finally, why is the total size 80?
Thanks again.