I have 3 vectors, each with exactly 256^3 (about 16 million) elements, that I want to store in a file and read back as fast as possible. I only care about read performance, and the in-memory representation of the data can be anything.

I have looked at some serialization techniques, as well as writing/reading plain numbers to/from a file with ofstream, but I wonder if there is a more direct and faster approach.

(I am pretty new to C++ and its concepts.)

  • Do you care about compatibility with other OSes or machines? If not, you could dump the binary representation of your vectors' contents straight to a binary file. – Quentin Jan 17 '19 at 15:20
  • Depends on the filesystem, if you are multiprocessing... – Matthieu Brucher Jan 17 '19 at 15:22
  • It needs to run on Windows as well as Android, potentially the Microsoft Mixed Reality platform – Commodore Yournero Jan 17 '19 at 15:22
  • @CommodoreYournero but do you need to transfer such a file from one to the other? – Quentin Jan 17 '19 at 15:23
  • I need to create the file using Windows and still load it using Android – Commodore Yournero Jan 17 '19 at 15:24
  • What do these vectors represent? –  Jan 17 '19 at 15:27
  • I suspect that doubles have the same binary format on all these systems. Still won't be portable to some obscure system out there. It is simple to verify it. – Michael Veksler Jan 17 '19 at 15:27
  • @MichaelVeksler I doubt all those would be guaranteed to have the same double format: https://developer.android.com/ndk/guides/abis – Jeffrey Jan 17 '19 at 15:28
  • What range of doubles do you use? If it is for a 3D mesh between 0.0001 and 1000, the solution will be different than if it is for scientific data that may use larger exponents – Jeffrey Jan 17 '19 at 15:29
  • @Jeffrey usually double is in IEEE 754 format, which is well defined with the same binary on all supporting systems. Some systems have other formats, so it's hard to say without some digging – Michael Veksler Jan 17 '19 at 15:31
  • Range is between 0 and approx 100. The vectors represent a color conversion hashmap from rgb to lab color space. – Commodore Yournero Jan 17 '19 at 16:37

1 Answer

Assuming both systems, Windows and Android, are little-endian, which is the common case on ARM and x86/x64 CPUs, you can do the following.

First: Pick a type with a specific, fixed size: either double (64-bit), float (32-bit), or one of uint64_t/uint32_t/uint16_t or int64_t/int32_t/int16_t. Do NOT use types like int or long, whose sizes can differ between platforms.
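
One way to enforce the fixed-size requirement is to check it at compile time. This is only a sketch: the alias name `Element` is hypothetical, and it assumes you store doubles, as the comments under the question suggest.

```cpp
#include <climits>

// Hypothetical alias for the element type stored in the file.
using Element = double;

// Fail the build if the assumptions behind a raw binary dump do not hold
// on the platform being compiled for.
static_assert(CHAR_BIT == 8, "bytes must be 8 bits wide");
static_assert(sizeof(Element) == 8, "Element must be exactly 64 bits");
static_assert(sizeof(float) == 4, "float must be exactly 32 bits");
```

If one of these fails on a target, the build stops there instead of silently producing or reading a file with a different layout.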

Second: Use the following method to write binary data:

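std::vector<uint64_t> myVec;  // fill this with your data first
std::ofstream f("outputFile.bin", std::ios::binary);
f.write(reinterpret_cast<const char*>(myVec.data()), myVec.size() * sizeof(uint64_t));
f.close();  // flushes buffered data to the file; do this before reading the file back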

Here, you take the raw data and write its binary representation to a file.

Now, on the other machine, make sure the data type you use has the same size and the same endianness. If both match, you can do this:

std::vector<uint64_t> myVec(sizeOfTheData);  // sizeOfTheData: number of elements, known in advance
std::ifstream f("outputFile.bin", std::ios::binary);
f.read(reinterpret_cast<char*>(myVec.data()), myVec.size() * sizeof(uint64_t));
f.close();

Notice that you have to know the size of the data before reading it.
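
If you would rather not hard-code the size, one option is to store the element count at the start of the file. The sketch below is not part of the original recipe: `saveVector` and `loadVector` are hypothetical helper names, it assumes `double` elements, and it still assumes both machines share endianness and IEEE 754 doubles.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes a 64-bit element count, then the raw elements.
bool saveVector(const std::string& path, const std::vector<double>& v) {
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    const std::uint64_t count = v.size();
    f.write(reinterpret_cast<const char*>(&count), sizeof(count));
    f.write(reinterpret_cast<const char*>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(double)));
    f.close();  // flush everything to the file before anyone reads it back
    return !f.fail();
}

// Hypothetical helper: reads the count, sizes the vector, then reads the elements.
bool loadVector(const std::string& path, std::vector<double>& v) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    std::uint64_t count = 0;
    if (!f.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
    v.resize(static_cast<std::size_t>(count));
    const auto bytes = static_cast<std::streamsize>(v.size() * sizeof(double));
    f.read(reinterpret_cast<char*>(v.data()), bytes);
    return f.gcount() == bytes;  // true only if every element was read
}
```

Checking `gcount()` after the read makes a truncated or unflushed file show up as a failed load rather than as trailing zeros in the vector.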

Note: This code is off the top of my head. I haven't tested it, but it should work.

Now if the target system doesn't have the same endianness, you have to read the data in batches, flip the endianness, then put it in your vector. How to flip endianness was extensively discussed here.
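
As a rough illustration of the flipping step, here is a portable sketch for 64-bit values. The function names `byteSwap64` and `swapDoublesInPlace` are made up for this example, and it assumes 64-bit doubles; compilers also offer intrinsics for this, which are not used here.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit doubles");

// Reverses the byte order of a 64-bit value: swap 32-bit halves,
// then 16-bit pairs, then individual bytes.
std::uint64_t byteSwap64(std::uint64_t x) {
    x = ((x & 0x00000000FFFFFFFFull) << 32) | ((x & 0xFFFFFFFF00000000ull) >> 32);
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x & 0xFFFF0000FFFF0000ull) >> 16);
    x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x & 0xFF00FF00FF00FF00ull) >> 8);
    return x;
}

// Flips the byte order of every double in the vector. memcpy is used to
// move the bits in and out without any floating-point conversion.
void swapDoublesInPlace(std::vector<double>& v) {
    for (double& d : v) {
        std::uint64_t bits;
        std::memcpy(&bits, &d, sizeof(bits));
        bits = byteSwap64(bits);
        std::memcpy(&d, &bits, sizeof(d));
    }
}
```

You would call `swapDoublesInPlace` right after reading a batch, and only when the file's byte order differs from the machine's.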

How to determine the endianness of your system was discussed here.
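
For reference, a minimal runtime check might look like the sketch below. The function name `isLittleEndian` is just for illustration. If you can use C++20, `std::endian::native` from `<bit>` gives the same answer at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the least significant byte of a multi-byte integer
// is stored at the lowest address (little-endian).
bool isLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the first byte in memory
    return first == 1;
}
```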

The performance penalty will be proportional to how different the two systems are. If both have the same endianness and you choose the same data type and size, you're good and you get optimum performance. Otherwise, you pay a penalty that depends on how many conversions you have to do. This is the fastest you can get.

Note from comments: If you're transferring doubles or floats, make sure both systems use the IEEE 754 standard. It is very widely used, even more consistently than any one endianness, but check just to be sure.
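
A simple way to make sure is a compile-time check with `std::numeric_limits`, compiled on both platforms. This is a sketch rather than a full guarantee: `is_iec559` reports what the implementation claims about its floating-point types.

```cpp
#include <limits>

// Stop the build if the platform's floating-point types are not IEEE 754 (IEC 559).
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
```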

Now if these solutions don't fit you, then you have to use a proper serialization library to standardize the format for you. There are libraries that can do that, such as protobuf.

The Quantum Physicist
  • Definitely not what the OP wants. Android doesn't guarantee this. – Jeffrey Jan 17 '19 at 15:32
  • @Jeffrey Doesn't guarantee what exactly? – The Quantum Physicist Jan 17 '19 at 15:32
  • Endianness and datatype size. Also, can you static_cast<> double* to char* ? My compiler gives me "static_cast from 'double *' to 'char *' is not allowed" – Jeffrey Jan 17 '19 at 15:35
  • @Jeffrey Endianness can be flipped and can be detected at compile time. If `static_cast` doesn't work use `reinterpret_cast` – The Quantum Physicist Jan 17 '19 at 15:36
  • For floating point yet important: both need to use IEEE 754 for representation - but that's just as common as LE... – Aconcagua Jan 17 '19 at 15:41
  • @Aconcagua Absolutely. I can't imagine android would use some other FP type. I'll add that note. – The Quantum Physicist Jan 17 '19 at 15:41
  • I don't need to support every potential Android System, so assuming/ checking little endian as well as IEEE 754 would be fine. I'll test this, looks like a good solution – Commodore Yournero Jan 17 '19 at 16:44
  • @TheQuantumPhysicist this works except for the last 512 double values which are all 0. Tested on the same system. I can't explain why – Commodore Yournero Jan 17 '19 at 17:17
  • @CommodoreYournero Test with small data, then expand. Make sure you're resizing the vector before you read to it. – The Quantum Physicist Jan 17 '19 at 17:36
  • @TheQuantumPhysicist Tested with only 256^2 doubles, still the last 512 entries of the loaded vector are 0. With 256 entries, all loaded doubles equal 0. Seems like a fixed offset somewhere because the last 512 values are not assigned when loading the vector. The vectors are of exact same size. -i don't want to add code in the comments and its really just what you posted but i can put the code in the question if you want – Commodore Yournero Jan 17 '19 at 17:56
  • @CommodoreYournero With all due respect, don't become a [help vampire](https://meta.stackoverflow.com/questions/258206/what-is-a-help-vampire). Stack overflow is not a coding service. I gave you the recipe. You're not supposed to copy/paste my code and then complain it's not working. Learn how to encapsulate your designs and test them independently. Spend days trying to understand what I wrote with documentation and then try to figure out why it doesn't work. I can only guess why you have that problem, but it's only you who can learn how to debug this and become a better developer. – The Quantum Physicist Jan 17 '19 at 18:32
  • @TheQuantumPhysicist i can't be sure your recipe works, maybe there is more to it. I understand your suggestion and don't see the mistake so i figured i'd ask, because after hours of clueless debugging i will just go for another way to do it – Commodore Yournero Jan 18 '19 at 08:52
  • @CommodoreYournero That's fine. Just know that I used this solution during my academia years for years. Again, you should learn how to consult documentation and debug to understand what's wrong with the code. If it *really* doesn't work you should be able to exactly and very accurately pinpoint why it doesn't. Anyway, good luck with the other way. – The Quantum Physicist Jan 18 '19 at 09:04
  • @TheQuantumPhysicist so apparently not closing the ofstream after writing can cause some very wild behaviour. I'll mark as accepted if you can add that to the answer. Thanks for your effort – Commodore Yournero Jan 18 '19 at 11:41
  • @CommodoreYournero It depends on where you put it. If you're transferring between machines, this won't matter because `fstream` will close in its destructor when it gets out of scope. Your tests are a whole different story. I think I understand the problem you were facing. Because you were not closing, `fstream` didn't flush its contents to the file. You could get the same effect as closing if you `f.flush()`. Kudos for having solved this yourself! – The Quantum Physicist Jan 18 '19 at 14:29