
I am looking for a way to save a vector of structs so that it takes up the least amount of space on file. I've read that this can be accomplished with #pragma pack (from here), and that writing can then be done as follows:

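#include <cstdio>
#include <vector>
using std::vector;

#pragma pack(push,1)
struct GroupedData {
    double m_average;
    int m_count;
    char m_identifier;
};
#pragma pack(pop)

int main() {
    vector<GroupedData> alldata;
    //here alldata is filled with stuff
    FILE *file = fopen("datastorage.bin", "wb");
    if (!file) return 1;
    if (!alldata.empty())  // &alldata[0] would be undefined behaviour on an empty vector
        fwrite(alldata.data(), sizeof(GroupedData), alldata.size(), file);
    fclose(file);
}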
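For completeness, here is a sketch of reading such a packed file back, assuming the same `#pragma pack(push,1)` layout. The helper names `writeAll`/`readAll` and the comparison struct `UnpackedData` are illustrative, not from the thread. On common platforms with an 8-byte `double` and a 4-byte `int`, the packed record is 13 bytes, while the default layout is typically padded to 16:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Packed to 1-byte alignment, as in the question: no padding bytes.
#pragma pack(push, 1)
struct PackedData {
    double m_average;
    int m_count;
    char m_identifier;
};
#pragma pack(pop)

// Same fields with default alignment (typically 16 bytes on x86-64).
struct UnpackedData {
    double m_average;
    int m_count;
    char m_identifier;
};

// With packing, a record is exactly the sum of its fields.
static_assert(sizeof(PackedData) == sizeof(double) + sizeof(int) + sizeof(char),
              "packed struct should have no padding");

// Writes all records with a single fwrite; true if every record was written.
bool writeAll(std::FILE* file, const std::vector<PackedData>& data) {
    if (data.empty()) return true;
    return std::fwrite(data.data(), sizeof(PackedData), data.size(), file) == data.size();
}

// Reads up to `count` records with a single fread and keeps only those actually read.
std::vector<PackedData> readAll(std::FILE* file, std::size_t count) {
    std::vector<PackedData> data(count);
    std::size_t got = count ? std::fread(data.data(), sizeof(PackedData), count, file) : 0;
    data.resize(got);
    return data;
}
```

Since the file holds raw native bytes, it can only be read back on a platform with the same type sizes and byte order.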

However, one of the answers to that question said that, because the packed members are misaligned, memory access to the data would be much slower. To keep fast (aligned) memory access and still get the smallest file, I expect the following function to achieve both:

#include <cstdio>
#include <vector>
using std::vector;

struct GroupedData {
    double m_average;
    int m_count;
    char m_identifier;
    // writes the three fields back to back, skipping the padding bytes
    void WriteStruct(FILE* file) const {
        fwrite(&m_average,sizeof(double),1,file);
        fwrite(&m_count,sizeof(int),1,file);
        fwrite(&m_identifier,sizeof(char),1,file);
    }
};

int main() {
    vector<GroupedData> alldata;
    //here alldata is filled with stuff
    FILE *file = fopen("datastorage.bin", "wb");
    if (!file) return 1;
    for (std::size_t i=0; i<alldata.size(); ++i)
        alldata[i].WriteStruct(file);
    fclose(file);
}

However, wouldn't this write function take much longer to execute, because each variable is written with a separate fwrite call? So how can I balance fast memory access against the smallest file size and fast file writes?
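One middle ground, not proposed in the thread: keep the struct naturally aligned in memory, copy its fields into a contiguous byte buffer with `std::memcpy`, and write that buffer with a single `fwrite`. This is a minimal sketch, assuming the file format is simply the fields back to back with no padding (the same bytes the packed version produces); `writeAllSerialized`, `readAllSerialized` and `kRecordSize` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Naturally aligned in memory, as in the question (no #pragma pack).
struct GroupedData {
    double m_average;
    int m_count;
    char m_identifier;
};

// On-disk size of one record: the fields only, without padding.
constexpr std::size_t kRecordSize = sizeof(double) + sizeof(int) + sizeof(char);

// Copies every record's fields into one buffer, then issues a single fwrite.
// memcpy is used because field positions inside the buffer are not aligned.
bool writeAllSerialized(std::FILE* file, const std::vector<GroupedData>& data) {
    if (data.empty()) return true;
    std::vector<unsigned char> buffer(data.size() * kRecordSize);
    unsigned char* out = buffer.data();
    for (const GroupedData& d : data) {
        std::memcpy(out, &d.m_average, sizeof(double));   out += sizeof(double);
        std::memcpy(out, &d.m_count, sizeof(int));        out += sizeof(int);
        std::memcpy(out, &d.m_identifier, sizeof(char));  out += sizeof(char);
    }
    return std::fwrite(buffer.data(), 1, buffer.size(), file) == buffer.size();
}

// Reads `count` records with a single fread and unpacks them.
// Returns an empty vector on a short read.
std::vector<GroupedData> readAllSerialized(std::FILE* file, std::size_t count) {
    std::vector<GroupedData> data;
    if (count == 0) return data;
    std::vector<unsigned char> buffer(count * kRecordSize);
    if (std::fread(buffer.data(), 1, buffer.size(), file) != buffer.size()) return data;
    data.resize(count);
    const unsigned char* in = buffer.data();
    for (GroupedData& d : data) {
        std::memcpy(&d.m_average, in, sizeof(double));   in += sizeof(double);
        std::memcpy(&d.m_count, in, sizeof(int));        in += sizeof(int);
        std::memcpy(&d.m_identifier, in, sizeof(char));  in += sizeof(char);
    }
    return data;
}
```

This trades a temporary buffer of `size() * kRecordSize` bytes for far fewer library calls. Whether it actually beats the per-field version on a given system is something to measure rather than assume.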

DoubleYou
  • It may or may not be slower depending on various factors, but make sure it really isn't fast enough before worrying too much about it. – Vaughn Cato May 19 '14 at 01:26
  • I would try both approaches with a million or so objects and make a decision based on the outcome of that experiment. – R Sahu May 19 '14 at 01:26
  • If storage space actually matters, you'll want to compress the data anyways using something like zlib, which should be able to get rid of the padding just fine without you needing to mangle your data structures. – user57368 May 19 '14 at 01:40
  • @RSahu: Just did the tests (10 million structs, time averaged over 10 tests). Memory access: pragma=0.1442s, individual=0.1361s. Write to file: pragma=1.5799s, individual=4.2381s. Memory access is done by a simple `for` loop storing all struct variables in a temporary variable. So there's hardly any pragma memory delay... I'm sticking with pragma. [This answer](http://stackoverflow.com/questions/232785/use-of-pragma-in-c/235922#235922) discourages `#pragma` though - then how would you obtain fast write access? – DoubleYou May 19 '14 at 02:06
  • @DoubleYou When it comes to performance issues, you have to sacrifice portability a little bit and write different code for different platforms. – R Sahu May 19 '14 at 02:17
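A minimal timing harness along the lines R Sahu suggests might look like the following. It is a sketch using `std::chrono::steady_clock`; the type and function names are illustrative, and it does not reproduce the numbers quoted above. Both approaches write the same bytes (the fields back to back, no padding), so the files can also be compared for equality:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

#pragma pack(push, 1)
struct PackedData { double m_average; int m_count; char m_identifier; };
#pragma pack(pop)

struct AlignedData { double m_average; int m_count; char m_identifier; };

// Runs fn once and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double secondsFor(Fn fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Approach 1: a single bulk fwrite of the packed array.
void writePacked(std::FILE* f, const std::vector<PackedData>& v) {
    if (!v.empty()) std::fwrite(v.data(), sizeof(PackedData), v.size(), f);
}

// Approach 2: three fwrite calls per aligned record, as in the question's WriteStruct.
void writePerField(std::FILE* f, const std::vector<AlignedData>& v) {
    for (const AlignedData& d : v) {
        std::fwrite(&d.m_average, sizeof(double), 1, f);
        std::fwrite(&d.m_count, sizeof(int), 1, f);
        std::fwrite(&d.m_identifier, sizeof(char), 1, f);
    }
}
```

Use it as `secondsFor([&] { writePacked(file, data); })` with a few million records, repeat several times and compare the averages, with optimizations enabled.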

1 Answer


You can optimize all three (memory space, memory access and write speed) by changing the layout in memory. That is, introduce another layer that handles the data both in memory and on disk, for example a block of 8 items stored as three arrays:

struct GroupedDataBlock {
    double m_average[8];
    int m_count[8];
    char m_identifier[8];
};
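A quick compile-time check that a block of 8 really contains no padding might look like this (a sketch; the byte counts in the comments assume an 8-byte `double` and a 4-byte `int`):

```cpp
#include <cstddef>

struct GroupedDataBlock {
    double m_average[8];
    int m_count[8];
    char m_identifier[8];
};

// m_count starts right after the 8 doubles (offset 64 on typical platforms)...
static_assert(offsetof(GroupedDataBlock, m_count) == 8 * sizeof(double),
              "no gap after m_average");
// ...m_identifier right after the 8 ints (offset 96)...
static_assert(offsetof(GroupedDataBlock, m_identifier) == 8 * (sizeof(double) + sizeof(int)),
              "no gap after m_count");
// ...and the 104-byte total is a multiple of 8, so there is no tail padding either.
static_assert(sizeof(GroupedDataBlock) == 8 * (sizeof(double) + sizeof(int) + sizeof(char)),
              "no tail padding");
```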

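In this way all the data is naturally aligned: assuming an 8-byte double and a 4-byte int, the three arrays occupy 64, 32 and 8 bytes, each one starts at an offset that suits its element type, and the 104-byte total is a multiple of 8, so the compiler inserts no padding. The logic for handling the vector of elements becomes slightly more complex. Following this idea, I would define a class that handles the single GroupedData elements and hides this representation: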

class GroupedData {
    GroupedDataBlock *groupedDataBlock;
    int inBlockIndex;
public:
    GroupedData(GroupedDataBlock *gdb, int index) : groupedDataBlock(gdb), inBlockIndex(index) {}
    double &m_average()    {return groupedDataBlock->m_average[inBlockIndex]; }
    int    &m_count()      {return groupedDataBlock->m_count[inBlockIndex]; }
    char   &m_identifier() {return groupedDataBlock->m_identifier[inBlockIndex]; }
};

The vector of elements then needs some customization as well. Only the indexing is shown here; you need to add the logic for the other operations you need (such as adding elements).

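class GroupedDataVector {
    vector<GroupedDataBlock> alldata;
    size_t actual_size;  // number of elements stored, not number of blocks
public:
    // element i lives in block i/8, at position i%8 inside that block
    GroupedData operator[] (size_t i) {return GroupedData(&alldata[i/8], i%8); }
};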
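As a sketch of the missing operations, here is one possible `push_back` plus size accessors. `push_back`, `size` and `blockCount` are hypothetical additions, and the classes are repeated so the example compiles on its own:

```cpp
#include <cstddef>
#include <vector>

struct GroupedDataBlock {
    double m_average[8];
    int m_count[8];
    char m_identifier[8];
};

// Proxy for one element inside a block, as in the answer.
class GroupedData {
    GroupedDataBlock *groupedDataBlock;
    int inBlockIndex;
public:
    GroupedData(GroupedDataBlock *gdb, int index) : groupedDataBlock(gdb), inBlockIndex(index) {}
    double &m_average()    { return groupedDataBlock->m_average[inBlockIndex]; }
    int    &m_count()      { return groupedDataBlock->m_count[inBlockIndex]; }
    char   &m_identifier() { return groupedDataBlock->m_identifier[inBlockIndex]; }
};

class GroupedDataVector {
    std::vector<GroupedDataBlock> alldata;
    std::size_t actual_size = 0;  // number of elements, not blocks
public:
    // Element i lives in block i/8, slot i%8.
    GroupedData operator[](std::size_t i) {
        return GroupedData(&alldata[i / 8], static_cast<int>(i % 8));
    }
    // Hypothetical push_back: starts a new zero-initialized block every 8 elements.
    void push_back(double average, int count, char identifier) {
        if (actual_size % 8 == 0) alldata.push_back(GroupedDataBlock{});
        GroupedData item = (*this)[actual_size];
        item.m_average() = average;
        item.m_count() = count;
        item.m_identifier() = identifier;
        ++actual_size;
    }
    std::size_t size() const { return actual_size; }
    std::size_t blockCount() const { return alldata.size(); }
};
```

Because each `GroupedData` proxy holds a raw pointer into the vector, a proxy should not be kept across a `push_back` that may reallocate the blocks.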

To write your file, you just need to write alldata in one shot. The only overhead is the possibly partially filled last block; you also need to store actual_size so that the reader knows how many of that block's entries are valid.
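Writing the blocks "in one shot" could look like the sketch below. The file layout (a 64-bit element count followed by the raw blocks) is an assumption, not something the answer specifies; the count is what tells the reader how many entries of the last block are valid:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct GroupedDataBlock {
    double m_average[8];
    int m_count[8];
    char m_identifier[8];
};

// Hypothetical layout: element count, then all blocks in a single fwrite.
bool writeBlocks(std::FILE* file, const std::vector<GroupedDataBlock>& blocks, std::uint64_t count) {
    if (std::fwrite(&count, sizeof count, 1, file) != 1) return false;
    if (blocks.empty()) return true;
    return std::fwrite(blocks.data(), sizeof(GroupedDataBlock), blocks.size(), file) == blocks.size();
}

// Reads the count, then all blocks in a single fread; false on a short read.
// Note: the count comes from the file, so validate it before trusting it in real code.
bool readBlocks(std::FILE* file, std::vector<GroupedDataBlock>& blocks, std::uint64_t& count) {
    if (std::fread(&count, sizeof count, 1, file) != 1) return false;
    blocks.resize((count + 7) / 8);  // round up to whole blocks
    if (blocks.empty()) return true;
    return std::fread(blocks.data(), sizeof(GroupedDataBlock), blocks.size(), file) == blocks.size();
}
```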

Sigi
  • While reasonable, it may be a **lot** simpler to use this structure **just** while doing file I/O. I.e. keep your vector as-is, convert 8 elements at a time to a single `GroupedDataBlock`, and write that. Reading is just the reverse: read one `GroupedDataBlock` and unpack. – MSalters May 19 '14 at 08:47
  • Simpler, yes, but I think you will have some overhead in the conversion, especially because you can no longer do the bulk-write operation the OP wants to do. DoubleYou, will you try both alternatives and come back with the verdict? :) – Sigi May 19 '14 at 08:52
  • Caching (in the runtime libs, OS and/or the harddisk itself) will turn the individual writes into a few bigger writes. – MSalters May 19 '14 at 09:09
  • @Sigismondo: After doing the speed tests as R Sahu suggested in the comments on my question, I decided on the `#pragma` solution. I understand that the `GroupedDataBlock` will help both in memory and for file writing, but I wonder whether the additional functions required for managing the vector will eventually make it slower. As you can see from my speed test, there is very little difference between `#pragma` and aligned access. I know I did not mention memory usage as a requirement, but the GroupedDataBlock adds an integer for each block as well. – DoubleYou May 22 '14 at 00:48
  • `GroupedData` is meant here only to work on the data consistently with your example and to show how to use it: it will easily be inlined by the compiler, and you can omit it entirely where speed is important. However, it's not what you store: what you store is the `GroupedDataBlock`s, and their memory consumption has no overhead. This is just to point it out; it's your problem and your choice of solution! Good luck. – Sigi May 22 '14 at 02:49
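MSalters' alternative, keeping the plain `vector<GroupedData>` in memory and using `GroupedDataBlock` only as a staging buffer during I/O, might be sketched as follows. The function names are hypothetical, and as with any block layout the element count has to be stored or known separately:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// The question's plain struct, kept as-is in memory.
struct GroupedData {
    double m_average;
    int m_count;
    char m_identifier;
};

// The answer's block type, used here only as an I/O staging buffer.
struct GroupedDataBlock {
    double m_average[8];
    int m_count[8];
    char m_identifier[8];
};

// Packs up to 8 records at a time into a zero-initialized block and writes it.
// Unused slots of the last block are written as zeros.
bool writeViaBlocks(std::FILE* file, const std::vector<GroupedData>& data) {
    for (std::size_t start = 0; start < data.size(); start += 8) {
        GroupedDataBlock block{};
        for (std::size_t j = 0; j < 8 && start + j < data.size(); ++j) {
            const GroupedData& d = data[start + j];
            block.m_average[j] = d.m_average;
            block.m_count[j] = d.m_count;
            block.m_identifier[j] = d.m_identifier;
        }
        if (std::fwrite(&block, sizeof block, 1, file) != 1) return false;
    }
    return true;
}

// Reverse direction: reads one block at a time and unpacks `count` records.
bool readViaBlocks(std::FILE* file, std::vector<GroupedData>& data, std::size_t count) {
    data.clear();
    data.reserve(count);
    while (data.size() < count) {
        GroupedDataBlock block;
        if (std::fread(&block, sizeof block, 1, file) != 1) return false;
        for (std::size_t j = 0; j < 8 && data.size() < count; ++j)
            data.push_back(GroupedData{block.m_average[j], block.m_count[j], block.m_identifier[j]});
    }
    return true;
}
```

This issues one `fwrite` per 8 records instead of one bulk write; as MSalters notes, stdio and OS buffering will typically coalesce these into larger writes anyway.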