
Let's say I have a vector with 9 integers.

In total, that should be 36 bytes.

Some of these integers fit in a short, so I want to store the ones that fit as a short in 2 bytes and the ones that don't in 4.

I noticed that a file containing 120 98 99 99 98 257 259 98 0 was 28 bytes, and I wonder what I did wrong.

ofstream out(file, ios::binary);
int len = idx.size();                    //idx is the vector<int>
string end = " 0", space = " ";          //end is just to finish the saving.
for(int i = 0; i < len; i++) {
    if(idx[i] <= SHRT_MAX){
        short half = idx[i];
        out<<half;
    }
    else out<<idx[i];
    if(i == len-1) out<<end; else out<<space;
}
Daniel
  • `strlen("120 98 99 99 98 257 259 98 0")` is 28. – melpomene Dec 11 '16 at 16:39
  • Somewhat related: `int` is **not** guaranteed to be 4 bytes and `short` is **not** guaranteed to be 2 bytes. Most compilers use those sizes, but the standard does not enforce it. – UnholySheep Dec 11 '16 at 16:42
  • How are you going to read that file later on? I mean, how do you decide what to read: int or short? – Ap31 Dec 11 '16 at 16:42
  • Opening a file as `binary` doesn't mean output will be binary. It just means line endings won't be touched. This must be a duplicate. – Martin Bonner supports Monica Dec 11 '16 at 16:42
  • I intended to read as int, but if I'm able to store it as short I could change the reading logic – Daniel Dec 11 '16 at 16:48
  • @Daniel Change the logic to what? What I'm hinting at is that data stored this way is impossible to restore – Ap31 Dec 11 '16 at 16:49
  • In fact I'm writing an LZW compression, but the int file is getting bigger than the original string file, so I'm trying to store it as short – Daniel Dec 11 '16 at 16:50

1 Answer


First piece of advice: use the <cstdint> header if you want to work with types of a guaranteed size. Fixed-width types such as uint16_t are standard and are there for a reason.

Next, about this idea of sometimes writing two bytes and sometimes writing four. Keep in mind that when you write data to a file like this, it's just going to look like one big chunk of bytes. There will not be any way to magically know when to read two bytes and when to read four. You could store metadata about the file, but that would probably be less efficient than simply using the same size consistently. Write everything as two bytes or everything as four bytes. That's up to you, but whichever you choose, you should probably stick with it.

Now, moving on to why you have 28 bytes of data written.

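You're writing the ASCII representations of your numbers. The stream's << operator formats each number as decimal digits, so the file ends up containing the text "120 98 99 99 98 257 259 98 0", which is 28 characters and therefore 28 bytes.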
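To make the byte count concrete, here is a minimal sketch that formats the same numbers into an in-memory stream instead of a file. The helper name `format_as_text` is made up for illustration and is not part of the original code.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats the values the way operator<< does,
// as decimal digits with a single space between numbers.
std::string format_as_text(const std::vector<int>& values) {
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ' ';
        // Converting to short first changes nothing here: operator<< still
        // prints digits, one byte per character.
        out << static_cast<short>(values[i]);
    }
    return out.str();
}
```

Calling `format_as_text({120, 98, 99, 99, 98, 257, 259, 98, 0})` gives a string of 20 digits plus 8 spaces, which matches the 28 bytes seen in the file.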

When writing your data, you probably want to do something like

out.write( (char*)&my_data, sizeof(my_data));

Keep in mind, though, that this isn't really a safe way to write binary data. I think you already understand the need to make sure you write the size you intend. Sadly, the complications of creating portable files don't end there. You also need to worry about the endianness of the machine your program is running on. Here is an article that I think you might enjoy reading to learn more about the subject.

Disch's Tutorial To Good Binary Files

  • How does this `out.write( (char*)&my_data, sizeof(my_data));` apply to a vector? I mean, I didn't understand this cast to char* – Daniel Dec 11 '16 at 17:06
  • That works on writing individual integral types. You're already writing each element in your vector one at a time anyways. – Austin Jenkins Dec 11 '16 at 17:08
  • The reason you cast to char* is that `write` doesn't understand other types. It just cares about getting what looks like an array of bytes. – Austin Jenkins Dec 11 '16 at 17:09
  • So you're suggesting that instead of doing it with the vector, once I add an element x to the vector, I write `out.write( (char*) &x, sizeof(x))`? – Daniel Dec 11 '16 at 17:10
  • No, I'm just telling you to write your elements differently. How you store your data in memory is a bit irrelevant. – Austin Jenkins Dec 11 '16 at 17:13
  • You can do something like this in a for loop, `out.write( (char*)&idx[i], sizeof(uint16_t));` – Austin Jenkins Dec 11 '16 at 17:15
  • it worked! what's the syntax to read it in an int? – Daniel Dec 11 '16 at 17:27
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/130347/discussion-between-austin-jenkins-and-daniel). – Austin Jenkins Dec 11 '16 at 17:30