
Hopefully someone here can help. My problem is as follows:

I am creating files which contain binary data. At the start of each file is a binary header which has information regarding the contents of the file. The file header is a fixed size, 52 bytes. The header has specific pieces of information at specific byte offsets within the header; however, some pieces of information only cover part of a byte, say 3 bits.

For example:

Byte 1-4 = file length

Byte 5-8 = header length

Byte 9-10 = version info

Byte 11-14 = file creation timestamp

bit 1-4 = Month (1-12)

bit 5-9 = Day (1-31)

bit 10-14 = Hour (0-23)

bit 15-20 = Minute (0-59)

bit 21 = UTC offset direction

bit 22-26 = UTC offset hour

bit 27-32 = UTC offset minute

etc...

Some of the values are defined statically, some are determined at runtime. What I've attempted to do is create a 'map' of the header, defining the number of bits an attribute must consume, and the value represented by the bits. These are stored in a vector of int pairs, int_pair.first being the value, and int_pair.second the number of bits. I then convert the supplied values (all integers) to binary format and insert the binary notation into a stringstream. I then create a bitset from the string representation of the binary value, and write that to a file. My problem is that the bytes are not showing up in the output file in the proper order.

I'll omit the method for obtaining the values and just supply integers in my example, and I'll truncate some of the info in the header for brevity (so in this example the header is 14 bytes, not 52), but here is roughly what I'm doing:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <bitset>
#include <vector>
#include <algorithm>
#include <utility>

using namespace std;

int main ()
{
    vector<pair<int,int>> header_vec;

    header_vec.push_back(make_pair(9882719,32)); // file length
    header_vec.push_back(make_pair(52,32)); // header length
    header_vec.push_back(make_pair(6,3)); // high release identifier
    header_vec.push_back(make_pair(4,5)); // high version identifier
    header_vec.push_back(make_pair(6,3)); // low release identifier
    header_vec.push_back(make_pair(4,5)); // low version identifier

    // file open timestamp
    header_vec.push_back(make_pair(9,4));  // month
    header_vec.push_back(make_pair(6,5));  // day
    header_vec.push_back(make_pair(19,5)); // hour
    header_vec.push_back(make_pair(47,6)); // min
    header_vec.push_back(make_pair(0,1));  // utc direction
    header_vec.push_back(make_pair(0,5));  // utc offset hours
    header_vec.push_back(make_pair(0,6));  // utc offset minutes

    ostringstream oss;

    // convert each integer to binary representation
    for ( auto i : header_vec )
    {
        for (unsigned int j(i.second-1); j != -1; --j)
        {
            oss << ((i.first &(1 << j)) ? 1 : 0);
        }
    }

    // copy oss
    string str = oss.str();

    // create bitset
    bitset<112> header_bits(string(str.c_str()));

    // write bitset to file
    ofstream output("header.out", ios::out | ios::binary );
    output.write( reinterpret_cast<char *>(&header_bits), 14);
    output.close();

    return 0;

}

Now, for the most part this method seems to work, except that the bytes come out in reverse order. If I look at the output file in fm, I expect to see this:

File: header.out    (0x0e bytes)
Byte: 0x0

00    00 96 cc 5f 00 00 00 34 c4 c4 93 4e f0 00           ..._...4...N...O

      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f      0123456789abcdef

When in fact I see this:

File: header.out    (0x0e bytes)
Byte: 0x0

00    00 f0 4e 93 c4 c4 34 00 00 00 5f cc 96 00           @O...N...4..._..

      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f      0123456789abcdef

I tried to reverse str prior to creating the bitset, but that does not yield the desired output either.

I suppose I don't understand enough about the bitset to realize why this is happening. Any and all input is greatly appreciated! Also, if there is a different method for accomplishing this, please share!

Thanks in advance... -J

JakobWithAK
  • Does vector have push_front? I get compile error test.cpp: In function 'int main()': test.cpp:65:16: error: 'class std::vector<std::pair<int, int> >' has no member named 'push_front' – JakobWithAK Sep 07 '12 at 04:41
  • ah, sadly, no. sorry - but other collection classes do. – Alex Brown Sep 07 '12 at 04:45
  • Why are you "converting to binary representation"? The variables and constants are stored in binary anyway. For the most part you should just be performing `ofstream.write` calls. The only somewhat tricky one is the date/timestamp but that could be done with some shift operators. And isn't `reinterpret_cast` somewhat unsafe code? – nicholas Sep 07 '12 at 04:46
  • Take a look at the answer to this question: http://stackoverflow.com/questions/778378/how-to-write-bitset-data-to-a-file – Vaughn Cato Sep 07 '12 at 04:51

2 Answers


Writing the bitset<> directly as a memory dump is surely non-portable, as shown by the need for the reinterpret_cast<>. In other words, even if the data is laid out in a nice block, you do not know how that is done.

If I were you, I'd write a dumber function to extract 8-bit chunks from the bitset and write them as bytes to the file using the normal access operator [].

As for another approach, what I usually do when I want to read/write a binary file is define a structure or set of structures that map directly to the layout of the file.

For example:

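struct Timestamp
{
    // unsigned fields: a signed 4-bit field could not hold months 8-12
    unsigned int month:4;
    unsigned int day:5;
    unsigned int hour:5;
    unsigned int minute:6;
    unsigned int utcOffsetDirection:1;
    unsigned int utcOffsetHour:5;
    unsigned int utcOffsetMinute:6;   // 6 bits, so the fields total 32
};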
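The order in which a compiler allocates bit-fields inside a word is implementation-defined, so as a hedged alternative, here is a sketch that packs the same seven fields with explicit shifts. `pack_timestamp` is a hypothetical name, and the bit positions assume the month sits in the most significant bits, as in the layout listed in the question.

```cpp
#include <cstdint>

// Hypothetical helper: packs the timestamp fields into one 32-bit value.
// Assumed layout, most significant bit first:
// month(4) day(5) hour(5) minute(6) direction(1) offset hour(5) offset minute(6)
std::uint32_t pack_timestamp(unsigned month, unsigned day, unsigned hour,
                             unsigned minute, unsigned utc_dir,
                             unsigned utc_hour, unsigned utc_minute)
{
    // Each field is masked to its width before being shifted into place.
    return (static_cast<std::uint32_t>(month      & 0x0Fu) << 28) |
           (static_cast<std::uint32_t>(day        & 0x1Fu) << 23) |
           (static_cast<std::uint32_t>(hour       & 0x1Fu) << 18) |
           (static_cast<std::uint32_t>(minute     & 0x3Fu) << 12) |
           (static_cast<std::uint32_t>(utc_dir    & 0x01u) << 11) |
           (static_cast<std::uint32_t>(utc_hour   & 0x1Fu) << 6)  |
            static_cast<std::uint32_t>(utc_minute & 0x3Fu);
}
```

With the values from the question (month 9, day 6, hour 19, minute 47, zero offset) this yields 0x934EF000, which matches the `93 4e f0 00` bytes in the expected dump once the value is written most significant byte first.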
Keith
  • I completely concur. Rule the endian and packing Gods and take control by managing your bytes one at a time. In the long run you will NOT regret it. – WhozCraig Sep 07 '12 at 04:57
  • Keith and Andre, this approach makes sense to me. I don't have time tonight but I will attempt this tomorrow. Thanks – JakobWithAK Sep 07 '12 at 05:01
  • and don't forget when it comes time to do the actual writing, simply perform `output.write((char*) &myTimeStamp, 4);` - caveat to not use the `sizeof` operator due to padding issues – nicholas Sep 07 '12 at 05:10

Why don't you just use a struct bitfield, so that you just read from and write to the struct without worrying about doing "bit parsing"? Just be careful about memory alignment. Ensure you add some padding to fit the word boundaries.

#include <cstdint>

struct timestamp{
       unsigned month:4;
       unsigned day:5;
       unsigned hour:5;
       unsigned minute:6;
       unsigned utc:1;
       unsigned utc_hour:5;
       unsigned utc_min:6;
};


struct header{
   int32_t file_length;
   int32_t header_length;
   int16_t version;
   timestamp tmsp;
};
André Oriani
  • Quick fix for your code: everything but the Time Stamp was byte sizes, the only bitfield is the Timestamp parameters. Also, the order should be reversed (in MS platforms anyway) – nicholas Sep 07 '12 at 05:04
  • @nicholas Thanks, I was too eager to answer the question once I realized that a bitfield would do the dirty job. I didn't pay attention to that; I understood everything to be bits. – André Oriani Sep 07 '12 at 05:08
  • @nicholas you said "Also, the order should be reversed (in MS platforms anyway)". Are you talking about endianness? – André Oriani Sep 07 '12 at 05:15
  • Well, the [Microsoft documentation](http://msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.71).aspx) made me a bit confused. Now I am thinking that in this case the order *does not* need to be reversed. (It was quite late when we were working on this.) As far as endianness goes, my guess is that it does not have an effect: `ofstream.write` is parsing out byte by byte (`char*`), and endianness is a concern when writing multi-byte objects. Only a test could show. I have not been a C++ programmer for years; C# these days. – nicholas Sep 07 '12 at 12:14