
The reader and writer

#include <string>
#include <fstream>
#include <memory>

class BinarySearchFile {
public:

    BinarySearchFile(std::string file_name) {

        // concatenate extension to file_name
        file_name += ".dat";

        // form complete table data file name
        data_file_name = file_name;

        // try to open an existing table data file for reading and writing
        binary_search_file.open(data_file_name, std::ios::binary);

        if (!binary_search_file.is_open()) {

            // file did not exist: create it, then reopen for reading and writing
            binary_search_file.clear();
            binary_search_file.open(data_file_name, std::ios::out | std::ios::binary);
            binary_search_file.close();
            binary_search_file.open(data_file_name,
                std::ios::out | std::ios::in | std::ios::binary | std::ios::ate);
        }
    }

    void writeT(std::string attribute) {

        if (binary_search_file) {
            binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
        }
    }

    std::string readT(long filePointerLocation, long sizeOfData) {

        std::string data;
        if (binary_search_file) {
            data.resize(sizeOfData);
            binary_search_file.seekp(filePointerLocation);
            binary_search_file.seekg(filePointerLocation);
            binary_search_file.read(&data[0], sizeOfData);
        }
        return data;
    }

private:
    std::string data_file_name;
    std::fstream binary_search_file;
};

The reader call

while (true) {
    std::unique_ptr<BinarySearchFile> data_file(new BinarySearchFile("classroom"));

    std::string attribute_value = data_file->readT(0, 20);

}

The writer call

    data_file->writeT("packard   ");

The writer writes a total of 50 bytes

"packard   101       500  "

The reader is supposed to read back the first 20 bytes, but the result is "X packard X", where X represents some malformed bytes of data. Why is the data read back corrupt?

Mushy
  • A file is a stream of bytes. If you want to write to a file, you need a stream of bytes to write to that file that follows whatever file format you want. Do you have a file format? Do you create a stream of bytes in that format? You're expecting this to work by magic. – David Schwartz Apr 16 '13 at 15:37
  • Do you have a file format? Binary! Do you create a stream of bytes in that format? I believe I do but apparently incorrectly. – Mushy Apr 16 '13 at 15:41
  • If you have a file format, what is the meaning of the first byte? And where is the code that puts that specific information into the first byte of the data you write to the file? – David Schwartz Apr 16 '13 at 15:41
  • @Mushy Binary is _not_ a file format. It's simply a rough indication that the format you're using isn't restricted to printable characters. – James Kanze Apr 16 '13 at 15:43
  • Yes, I have a file format that uses char as a two-byte type which would make writing "packard " 20 bytes. I write that 20 bytes using `std::fstream::write()` and subsequently read those 20 bytes using `std::fstream::read()`. – Mushy Apr 16 '13 at 15:56
  • @Mushy You need more than that for a string; you also have to specify the length somehow. Or are you using a fixed length? And using 2 bytes per `char` means that you'll need to convert every `char` in the string, since `char` is only one byte. – James Kanze Apr 16 '13 at 16:00
  • Sorry, fixed-length file format using two-byte char and `std::fstream::write()` and `std::fstream::read()` – Mushy Apr 16 '13 at 16:02
  • @Mushy `char` is one byte. Always, by definition. Are you trying to write UTF-16BE or UTF-16LE? Or some other encoding? And what is the narrow character encoding you use internally: UTF-8, or something else? – James Kanze Apr 16 '13 at 16:16
  • Using `char` as a two-byte value. I already know it is one byte in c++ but it is two-bytes in Java and I am converting a Java program to c++ and need a two-byte char .. i.e. `charchar` or `p ` = `p + ' '` – Mushy Apr 16 '13 at 17:00
  • @Mushy: So where is the code to create the stream of bytes to write to the file that follows the rule you stated? I don't see you writing any `char`s at all. You actually just try to write a `std::string`, which has no particular byte format. – David Schwartz Apr 19 '13 at 08:49

2 Answers


You can't simply write data out by casting its address to a char* and hoping to get anything useful. You have to define the binary format you want to use, and implement it. In the case of std::string, this may mean outputting the length in some format, then the actual data. Or, where fixed-length fields are needed, forcing the string (or a copy of the string) to that length using std::string::resize, then outputting that, using std::string::data() to get your char const*.

Reading will, of course, be similar. You'll read the data into a std::vector<char> (or, for fixed-length fields, a char[]), and parse it.

James Kanze
  • Yes, thank you. I modified the writer as follows: `attribute.resize(attribute.length() * 2);` `const char *write_this = attribute.data();` `binary_search_file.write(write_this, attribute.length());` and the reader as follows: `char data[20];` `binary_search_file.read(data, sizeOfData);` and I get what I desire but need to trim it so the actual data is correct – Mushy Apr 16 '13 at 15:57
  • @Mushy That should almost work. I don't think that the `resize` of length * 2 does what you seem to want, however; it just adds `attribute.length()` bytes with '\0' to the end of the string. Why do you want 2 bytes for each character, and what does the second byte represent. If you want UTF-16, and the input string is UTF-8, you'll need explicit transcoding, and the final length will depend on the contents of your string. (And of course, everyone else does the opposite: UTF-16 or UTF-32 internally, and UTF-8 in files and on the network.) – James Kanze Apr 16 '13 at 16:13
  • I want a two-byte char because I am converting a Java program where two-byte char is used to c++ where char is one byte. To maintain ordered format in the conversion and make verification easier, I am choosing to use a two-byte char. If I am not representing a two-byte char properly, open to doing it correctly through transcoding or conversion if necessary to maintain my desired format. – Mushy Apr 16 '13 at 17:04
  • @Mushy OK. Java's external format is UTF-16BE. _If_ your encoding is ISO 8859-1, or pure ASCII, then you can simply set the top byte to 0; otherwise, you'll have to use a more classical technique for transcoding. There are many ways of doing this, but the simplest would be to create an `std::vector`, then loop over the input, inserting first '0', then the character into the vector, and finally writing `v.data()` (if you have C++11) or `&v[0]` to the output. (Or you can write to the output directly: `dest.put()` for each byte.) – James Kanze Apr 16 '13 at 17:49

binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
It is incorrect to cast a std::string to char*; if you need a char*, you must use attribute.c_str().
Apart from the pointer to its characters, std::string contains other data members (for example, its allocator); your code will write all of that data to the file. Also, I don't see any reason to multiply the string length by 2. +1 would make sense if you wanted to output the terminating zero.

alexrider
  • Any time you need a `reinterpret_cast`, unless you're doing really low level work (e.g. like implementing `malloc`), you should be suspicious. – James Kanze Apr 16 '13 at 15:34
  • @JamesKanze in case of c_str() there will be no need in reinterpret cast, since there will be char* on hand. Or did I miss something? – alexrider Apr 16 '13 at 15:39
  • The case of `c_str()` is a case where it is broken without needing a `reinterpret_cast`:-). You need some way in the file to recover the length. – James Kanze Apr 16 '13 at 15:44
  • @JamesKanze Won't terminal zero be enough? – alexrider Apr 16 '13 at 15:50
  • It might, if you actually write it. It depends on the format, and how you read it. – James Kanze Apr 16 '13 at 15:59
  • So this thread subsequently lends itself to the question: When is a reinterpret cast necessary? When converting different types? Can you offer some advice here please. – Mushy Apr 16 '13 at 16:09
  • @Mushy there is already good explanation on that matter on SO http://stackoverflow.com/questions/28002/regular-cast-vs-static-cast-vs-dynamic-cast – alexrider Apr 16 '13 at 16:12
  • @Mushy When is a `reinterpret_cast` necessary? Almost never. I've used them when implementing `malloc` and `free`, and in OS internals. I can imagine other system level software that might need them. But not much else. – James Kanze Apr 16 '13 at 16:19
  • @Mushy (Actually, I do use it in one place in the application I currently work on. The profiler made me do it, or rather, in this case, severe memory constraints, which resulted in my implementing a `union` of `double` and some smaller types by using `NaN` values for the smaller types. But it's not something you want to do unless forced to.) – James Kanze Apr 16 '13 at 16:21
  • @Mushy Another common usage is when you have a library that uses void* to pass data from one part of your code to another, in most cases it is callbacks. – alexrider Apr 16 '13 at 16:23
  • @alexrider For to and from `void*`, you normally use `static_cast`. – James Kanze Apr 16 '13 at 16:48