0

I have found myself writing code which looks like this

// Treat the following as pseudocode - just an example

iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();

if(f_len >= some_min_length)
{
    // Focus on the following code here
    char *buf = new char[7];
    char buf2[]{"MYFILET"}; // just some random string
                            // if we see this it's a good indication
                            // the rest of the file will be in the
                            // expected format (unlikely to see this
                            // sequence in a "random file", but don't
                            // worry too much about this)
    iofile.read(buf, 7);

    if(memcmp(buf, buf2, 7) == 0) // I am confident this works
    {
        // carry on processing file ...
        // ...
        // ...
    }
}
else
    cout << "invalid file format" << endl;

This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.

I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 where std::strings.

A possible alternative could be,

// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
                            // also we have to resize buf to 7 in the previous step
                            // lots of potential for mistakes here,
                            // and the length was used twice which is never good
if(buf == buf2) then do something

What are the problems with this?

  • We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
  • We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
  • We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
FreelanceConsultant
  • 13,167
  • 27
  • 115
  • 225

5 Answers5

2

It is undefined behavior to write into the const char* returned by std::string::data(). However, you are free to use std::vector::data() in this way.

If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.

If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.

John Zwinck
  • 239,568
  • 38
  • 324
  • 436
1

We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".

The second time you can use buf.size()

iofile.read(buf.data(), buf.size());

We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort.

And pointed by John Zwinck, .data() return a pointer to const.

I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() return a pointer to char (in this case), not to const char.

size() and resize() are working in the same way.

We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.

I don't think read() permit this.

p.s.: sorry for my bad English.

max66
  • 65,235
  • 10
  • 71
  • 111
0
template<class Src>
std::string read_string( Src& src, std::size_t count){
  std::string buf;
  buf.resize(count);
  src.read(&buf.front(), 7); // in C++17 make it buf.data()
  return buf;
}

Now auto read = read_string( iofile, 7 ); is clean at point of use.

buf2 is a bad plan. I'd do:

if(read=="MYFILET")

directly, or use a const char myfile_magic[] = "MYFILET";.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
0

We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...

// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };     
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
David Thomas
  • 778
  • 5
  • 10
0

I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.

If one doesn't, then I add a new compiler or change machines.

However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.

Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)

std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
    char *const buffer = new char[n + 1];
    src.read(buffer, n);
    buffer[n] = '\0';
    std::string ret(buffer);
    delete [] buffer;
    return ret;
}

This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.

Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.

FreelanceConsultant
  • 13,167
  • 27
  • 115
  • 225