
I came across the question "Reading the binary file into the vector of unsigned chars" and tested the code discussed there.

The code of interest is below:

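```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

typedef unsigned char BYTE;

// Illustrative wrapper so the snippet compiles on its own.
std::vector<BYTE> readFile(const std::string& filename)
{
    std::ifstream file(filename, std::ios::binary);
    file.unsetf(std::ios::skipws);  // without this, whitespace bytes are skipped
    std::vector<BYTE> vec;
    vec.insert(vec.begin(),
               std::istream_iterator<BYTE>(file),
               std::istream_iterator<BYTE>());
    return vec;
}
```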

According to Benjamin Lindley's answer to "Why std::istream_iterator ignores newline characters?", `istream::operator>>(char)` skips whitespace. But the type above is `unsigned char`, and the file was opened with `std::ios::binary`.

Why did the code require an explicit call to file.unsetf(std::ios::skipws) (or alternately, file >> std::noskipws)?
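A minimal sketch of the behaviour being asked about, assuming a `std::istringstream` is an acceptable stand-in for the binary `std::ifstream` (so no file is needed). The helper `readBytes` is hypothetical. The stream is opened with `std::ios::binary` only to mirror the question; `operator>>` into an `unsigned char` is still a formatted extraction, so by default it skips whitespace bytes.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

typedef unsigned char BYTE;

// Hypothetical helper: collect bytes through formatted extraction (operator>>),
// optionally turning whitespace skipping off first.
std::vector<BYTE> readBytes(const std::string& data, bool skipWhitespace)
{
    // Stand-in for the binary ifstream. The binary flag is here only to mirror
    // the question; it does not change how operator>> behaves.
    std::istringstream in(data, std::ios::in | std::ios::binary);
    if (!skipWhitespace)
        in.unsetf(std::ios::skipws);  // same effect as: in >> std::noskipws

    std::vector<BYTE> vec;
    vec.insert(vec.begin(),
               std::istream_iterator<BYTE>(in),
               std::istream_iterator<BYTE>());
    return vec;
}
```

For the input `"a b\nc"`, `readBytes(data, true)` returns the 3 bytes `a`, `b`, `c`, while `readBytes(data, false)` returns all 5, including the space and the newline.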

jww
  • `ios::binary` just makes it not convert newline characters. Also note that `char` may have the same range and representation as `unsigned char` (it is still a distinct type). – user3286380 Feb 15 '14 at 21:05
  • The `std::ios_base::binary` openmode doesn't disable the "formatting" functionality of `operator>>()`. It only suppresses system-specific conversions. – David G Feb 15 '14 at 21:07
  • 0x499602D2 and user3286380 - so `std::ios::binary` on a stream still has the notion of a `line` (as in `getline` and `readline`)? – jww Feb 15 '14 at 21:10
  • You may want to use [`std::istreambuf_iterator`](http://en.cppreference.com/w/cpp/iterator/istreambuf_iterator) instead of `std::istream_iterator` if you're trying to do what I think you are. Your performance will likely be better in the process. – WhozCraig Feb 15 '14 at 21:11
  • Thanks WhozCraig. I'm not sure what the difference is (or why an octet even has a character trait). What I really wanted was `std::begin(file)` and `std::end(file)`. But that would have been too easy. – jww Feb 15 '14 at 21:13
  • Maybe this will help: http://stackoverflow.com/questions/7253864/skipws-flag-set-when-opening-an-input-file-stream-in-binary-mode – Aseem Goyal Feb 15 '14 at 21:13
  • @noloader It works because `std::istreambuf_iterator` works directly on the buffer, which performs *unformatted* extraction (meaning whitespace is also extracted). – David G Feb 15 '14 at 21:14
  • You could also just do `vec.assign(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());`. – David G Feb 15 '14 at 21:16
  • Thanks 0x499602D2. `std::istreambuf_iterator` is the wrong type. It would probably make the veins in Bjarne's neck bulge. – jww Feb 15 '14 at 21:17
  • Not sure what you mean. How is it the wrong *type*? – David G Feb 15 '14 at 21:20
  • 0x499602D2 - I'm operating on `unsigned chars` or octets. It's binary data in the customary sense (not the C++ sense). I'm not operating on `chars` with traits like white space, new line, etc. They are different types. – jww Feb 15 '14 at 21:46
  • @0x499602D2 - when I check the output of `cpp -dM < /dev/null | grep char`, I see `char` is signed: `#define __INT8_TYPE__ char`. I'm fairly certain these are different types on this system (Mac OS X 10.8.3). – jww Feb 15 '14 at 23:17
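Following the suggestions from WhozCraig and David G, here is a minimal sketch of reading the whole file through `std::istreambuf_iterator<char>`, which pulls characters straight from the stream buffer (unformatted), so whitespace bytes are kept without touching `skipws`. The function name `readFileRaw` is hypothetical. The iterator is parameterized on the stream's character type (`char` for `std::ifstream`), not on the element type you want to store, so `std::istreambuf_iterator<BYTE>` would not bind to an `std::ifstream`; the `char` values are simply converted to `BYTE` as they are stored. The `static_assert`s address the side question in the comments: `char`, `signed char` and `unsigned char` are always three distinct types, whatever the signedness of plain `char` on a given platform.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

typedef unsigned char BYTE;

// char, signed char and unsigned char are three distinct types in C++,
// regardless of whether plain char is signed on this platform.
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");
static_assert(!std::is_same<char, signed char>::value, "distinct types");

// Hypothetical helper: read a whole file without any formatted extraction.
// istreambuf_iterator reads raw characters from the stream buffer, so skipws
// is irrelevant and no unsetf() call is needed.
std::vector<BYTE> readFileRaw(const std::string& filename)
{
    std::ifstream file(filename, std::ios::binary);
    // Each char is converted to BYTE (unsigned char) as it is stored.
    return std::vector<BYTE>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

If the file cannot be opened, the two iterators compare equal immediately and the result is an empty vector.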

1 Answer


The basic algorithm for `>>` into a string is:

1) skip whitespace
2) read and extract until next whitespace

If you use noskipws, then the first step is skipped.

After the first read, you are positioned on a whitespace character, so the next (and every following) read stops immediately, extracting nothing.
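The two steps can be seen with a minimal sketch that reads words from a `std::istringstream` (a stand-in for any input stream); the helper `readWords` is hypothetical. With `std::noskipws`, the second extraction finds a space straight away, extracts nothing, and sets `failbit`, which is what ends the loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-delimited words with operator>>,
// optionally with noskipws, and return what was extracted.
std::vector<std::string> readWords(const std::string& data, bool noskip)
{
    std::istringstream in(data);
    if (noskip)
        in >> std::noskipws;  // disables step 1 (skipping leading whitespace)

    std::vector<std::string> words;
    std::string w;
    // Each >> stops at the next whitespace (step 2). An extraction that reads
    // no characters sets failbit, which terminates the loop.
    while (in >> w)
        words.push_back(w);
    return words;
}
```

`readWords("ab cd ef", false)` yields `ab`, `cd`, `ef`, whereas `readWords("ab cd ef", true)` yields only `ab`. For a single `char` or `unsigned char`, step 2 reads exactly one character whatever it is, which is why `noskipws` makes `std::istream_iterator<BYTE>` return every byte instead of stopping.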

Aseem Goyal
  • Thanks anon. There is no `string` (or its primitive, the `char`). Perhaps I'm using the wrong classes for this exercise. – jww Feb 15 '14 at 21:47