
I'm attempting to import a large amount of data from a file into a boost::dynamic_bitset. To accomplish this, I was hoping to use an istream_iterator which matches the block size of the dynamic_bitset (uint32_t).

As shown below, I set up my ifstream using the location of the file to be imported. However, once I initialize the istream_iterator with the ifstream, the ifstream's fail bit is set.

Any advice regarding why this is occurring?

ifstream memHashes (hashFileLocation, ios::in | ios::binary);
if(memHashes.is_open() == false || memHashes.good() == false) { break; }
std::istream_iterator<uint32_t> memHashesIt(memHashes);
std::istream_iterator<uint32_t> memHashesEOFIt;

According to cplusplus.com:

failbit is generally set by an input operation when the error was related to the internal logic of the operation itself, so other operations on the stream may be possible. While badbit is generally set when the error involves the loss of integrity of the stream, which is likely to persist even if a different operation is performed on the stream. badbit can be checked independently by calling member function bad.

Edit:

The hash file contains 160-bit hashes, produced by a SHA1 implementation in a separate C application. There are a few thousand hashes in this file. I would like to read 5 blocks of 4 bytes instead of 20 blocks of 1 byte (hence my use of uint32_t as the block size). I've pulled in the relevant code from the C application, which shows the hashes being produced and then written to a file:

#define HASH_SIZE 20 // 160 bits / 8 bits per byte = 20 bytes

FILE *fp;
fp = fopen(hash_filename, "wb");
if (!fp) {
    MSG("Hash dump file cannot be opened");
    return NULL;  // fp is NULL here, so there is nothing to fclose()
}

uint8_t *p;
unsigned char hash[HASH_SIZE];
SHA1((unsigned char*)p, LENGTH_TO_HASH, hash);
fwrite(hash, HASH_SIZE, 1, fp);
BSchlinker
  • I thought all fstream were char-based – Lightness Races in Orbit Jan 26 '13 at 22:09
  • @Non-StopTimeTravel Changing uint32_t into uint8_t makes the error go away. Disappointing as reading in blocks of 4 bytes would perhaps be more efficient than reading in blocks of 1 byte. Not sure I understand why this is impossible -- shouldn't istream_iterator simply read 4 bytes at a time from ifstream? – BSchlinker Jan 26 '13 at 22:20
  • The code should work unless the data cannot be read as `uint32_t`, what does the file contain? – Jesse Good Jan 26 '13 at 22:42
  • @JesseGood I've added details on the producer of the file. The producer is writing multiple chars / uint8_ts to a file. I was hoping to read in 4 uint8_t blocks as a single uint32_t block. – BSchlinker Jan 26 '13 at 23:02
  • @Bschlinker: by that logic a vector::iterator should be able to iterate over a vector, which makes no sense. Iterators match container element type so by the same underlying reasoning they should match stream unit type too – Lightness Races in Orbit Jan 26 '13 at 23:45
  • Your input stream is not numbers. So using uint32_t will fail. Open the hash file with a text editor and have a look. You will see all sorts of numbers and characters in there. – Martin York Jan 27 '13 at 00:14
  • @LokiAstari `uint32_t` can be used to represent 32 bits of binary data, just the same as `unsigned char` or `uint8_t` can both be used to represent a single byte of binary data.... in fact I recall seeing a library which preferred using `uint8_t` to represent binary data recently. – BSchlinker Jan 27 '13 at 00:34
  • @Non-StopTimeTravel I had assumed that the iterator would read in 4 bytes, shift those 4 bytes into an `uint32_t` structure, and then return that structure to me. But it's clear to me that my assumption was wrong.. – BSchlinker Jan 27 '13 at 00:41
  • @BSchlinker: **Open the file in the editor and look at the data**. – Martin York Jan 27 '13 at 02:37
  • @LokiAstari The file is binary data. I get that -- you can see it in the code that I wrote that I get that. You *can also* store binary data in a `uint32_t` -- is this the point which is in dispute? – BSchlinker Jan 27 '13 at 02:45
  • @BSchlinker: Exactly (but you can still look at it in an editor, which will show you the problem). The `operator>>` is used to read a human-readable text value (i.e. it translates text like 256 into a single integer). The istream_iterator uses `operator>>` internally. What you really want to do is define a class that represents the sha. Then define `operator>>` for your class so it reads all 160 bits in one read directly into the object. – Martin York Jan 27 '13 at 02:54
  • @LokiAstari That's true, I actually changed `istream_iterator` to `istreambuf_iterator` a few hours ago, which will only handle chars as the type, after I read http://stackoverflow.com/questions/10564013/c-streams-confusion-istreambuf-iterator-vs-istream-iterator. I've been debating what to do with this question since... – BSchlinker Jan 27 '13 at 03:01
  • @BSchlinker: You are missing the point. The istream_iterator reads using `operator>>` (the istreambuf_iterator does not; it pulls raw characters from the stream buffer). The `operator>>` reads human-readable text (not binary data) (for the standard types). This is why it fails for uint32_t (it is expecting human text, not binary data). – Martin York Jan 27 '13 at 03:04

3 Answers


The std::istream_iterator<T> uses the input operator>>() for objects of type T. That is, it assumes formatted input. Upon construction it tries to read the first element, which may cause the std::istream to get std::ios_base::failbit set.
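
A minimal sketch of that behaviour, assuming a std::istringstream stands in for the file so the example needs no data on disk. The helper name `failsOnConstruction` is hypothetical and only exists for illustration.

```cpp
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reports whether merely constructing an
// istream_iterator<uint32_t> over `contents` leaves failbit set.
bool failsOnConstruction(const std::string& contents)
{
    std::istringstream in(contents);

    // Common implementations perform the first formatted extraction
    // (in >> value) right here, in the constructor.
    std::istream_iterator<std::uint32_t> it(in);
    (void)it;  // the iterator itself is not needed, only its side effect

    return in.fail();
}
```

Called with four raw bytes such as `std::string("\xDE\xAD\xBE\xEF", 4)` the helper should report a failure, because none of those bytes is a decimal digit. Called with text such as `"12345 678"` it should not.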

Dietmar Kühl

I think the initialization will read a uint32_t from the stream. The type uint32_t is an alias for either unsigned or unsigned long. I have the creeping feeling that your file doesn't contain numbers but that you expect (see e.g. the ios_base::binary openmode) some packed, non-text representation to be read by the stream. If this is the case, your expectation is simply wrong, but it's hard to tell without knowing more about your program. One note though: If you're reading the istream_iterator to the end, you will always have both eofbit and failbit set. I guess you only have failbit set, which suggests a parsing error.
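
The difference between the two end states can be checked with a small sketch. It assumes in-memory text input via std::istringstream; `DrainResult` and `drainState` are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical result type: the two stream flags of interest.
struct DrainResult {
    bool eof;
    bool fail;
};

// Hypothetical helper: walks an istream_iterator<uint32_t> over `contents`
// until it compares equal to the end iterator, then reports the flags.
DrainResult drainState(const std::string& contents)
{
    std::istringstream in(contents);
    std::istream_iterator<std::uint32_t> it(in), end;

    while (it != end) {
        ++it;  // each increment performs one formatted extraction
    }
    return DrainResult{ in.eof(), in.fail() };
}
```

With well-formed input such as `"1 2 3"` both flags should end up set, since the last extraction runs into the end of the data. With `"1 2 x 3"` only the fail flag should be set, because parsing stops at the `x` before the end is reached.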

Ulrich Eckhardt
  • I've added additional information showing how the file is being produced. It's true -- it's a binary stream and not an ASCII stream (I'm guessing this is what you mean by `packed`?) Why would that prevent the iterator from working? – BSchlinker Jan 26 '13 at 23:03
  • The iterator just does `in >> var` internally. The reason this fails is that it will extract text that is then parsed as number. The binary flag doesn't change that, read the docs on it. You need to use read() to retrieve single bytes. Note that read is an "unformatted input function", which is what Dietmar hints at above. – Ulrich Eckhardt Jan 27 '13 at 15:52
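
The read() approach mentioned in the comment above could look roughly like the following sketch. It assumes the goal is a vector of 4-byte blocks in the byte order of the machine doing the reading; `readBlocks` is a hypothetical helper, not part of any library.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // only needed when testing against an in-memory stream
#include <string>
#include <vector>

// Hypothetical helper: collects every complete 4-byte block in the stream.
// A trailing partial block (fewer than 4 bytes) is dropped.
std::vector<std::uint32_t> readBlocks(std::istream& in)
{
    std::vector<std::uint32_t> blocks;
    std::uint32_t block = 0;

    // read() is an unformatted input function: it copies raw bytes
    // and never tries to parse them as text.
    while (in.read(reinterpret_cast<char*>(&block), sizeof block)) {
        blocks.push_back(block);
    }
    return blocks;
}
```

For a real file the stream would be a std::ifstream opened with ios::in | ios::binary, as in the question. Since a SHA1 digest is 20 bytes, every five consecutive blocks would then make up one hash.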

The problem is you have binary data.

The istream_iterator uses operator>> to read data. For uint32_t this means it will read human-readable text and convert it into an integer. This will fail (most of the time) for binary data. (The istreambuf_iterator, by contrast, reads raw characters straight from the stream buffer without any formatting.)

You have another misconception about speed.
Reading 4 bytes at a time is unlikely to be any faster than reading 1 byte at a time (it will make the code more complex, which may slow it down, but there will be no difference in reading speed). This is because reading from the stream is buffered: a huge chunk has already been read into a buffer, so when you do a read it is simply copying data from one location to another.

What you really want to do is define a class and copy the data as a single unit into your class:

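#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

class ShaMine
{
    std::vector<char>  data;
    public:
        ShaMine(): data(20, '\0') {}

        // Unformatted read: copies 20 raw bytes (one SHA1 digest) into dst.
        friend std::istream& operator>>(std::istream& s, ShaMine& dst)
        {
            return s.read(&dst.data[0], 20);
        }

        // const, because istream_iterator::operator-> yields a pointer to const.
        void poop(std::ostream& s) const
        {
             s << "Hi there: Char 0 is :" << (int) data[0] << "\n";
        }
};

int main()
{
     // Binary mode, so the raw bytes are not subject to newline translation.
     std::ifstream   sfile("FILE", std::ios::in | std::ios::binary);

     for(std::istream_iterator<ShaMine> loop(sfile); loop != std::istream_iterator<ShaMine>(); ++loop)
     {
         loop->poop(std::cout);
     }
}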
Martin York
  • http://en.cppreference.com/w/cpp/iterator/istream_iterator implies that data is read every time the iterator is incremented, which seems correct via my profiling of the code. *The actual read operation is performed when the iterator is incremented* – BSchlinker Jan 27 '13 at 03:23
  • Of course, reading only implies reading from the underlying object, which may have buffered the file I/O. With that said, incrementing the iterator has a significant performance cost associated with it in my profiling. – BSchlinker Jan 27 '13 at 03:30
  • @BSchlinker: Yes. But working code is 100% more efficient than code that does not work. Worry about how to express your code first, then worry about performance. Also since C++11 and move operators the cost of the copy out of the iterator is insignificant (as it will be moved) which is another way to show that you should not be worrying about insignificant enhancements at this point. – Martin York Jan 27 '13 at 03:44