24

I want to read unsigned bytes from a binary file. So I wrote the following code.

#include <iostream>
#include <fstream>
#include <vector>
#include <istream>
#include <string>

std::string filename("file");
size_t bytesAvailable = 128;
size_t toRead = 128;

std::basic_ifstream<unsigned char> inF(filename.c_str(), std::ios_base::in | std::ios_base::binary) ;
if (inF.good())
{
    std::vector<unsigned char> mDataBuffer;
    mDataBuffer.resize(bytesAvailable) ;
    inF.read(&mDataBuffer[0], toRead) ;
    size_t counted = inF.gcount() ;
}

This always reads 0 bytes, as shown by the variable counted.

There seem to be references on the web saying that I need to set the locale to make this work. How to do this exactly is not clear to me.

The same code works using the data type 'char' instead of 'unsigned char'

The above code using unsigned char seems to work on Windows but fails when run on coLinux Fedora 2.6.22.18.

What do I need to do to get it to work on Linux?

Svante
David
  • Not an answer to the question, but related. Remember that the definition of the string class in C++ is `typedef basic_string<char> string;`, so you can always make an unsigned char string class a la `typedef basic_string<unsigned char> bytestring;`. – Ryann Graham Mar 02 '09 at 23:47
  • true, but I want to read a BINARY file – David Mar 03 '09 at 06:29
  • .read() and .write() can be used for binary/text, the stream operators << and >> are for text files only. All data on a computer is ultimately binary, it's how you choose to interpret it. – sfossen Mar 03 '09 at 15:35
  • If you want "binary" use uint8_t... just ignore that it's a typedef alias for unsigned char. – Ryann Graham Mar 03 '09 at 16:35
  • This problem has been solved here: http://stackoverflow.com/q/19205531/331024 It has a full implementation of char_traits and codecvt – x-x Oct 10 '13 at 22:28

4 Answers

29

The C++03 standard requires the implementation to provide explicit specializations for only two versions of character traits (later standards add char16_t, char32_t and char8_t, but none for unsigned char):

std::char_traits<char>
std::char_traits<wchar_t>

The streams and strings use those traits to figure out a variety of things, such as the EOF value, how to compare ranges of characters, how to widen a character to an int, and so on.
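As a rough illustration of what the traits provide, here is a small sketch using the standard std::char_traits<char> specialization. The function names byteFFDiffersFromEof and compareRanges are made up for this example; only the char_traits members are standard.

```cpp
#include <cstddef>
#include <string>   // declares std::char_traits

// Hypothetical helper: widens the byte 0xFF through the traits and checks
// that the widened value can still be told apart from the EOF marker.
bool byteFFDiffersFromEof()
{
    typedef std::char_traits<char> Traits;
    const Traits::int_type widened = Traits::to_int_type('\xFF');
    return !Traits::eq_int_type(widened, Traits::eof());
}

// Hypothetical helper: compares two character ranges of length n through
// the traits; the result is negative, zero, or positive.
int compareRanges(const char* a, const char* b, std::size_t n)
{
    return std::char_traits<char>::compare(a, b, n);
}
```

A stream instantiated for another character type needs a traits class that answers the same questions for that type.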

If you instantiate a stream like

std::basic_ifstream<unsigned char>

You have to make sure that there is a corresponding character traits specialization that the stream can use, and that this specialization does useful things. In addition, streams use facets to do the actual formatting and reading of numbers, and you have to provide specializations of those manually as well. The standard doesn't even require the implementation to have a complete definition of the primary template, so you could just as well get a compile error:

error: specialization std::char_traits could not be instantiated.

I would use ifstream instead (which is a basic_ifstream<char>) and read into a vector<char>. When interpreting the data in the vector, you can still convert it to unsigned char later.
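A minimal sketch of that approach, assuming a hypothetical helper named readBytes (the name and signature are not from the question): it reads through a plain std::ifstream into a vector<char> and converts only the bytes that were actually read.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads at most maxBytes from the file as plain char,
// then converts the bytes that were actually read to unsigned char.
std::vector<unsigned char> readBytes(const std::string& filename, std::size_t maxBytes)
{
    std::vector<char> raw(maxBytes);
    std::ifstream in(filename.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!in || raw.empty())
        return std::vector<unsigned char>();   // nothing could be read

    in.read(&raw[0], static_cast<std::streamsize>(raw.size()));
    const std::size_t counted = static_cast<std::size_t>(in.gcount());

    // The iterator-range constructor converts each char to unsigned char.
    return std::vector<unsigned char>(raw.begin(), raw.begin() + counted);
}
```

Calling readBytes("file", 128) would mirror the sizes used in the question; a missing file yields an empty vector.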

Johannes Schaub - litb
  • 13
    I did not get a compiler error, no hints in documentation, nothing, but silent failure and a wasted day. Thank you Bjarne Stroustrup and Dennis Ritchie. – user1358 Jul 26 '13 at 06:15
21

Don't use basic_ifstream<unsigned char>, as it requires specialization.

Using a static buffer:

linux ~ $ cat test_read.cpp
#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        const size_t bytesAvailable = 128; // const, so the array below has a compile-time size

        ifstream inf( filename.c_str(), ios_base::in | ios_base::binary );
        if( inf )
        {
                unsigned char mDataBuffer[ bytesAvailable ];
                inf.read( (char*)( &mDataBuffer[0] ), bytesAvailable ) ;
                size_t counted = inf.gcount();
                cout << counted << endl;
        }

        return 0;
}
linux ~ $ g++ test_read.cpp
linux ~ $ echo "123456" > file
linux ~ $ ./a.out
7

Using a vector:

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str(), ios_base::in | ios_base::binary );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
                mDataBuffer.resize( bytesAvailable ) ;

                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                mDataBuffer.resize( counted ) ;
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=128
7 size=7

Using reserve instead of resize in the first call:

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str(), ios_base::in | ios_base::binary );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
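                // reserve() only allocates capacity; size() stays 0. Writing through
                // &mDataBuffer[0] on an empty vector is undefined behavior, and the later
                // resize( counted ) value-initializes the elements, so the bytes read
                // here are not reliably kept. Shown only to illustrate the size mismatch.
                mDataBuffer.reserve( bytesAvailable ) ;

                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;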
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                mDataBuffer.resize( counted ) ;
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=0
7 size=7

As you can see, without the call to .resize( counted ), the size of the vector will be wrong. Please keep that in mind. It is common to use a cast for this; see cppreference.
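The cast-and-resize pattern above can be wrapped in a small function. This is only a sketch: the name readIntoUnsigned is hypothetical, and reinterpret_cast is swapped in for the C-style cast used in the listings.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: fills a vector<unsigned char> through a char* view
// of its storage, then shrinks it to the number of bytes actually read.
std::vector<unsigned char> readIntoUnsigned(const std::string& filename, std::size_t toRead)
{
    std::vector<unsigned char> buffer(toRead);   // sized, not merely reserved
    std::ifstream in(filename.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!in || buffer.empty())
    {
        buffer.clear();
        return buffer;
    }

    in.read(reinterpret_cast<char*>(&buffer[0]),
            static_cast<std::streamsize>(buffer.size()));
    buffer.resize(static_cast<std::size_t>(in.gcount()));   // keep only what was read
    return buffer;
}
```

Because the vector is sized before the read and only shrunk afterwards, size() matches gcount() on return.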

sfossen
0

If you're on Windows you can directly use:

using ufstream = std::basic_fstream<unsigned char, std::char_traits<unsigned char>>;
ufstream file;

On Linux there is no such luck, as facets and locales for unsigned char are not provided, so follow @Johannes's approach.

KeyC0de
-1

A much easier way:

#include <fstream>
#include <vector>

using namespace std;


int main()
{
    vector<unsigned char> bytes;
    ifstream file1("main1.cpp", ios_base::in | ios_base::binary);
    unsigned char ch = file1.get();
    while (file1.good())
    {
        bytes.push_back(ch);
        ch = file1.get();
    }
    size_t size = bytes.size();
    return 0;
}
rlbond
  • 2
    That is very inefficient. Try running benchmarks with 1GB files, the overhead of the calls will show a big difference. – sfossen Mar 03 '09 at 04:36
  • @david: it makes no difference in the file. 0xFF is 255 if stored in an unsigned char or -1 if stored in the signed char. Hence why the cast is not a bad thing. If this was multi byte the only difference would be if the endianness is different. – sfossen Mar 03 '09 at 15:37
  • @David: endianness is usually only a problem when switching architecture types, e.g. PowerPC vs. x86. – sfossen Mar 03 '09 at 15:58
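The point made in the comments about 0xFF can be checked with a short sketch. The helper names toUnsigned and sameStoredByte are made up for illustration, and the byte comparison assumes a two's complement representation of signed char.

```cpp
#include <cstring>

// Hypothetical helper: reinterprets a signed byte value as unsigned.
// The conversion is modular, so -1 becomes 255.
unsigned char toUnsigned(signed char s)
{
    return static_cast<unsigned char>(s);
}

// Hypothetical helper: checks whether both values occupy the same byte in
// memory (assumes two's complement, as on the platforms discussed here).
bool sameStoredByte(signed char s, unsigned char u)
{
    return std::memcmp(&s, &u, 1) == 0;
}
```

Under that assumption, the byte in the file is identical in both cases; only its interpretation as a number differs.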