I'm trying to read the characters in a given file and output the number of hex characters. When I run this against a text file it's more or less accurate, but with just about anything else it seems to be WAY off. For example, a *.mp4 file that's ~700MB comes up as 12K. What am I missing here?

#include <fstream>
#include <iostream>
using namespace std ;

int main()
{
    char letter ;
    int i ;
    cout << "Input the filename:" << endl;
    string stringinput;
    cin >> stringinput;
    ifstream file( stringinput.c_str() ) ;
    if( ! file )
    {
        cout << "Error opening input file, " << ( stringinput ) << ". Check file path and try again." << endl ;
        return -1 ;
    }
    else
        for( i = 0; ! file.eof() ; i++ )
        {
            file.get( letter ) ;
            //cout << hex << (int) letter;
        }
    cout << endl;
    float k = 1024, m = 1048576;
    file.close();
    if( i < 1024)
    {
        cout << "Total: " << dec << i << endl;
    }
    else if( i < m)
    {
        cout << "Total: " << dec << (i / k) << "K" << endl;
    }
    else
    {
        cout << "Total: " << dec << (i / m) << "M" << endl;
    }
    return 0 ;
}
J'e

2 Answers

You need to open the file in binary mode.

ifstream file( stringinput.c_str() ) ;

should be:

ifstream file( stringinput.c_str(), ios_base::in | ios_base::binary ) ;

On platforms that distinguish text mode from binary mode, such as Windows, reading a file in text mode means that certain characters (such as CTRL-Z, byte 0x1A) are treated as "end of file", so reading stops prematurely if such a byte happens to be part of your input. Since mp4 files are binary files with pretty "random" content, there is no guarantee that these bytes will not occur in the file.
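As a minimal sketch of the fix, the counting loop can be wrapped in a function that opens the file in binary mode. The name count_bytes is a hypothetical helper, not something from the question. The sketch also tests the result of get() directly instead of looping on eof(), which is an additional change: the eof() loop counts one extra iteration for the final failed read. A 64-bit counter is used on the assumption that files may exceed the range of int.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper (not from the question): counts the bytes in a file
// by reading it one char at a time in binary mode.
// Returns -1 if the file cannot be opened.
std::int64_t count_bytes(const std::string& path)
{
    // ios_base::binary disables any text-mode translation of the data.
    std::ifstream file(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!file)
        return -1;

    std::int64_t count = 0;
    char byte;
    // get() returns the stream, which converts to false once a read fails,
    // so the last (failed) attempt is not counted.
    while (file.get(byte))
        ++count;
    return count;
}
```

Calling count_bytes(stringinput) from main and formatting the result as before should then report the full size of the file rather than stopping at the first CTRL-Z byte.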

Mats Petersson

An .mp4 file needs to be opened in binary mode:

std::ifstream file(stringinput.c_str(), std::ios_base::in | std::ios::binary) ;

If you don't specify the std::ios::binary flag, the file is opened in text mode by default. The problem with doing this is that the contents of the file are interpreted as if they were text (i.e. the runtime assumes that all bytes fall within a valid range and have a certain meaning defined by your system locale's character set, such as ASCII or UTF-8). As a result, it may do things like convert new-line characters from DOS to UNIX format (or vice versa), or treat certain control characters in a special way.

See "Difference between files written in binary and text mode" for more information.
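To see the effect of the mode flag in isolation, here is a small sketch that writes a known set of bytes and counts how many chars come back under a given open mode. Both chars_read and write_sample are hypothetical helper names introduced only for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: counts how many chars get() delivers when the file
// is opened with the given mode flags. Returns 0 if the file cannot be opened.
std::size_t chars_read(const std::string& path, std::ios_base::openmode mode)
{
    std::ifstream file(path.c_str(), mode);
    std::size_t count = 0;
    char c;
    while (file.get(c))
        ++count;
    return count;
}

// Hypothetical helper: writes 7 raw bytes, including a CR LF pair and a
// 0x1A (CTRL-Z) byte, in binary mode so nothing is translated on output.
void write_sample(const std::string& path)
{
    const char data[] = {'a', '\r', '\n', 'b', 0x1A, 'c', 'd'};
    std::ofstream out(path.c_str(), std::ios_base::out | std::ios_base::binary);
    out.write(data, sizeof data);
}
```

After write_sample("sample.bin"), calling chars_read("sample.bin", std::ios_base::in | std::ios_base::binary) returns 7. With std::ios_base::in alone, the result depends on the platform: on Linux and macOS it is also 7, while on Windows it would typically be lower, because the CR LF pair is collapsed and reading stops at the 0x1A byte.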

Charles Salvia