I'm using Linux and C++. I have a binary file with a size of 210732 bytes, but the size reported with seekg/tellg is 210728.

I get the following information from ls -la, i.e., 210732 bytes:

-rw-rw-r-- 1 pjs pjs 210732 Feb 17 10:25 output.osr

And with the following code snippet, I get 210728:

std::ifstream handle;
handle.open("output.osr", std::ios::binary | std::ios::in);
handle.seekg(0, std::ios::end);
std::cout << "file size:" << static_cast<unsigned int>(handle.tellg()) << std::endl;

So my code is off by 4 bytes. I have confirmed that the size of the file is correct with a hex editor. So why am I not getting the correct size?

My answer: I think the problem was caused by having multiple open fstreams to the file. At least that seems to have sorted it out for me. Thanks to everyone who helped.

PSJ
  • Is this the case across all file systems (in case you have several)? – hlovdal Feb 17 '10 at 16:51
  • Unfortunately, I don't have the option to test on a different file system. – PSJ Feb 17 '10 at 16:52
  • Works well on my 32-bit Ubuntu system. Do you use g++? – tur1ng Feb 17 '10 at 16:53
  • I'm using g++ 4.1.2 and the system is 64-bit CentOS. – PSJ Feb 17 '10 at 16:58
  • I have just tried it on a 32-bit ArchLinux system with g++ 4.4.2 and it also gives the wrong answer. – PSJ Feb 17 '10 at 17:12
  • Yes, it seems to be just that one file. When I check the sizes of other files, then there does not seem to be a problem. – PSJ Feb 17 '10 at 17:50
  • if(name == "output.osr") len += 4; :-) – pm100 Feb 17 '10 at 18:02
  • Maybe this is related to static_cast; it seems that static_cast isn't that "safe" to use! This article, http://msdn.microsoft.com/en-us/library/c36yw7x9%28VS.80%29.aspx, explains the problem behind static_cast pretty well. You could give dynamic_cast a try and see how it affects the results. –  Feb 17 '10 at 18:22
  • @Layne: Thanks for the suggestion. That is interesting. I think I got the problem solved - I think it was caused by having multiple open fstreams. – PSJ Feb 17 '10 at 19:32
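One way several open streams can produce a short size is buffering: bytes written through an ofstream that has not been flushed or closed are not in the file yet, so a second stream that seeks to the end sees a smaller size. The thread never shows the other streams, so this is only a plausible mechanism, not a confirmed diagnosis. A minimal sketch, where sizeViaStream, BufferedWriteSizes and measureAroundFlush are hypothetical names made up for the illustration:

```cpp
#include <fstream>
#include <string>

// Size of `path` as seen through a fresh ifstream (seek to end, then tellg).
// Returns -1 if the file cannot be opened.
long long sizeViaStream(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}

struct BufferedWriteSizes {
    long long beforeFlush;  // size seen while the writer may still hold buffered data
    long long afterFlush;   // size seen once the writer has flushed
};

// Hypothetical helper: writes `payload` through an ofstream that stays open,
// and measures the file size before and after flushing that stream.
BufferedWriteSizes measureAroundFlush(const std::string& path,
                                      const std::string& payload) {
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));

    BufferedWriteSizes r;
    r.beforeFlush = sizeViaStream(path);  // may be smaller than payload.size()
    out.flush();                          // push buffered bytes to the file
    r.afterFlush = sizeViaStream(path);   // now reflects everything written
    return r;
}
```

How many bytes are missing before the flush depends on the library's buffer size, so the exact shortfall is not predictable. Flushing or closing every writer before measuring removes the discrepancy.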

4 Answers

Why open the file just to check its size? The easiest way is to call stat(), something like this:

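#include <stdio.h>      // perror
#include <sys/types.h>
#include <sys/stat.h>

// Returns the size of the file in bytes, or -1 if stat() fails.
off_t getFilesize(const char *path){
   struct stat fStat;
   if (stat(path, &fStat) == 0) return fStat.st_size;
   perror("file Stat failed");
   return -1;
}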

Edit: Thanks PSJ for pointing out a minor typo glitch... :)

t0mm13b
  • Probably because it doesn't answer the question – Feb 17 '10 at 17:03
  • @Neil: Oh... He talked about opening the file and seeking to the end in order to get the size, and it returned incorrect results. I was wondering why not use this function instead of having to open/close the file? – t0mm13b Feb 17 '10 at 17:05
  • Thanks, I'm opening the file to parse it, so I can check that it contains the correct data. I have tried the above, but it also gives the wrong answer. Also with a const char* argument shouldn't it be stat instead of fstat? – PSJ Feb 17 '10 at 17:08

At least for me, with g++ 4.1 and 4.4 on 64-bit CentOS 5, the code below works as expected, i.e. the length the program prints is the same as that returned by the stat() call.


#include <iostream>
#include <fstream>
using namespace std;

int main () {
  int length;

  ifstream is;
  is.open ("test.txt", ios::binary | std::ios::in);

  // get length of file:
  is.seekg (0, ios::end);
  length = is.tellg();
  is.seekg (0, ios::beg);

  cout << "Length: " << length << "\nThe following should be zero: " 
       << is.tellg() << "\n";

  return 0;
}
janneb
  • Thank you. Surprisingly, this actually gives me the correct answer. I don't understand why, but it does provide me with the result I'm looking for. – PSJ Feb 17 '10 at 18:04
  • But that's exactly the same code, apart from the static_cast to unsigned int. – pm100 Feb 17 '10 at 18:10
  • Yeah, I must have something somewhere, that is interfering. I'm trying to figure it out. – PSJ Feb 17 '10 at 18:13
  • @pm100: Yes. I mainly wanted to verify that the libstdc++ for g++ 4.1 and 4.4 in centos 5 x86_64 does not contain such a glaring bug. Rather, there is something fishy with the OP's system. – janneb Feb 17 '10 at 18:20
  • I think it must have been caused by having multiple fstreams open to the file. – PSJ Feb 17 '10 at 19:25
  • @PSJ I had the same problem and solved it by closing another file handle to the file. Thanks. – Laurence Jan 18 '14 at 10:53
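The answer above says the stream length matched stat(), but the program shown only does the stream half. A minimal sketch of the full comparison, assuming a POSIX system; sizeViaStat, sizeViaTellg and sizesAgree are hypothetical names made up for the illustration:

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <fstream>

// Size according to the file system (stat), or -1 on error.
long long sizeViaStat(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    return static_cast<long long>(st.st_size);
}

// Size according to seekg/tellg on a fresh binary ifstream, or -1 on error.
long long sizeViaTellg(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}

// True when stat() succeeds and both methods report the same size.
bool sizesAgree(const char* path) {
    const long long a = sizeViaStat(path);
    const long long b = sizeViaTellg(path);
    return a >= 0 && a == b;
}
```

If sizesAgree returns false only while another stream in the same program is still open on the file, that points at unflushed data rather than at the library.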

On a flavour of Unix, why use that when we have the stat() call?

long findSize( const char *filename )
{
   struct stat statbuf;
   if ( stat( filename, &statbuf ) == 0 )
   {
      return statbuf.st_size;
   }
   else
   {
      return 0;
   }
}

If not on Unix:

long findSize( const char *filename )
{
   long l,m; 
   ifstream file (filename, ios::in|ios::binary ); 
   l = file.tellg(); 
   file.seekg ( 0, ios::end ); 
   m = file.tellg(); 
   file.close(); 
   return ( m - l );
}
Narendra N

Is it possible that ls -la is actually reporting the number of bytes the file takes up on the disk, instead of its actual size? That would explain why it is slightly higher.

Frederik Slijkerman
  • That was my thought too. I'm generating the file myself and I'm putting 210732 bytes into the file, also when I inspect the file with ghex2 it actually contains all the bytes. – PSJ Feb 17 '10 at 18:14