-1

I wrote this function, I want that simply reads a binary file and returns a vector with the content of the file. I follow some tutorial about how use "fread" but I have a segmentation violation. Can you help me to understand this?

const std::vector<uint32_t> read() {
    //obtain file size
    fseek( fBinaryFile, 0, SEEK_END );
    long lsize = ftell( fBinaryFile );
    rewind( fBinaryFile );
    uint32_t pDataBuffer[sizeof( uint32_t )*lsize];
    //read file
    fread( pDataBuffer, sizeof( uint32_t ), lsize, fBinaryFile );
    std::vector<uint32_t> cVector( pDataBuffer, pDataBuffer + sizeof( uint32_t )*lsize );
    closeFile();
    return cVector;
}
sbi
  • 219,715
  • 46
  • 258
  • 445
user3050386
  • 139
  • 1
  • 9
  • Can you use the standard library's iostream instead? – KABoissonneault Aug 04 '15 at 15:20
  • 1
    What is fBinaryFile ? BTW `ftell` should give a size in bytes, not in uint32_t – Serge Ballesta Aug 04 '15 at 15:22
  • There's a confusion here: `uint32_t pDataBuffer[sizeof( uint32_t )*lsize];` you don't have to multiply `lsize` by `sizeof(uint32_t)` here, you're actually creating a bigger array than intended. Besides [VLA](https://en.wikipedia.org/wiki/Variable-length_array)s are a C feature not a C++ one, and you're probably allocating to much data on the stack if the file is big. – Caninonos Aug 04 '15 at 15:23
  • 1
    learn how to use a debugger. it will help you solve this and other problems, and will help you better understand how stuff works. – hoijui Aug 04 '15 at 15:32
  • Where does `fseek/ftell` come from to get a file's size in bytes? It's not portable and it's not guaranteed to work. Use `stat()` or `fstat()`. And check the return values from `fseek` and `fread`. – Andrew Henle Aug 04 '15 at 15:39
  • 2
    Why the temporary buffer then the copy? This is so inefficient. Just extract directly from a C++ `fstream` into the vector..... you can do it in two lines of code, _including_ necessary declarations. – Lightness Races in Orbit Aug 04 '15 at 15:43
  • @AndrewHenle sorry about that, but `ftell()` in binary mode is guaranteed to give you the number of bytes from the beginning of the file. So it should be pretty close to the file size;-). `stat()/fstat()` are posix and not standard C: they are less portable than `ftell()`. – Christophe Aug 04 '15 at 15:50
  • This is abysmally bad code. Do you have [a good C++ book](http://stackoverflow.com/a/388282/140719) to learn from? Because you seem to need it urgently. – sbi Aug 04 '15 at 16:18
  • @Christophe - `ftell` only works like that in a POSIX environment, so `stat()` is a prerequisite for `ftell()` working in that manner. Try using `ftell()` to get a file size on an IBM mainframe dataset. See http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/ftell.htm "ANSI states that the ftell() function returns relative byte offsets from the beginning of the file for binary files. Under z/OS® XL C/C++, this is true except for record-oriented files that have variable length records. For these types of files, the ftell() function returns an encoded offset." – Andrew Henle Aug 04 '15 at 16:48
  • @Christophe - Also, per the C standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf, footnote 268), "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." and (7.21.9.2) "A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END." – Andrew Henle Aug 04 '15 at 16:58
  • @AndrewHenle ok for the uncertainty about trailing nulls. State-dependent encoding is for text streams and not binary streams. And what does the standard state about fstat() ? ;-) – Christophe Aug 04 '15 at 17:05
  • @Christophe - If your point is that there is no standard-compliant way to determine the size of a file, I agree. Of course, that's a good reason to use something like `stat()` or `fstat()`, because if they don't exist `fseek()/ftell()` won't work reliably anyway. It's just plain wrong to expect a standard C call such as `fseek()` or `ftell()` to behave in a POSIX manner. At least by using a POSIX call, if the code is ported to a non-POSIX environment it won't compile and there won't be a latent bug in the code. And state-dependent encoding is not restricted to text streams. See above. – Andrew Henle Aug 04 '15 at 17:15

3 Answers3

1

As already said in a comment to the question, you confuse the number of bytes in the file with the number of 32bit integers, resulting in reading garbage into the end of your vector. While this technically leads to undefined behavior, it is unlikely to cause the crash you describe. In fact, I don't see what is causing the crash in the code you show, but given the state of the code, I'll have to assume the code you do not show to contain similar problems.

This is C++. Why do you deal with C's file accessors? Reading a sequence of objects would be much easier with stream iterators.

Why do you first read into a buffer, and copy that into a vector, rather then reading into the vector right away?

I have carefully rewritten parts of your code to address these issues. HTH.

// CAUTION! Brain-compiled code ahead
std::vector<uint32_t> read()
{
    std::fseek( fBinaryFile, 0, SEEK_END );
    const long numBytes = std::ftell( fBinaryFile );
    std::rewind( fBinaryFile );

    std::vector<uint32_t> cVector;
    cVector.resize( numBytes/sizeof(uint32_t) );
    if( !cVector.empty() ) {
        std::fread( &cVector[0], sizeof(uint32_t), lsize, fBinaryFile );
        std::closeFile();
    }

    return cVector;
}

You might have noticed that I removed the const from the function's return type. This is because it is mostly useless.

Community
  • 1
  • 1
sbi
  • 219,715
  • 46
  • 258
  • 445
0

First, variable length arrays are not part of the C++ standard, even if some compilers accept it as an extension.

There's a confusion in the number of elements and sizes that you process: lsize will be the length of the file in bytes, but you allocate an array of uint32_t of sizeof(uint32_t)*lsize elements. That's sizeof(uint32_t)*sizeof(uint32_t) bigger than the data available in the file ! Keep in mind that this size must be smaller than SIZE_MAX.

I suspect that there are some memory overflow problems. Therefore I'd recommed you to use standard constructs:

uint32_t *pDataBuffer = new uint32_t[sizeof( uint32_t )*lsize]; 

and take care of the possibility that there's not enough memory.

By the way, you try to read lsize uint32_t. So you read much more than currently available.

P.S: maybe you could consider calculating the size with ftell( fBinaryFile ) / sizeof(uint32_t) ? And also make sure that the size is not 0.

Christophe
  • 68,716
  • 7
  • 72
  • 138
  • Small typo: you wrote "sizoef" several times. – Caninonos Aug 04 '15 at 15:35
  • A dynamically allocated buffer referred to by naked pointer, shown to someone who is obviously a novice. Ugh. – sbi Aug 04 '15 at 16:14
  • @sbi well it's more portable than VLE, and given the rather C style of the rest of the code, it seemed perfectly appropriate. – Christophe Aug 04 '15 at 16:18
  • @Christophe: More portable it is indeed. But it's bad nonetheless. (And please explain your downvote. SO is not a pissing contest. Revenge downvotes are frowned upon, because they do not improve the site.) – sbi Aug 04 '15 at 16:20
  • @sbi "bad" usually expresses an opinion. I would have understood such a dogmatic downvote if the question would have been on best style in memory management, but in the context of this answer focusing on root cause of the problem, and explaining the confusion, I think it's unfair. – Christophe Aug 04 '15 at 16:54
  • @Christophe: Having been in this industry for decades, and having seen my share of avoidable crashes, I have done away with manual memory management ten years ago. I feel very strongly about it – and _especially_ about teaching it. I consider it fair to downvote any answer teaching this. – sbi Aug 04 '15 at 16:58
-1

Assuming fBinaryFile was obtained with FILE *fBinaryFile = fopen(..., "rb");, and that you ensured that fBinaryFile is not NULL, the following should work :

FILE *fBinaryFile = fopen(..., "rb");
if (fBinaryFile == 0) {
    throw ...;  // do not proceed if file could not be opened
}
fseek( fBinaryFile, 0, SEEK_END );
long lsize = ftell( fBinaryFile );
rewind( fBinaryFile );

If your file contains lsize/4 uint32_t values:

lsize /= sizeof( uint32_t );
uint32_t pDataBuffer[lsize];
//read file
fread( pDataBuffer, sizeof( uint32_t ), lsize, fBinaryFile );
std::vector<uint32_t> cVector( lsize );
for(int i=0; i<lsize; i++) cVector[i] = pDataBuffer[i]
close(fBinary);
return cVector;

If you file contains lsize byte values than you want to process as uint32_t :

unsigned char pDataBuffer[lsize];
//read file
fread( pDataBuffer, sizeof( unsigned char ), lsize, fBinaryFile );
std::vector<uint32_t> cVector( lsize );
for(int i=0; i<lsize; i++) cVector[i] = pDataBuffer[i]
close(fBinary);
return cVector;
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252