4

I am using the following code to read a file into a character array. For a small file (say 2 MB) it executes properly, but for a large file (140 MB) it gives a segmentation fault on my 18 GB Ubuntu server. Can anybody help me solve this? I think 18 GB is enough to hold a 140 MB file in memory. I am using 64-bit Ubuntu and compiling with g++.

ifstream is;
char chararray[fileSize];
is.read(chararray, fileSize);
Mike Pennington
user1838343

3 Answers

5

If the array is a local variable you will get a stack overflow, as it will not fit on the stack. Allocate the "array" on the heap instead, either directly using new or indirectly by using std::vector.

Or use memory mapping. See the mmap function.

Some programmer dude
  • `std::vector` is probably preferable, but since he said it was a Unix variant, `ulimit -s 500000`, invoked before he runs his program, should work. – James Kanze Nov 20 '12 at 10:25
  • `ulimit` run from the command line is an ugly way of non-portably solving a problem which can, in this case, be just as easily managed in the code itself. – Richard Nov 20 '12 at 10:50
  • 2
    @Richard As I said, using `std::vector` is probably preferable. But there's nothing wrong about knowing about `ulimit` as well. – James Kanze Nov 20 '12 at 11:10
  • @JoachimPileborg for reading a blob from file, `std::vector` is a bit overkill IMHO. `mmap()` (+1) you mentioned (or `malloc()`/`new`) seem to be more reasonable. Just remember to `munmap()` (`free()`/`delete`) after use! – peterph Nov 20 '12 at 11:11
  • @peterph Um...what exactly is overkill about `std::vector`? It provides contiguous memory, handles its own destruction in an exception-safe way. You're probably already using at least one vector in your program already somewhere so the only added "cost" is a handful of bytes for the instance. If you're loading in 18GB of data, are you really going to begrudge vector those [20 bytes (or less)](http://stackoverflow.com/questions/557997/what-is-the-overhead-cost-of-an-empty-vector)? (Not to mention that needs to be compared against the size of a pointer as the alternative.) – HostileFork says dont trust SE Nov 20 '12 at 11:27
  • @HostileFork depends on what you want from it. If you want to encapsulate the data a bit, ok. If you want to modify it by appending/removing from the end, ok. If you are planning to incorporate it into another piece of code which will make use of the STL container, excellent. If you just want to get the data and peek at them randomly, `mmap()` is the way to go. It's not only a question of memory overhead, but also of execution time. From my POV it's more elegant (if you just need to read the data). – peterph Nov 20 '12 at 11:39
  • 3
    @peterph Yes, memory mapping is a good thing...although if one is programming in C++ then using something wrapped in a class and which is [platform-independent](http://stackoverflow.com/questions/8215823/platform-independent-memory-mapped-file-io) is more desirable than POSIX mmap. It's just that if a question is tagged C++ then I think one needs to push back against methods that suggest invoking `new[]` *(and, of course, malloc...)* – HostileFork says dont trust SE Nov 20 '12 at 11:42
  • 1
    @HostileFork If you're programming in C++, you'll wrap `mmap` in a class, with `munmap` in the destructor. And probably give the class an interface that is as close to that of `std::vector` or `std::array` as possible, so that users feel at home with it. Because of the work involved, however, I would just read into an `std::vector` until I knew I needed the performance (but if I'd needed it once, I'd keep the class in my toolkit, and then use it immediately). – James Kanze Nov 20 '12 at 18:09
2

Instead of allocating the char array on the stack, I'd try using std::vector, which will allocate dynamically on the heap:

std::vector<char> buffer(fileSize);
is.read(&buffer[0], fileSize);
Mr.C64
1

The GNU binutils that ship alongside GCC include a utility called size for this! Compile the program with GCC; then you can get the size of the resulting binary!

gcc -Wall test.c
size

This is for a normal C program! Since no file argument is specified, size takes ./a.out as its default.

If you want to apply some optimization, the commands become as follows:

praveenvinny@ubuntu:~/Project/New$> gcc -Wall -O1 -fauto-inc-dec test.c -o output
praveenvinny@ubuntu:~/Project/New$> size output
text       data     bss     dec     hex filename
1067       256      8       1331    533 output

Use the text section for code size; add data and bss if you want to count global data as well.

The commands above print the code size, and

time -f "%e" -o Output.log ./a.out

will write the execution time to a log file called Output.log.

Praveen Vinny