
I am writing a function that requires me to turn a file into an array of 0s and 1s. I figured this was most easily accomplished using a bool array. However, the code below crashes for files larger than ~1MB. My machine has 8GB of RAM, so I see no reason for the crash.

```cpp
string file_name;
cin >> file_name;
string text = read_file(file_name); // I have defined this function as returning a string containing the file's contents; it works fine when tested separately
int length = text.size();
bool bin_arr[8*length]; // to store 0s and 1s
```

The initialisation of bin_arr fails, and the program simply exits.

I will be handling files of 1GB or more. However, I have no idea why this is happening; I am fairly new to C++.

In case it is relevant, I am on Windows 10, using GCC version 6.3.0.

JaMiT
Sid
  • To start off, get into the habit of using `gcc -pedantic-errors`. This will show you that your code is invalid. Next, start using `std::vector` or other containers, not stack-allocated C arrays, for large and dynamic storage. – Konrad Rudolph May 21 '21 at 15:16
  • What error do you get? Do you get a stack overflow? Your stack is usually around 1 MB, so that checks out. – Aykhan Hagverdili May 21 '21 at 15:23
  • @AyxanHaqverdili I get no error, it simply exits. – Sid May 21 '21 at 15:30
  • Just to add, having 8GB of RAM does not mean the stack allocated to your program will be 8GB. The stack size of a program is limited by the OS (generally to a few MB). On Linux there are ways to increase the stack size the OS allocates, but it is best practice to use heap memory for large arrays. – ThivinAnandh May 21 '21 at 15:32
  • Does this answer your question? [Segmentation fault when trying to create a buffer of 100MB](https://stackoverflow.com/questions/23326988/segmentation-fault-when-trying-to-create-a-buffer-of-100mb) or [Creating large arrays in C](https://stackoverflow.com/questions/43015080/creating-large-arrays-in-c) (also applies to C++) – JaMiT May 21 '21 at 15:33
  • If you will be handling 1 GB files, it is probably better to read sequential chunks of file data rather than 1 whole file. – Dave N. May 21 '21 at 15:38
  • I wish, once and for all, that g++ would stop accepting VLAs by default. That would force new C++ programmers to research and use standard C++ instead of wasting their time believing that their code is valid. – PaulMcKenzie May 21 '21 at 16:51

2 Answers


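Variable-length arrays (VLAs) like `bool bin_arr[8*length];` are not part of standard C++ (GCC accepts them only as an extension), and because they are allocated on the stack, large ones risk a stack overflow. You should use `std::vector` instead.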
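A minimal sketch of that replacement, assuming the file contents are already in a `std::string` as returned by the question's `read_file`. The name `file_to_bits` and the most-significant-bit-first order are illustrative choices, not anything the question specifies:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: expand each byte of `text` into 8 bool values, most significant
// bit first. std::vector allocates its elements on the heap, so the size is
// bounded by available memory rather than by the (small) thread stack.
std::vector<bool> file_to_bits(const std::string& text) {
    std::vector<bool> bin_arr(8 * text.size());  // instead of bool bin_arr[8*length];
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Convert to unsigned char so the shift never operates on a negative char.
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        for (std::size_t bit = 0; bit < 8; ++bit) {
            bin_arr[8 * i + bit] = ((byte >> (7 - bit)) & 1u) != 0;
        }
    }
    return bin_arr;
}
```

With this helper, `file_to_bits(read_file(file_name))` would take the place of the last three lines of the question's snippet.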

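Note that `std::vector<bool>` is a special case: it is a specialization that typically packs each element into a single bit, so its elements are not real `bool` objects. If you want to avoid that, `std::vector<char>` or `std::deque<bool>` should work better.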
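A sketch of the `std::vector<char>` alternative, assuming the same MSB-first expansion as above; `file_to_bit_bytes` and `first_bit` are hypothetical helper names. It trades memory for ordinary element semantics:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: one char (holding 0 or 1) per bit. This takes about 8x the memory
// of a bit-packed std::vector<bool>, but every element is an ordinary object,
// so pointers, references and .data() work as for any other vector.
std::vector<char> file_to_bit_bytes(const std::string& text) {
    std::vector<char> bits;
    bits.reserve(8 * text.size());
    for (char c : text) {
        const unsigned char byte = static_cast<unsigned char>(c);
        for (int bit = 7; bit >= 0; --bit) {
            bits.push_back(static_cast<char>((byte >> bit) & 1));
        }
    }
    return bits;
}

// Fine for std::vector<char>. The same function written for std::vector<bool>
// would not compile, because its operator[] returns a proxy object, not a bool&.
char* first_bit(std::vector<char>& bits) {
    return bits.empty() ? nullptr : &bits[0];
}
```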

MikeCAT
  • This is one case where the `std::vector<bool>` specialization may actually be useful. – François Andrieux May 21 '21 at 15:17
  • @FrançoisAndrieux I have read in multiple places that vectors take too much memory and are inefficient. Is `std::vector` better than arrays in this special case? – Sid May 22 '21 at 06:19
  • @Sid `std::vector` is generally as efficient as it reasonably can be to do what it does. If you need a dynamically sized array that won't resize, you can save a couple of bytes, and if you know the size at compile time you can save a few more with `std::array`. But it is not usually considered inefficient. You should be using it whenever you need a dynamically sized array. – François Andrieux May 22 '21 at 12:52
  • @Sid `std::vector<bool>` is a specialization that the standard requires to behave differently from any other `std::vector`, and implementations are encouraged to (and in practice do) store `CHAR_BIT` (bits per byte) `bool` elements in each `char` (byte) of storage allocated for the elements. In other words, it packs `bool`s into one bit each. However, to achieve this, a lot of useful and important characteristics have to be discarded. For example, you can't get a pointer or reference to a specific `bool` element in a `std::vector<bool>`. In almost every case, this tradeoff is not worth it. But in your specific case, it may be useful. – François Andrieux May 22 '21 at 12:54

> I am writing a function that requires me to turn a file into an array of 0s and 1s. I figured this was most easily accomplished using a bool array.

I don't know why you figured that; the easiest way is to store the file as bytes, exactly as the file contains them. `std::vector<bool>` works the same way: it packs 8 bits into each byte instead of giving every bit its own byte as your code does, so it is about 8 times more memory efficient.

To get individual bits, either use the aforementioned `std::vector<bool>` or use regular bit fiddling. Remember that `b & (1 << bit_number)` is non-zero if that bit is set in your byte `b`.
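A small sketch of the bit-fiddling approach, reading bits straight from the stored bytes. `get_bit` is a hypothetical helper; it assumes `bit_number` counts from 0 for the least significant bit, and that the caller keeps `index < data.size()` and `bit_number < 8`:

```cpp
#include <cstddef>
#include <string>

// Returns bit `bit_number` (0 = least significant) of byte `index` in `data`,
// without expanding the file into a separate bit array.
bool get_bit(const std::string& data, std::size_t index, unsigned bit_number) {
    // unsigned char avoids surprises from a negative (signed) char value.
    const unsigned char b = static_cast<unsigned char>(data[index]);
    // b & (1 << bit_number) is non-zero exactly when that bit is set.
    return (b & (1u << bit_number)) != 0;
}
```

To visit a byte's bits from most to least significant, loop `bit_number` from 7 down to 0.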

> I will be handling files larger than 1GB or so.

Then don't store the entire file in memory; stream it in small chunks.
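A minimal sketch of chunked streaming, assuming each bit only needs to be visited once, in order. Counting set bits stands in for whatever per-bit work is actually needed; `count_set_bits` and the 64 KiB chunk size are illustrative choices, not from the question:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Sketch: counts the 1 bits in a file while keeping only one fixed-size chunk
// in memory. Replace the inner loop with the per-bit work you actually need.
// Returns false if the file cannot be opened.
bool count_set_bits(const std::string& file_name, unsigned long long& ones) {
    std::ifstream in(file_name, std::ios::binary);  // binary: no newline translation on Windows
    if (!in) {
        return false;
    }
    std::vector<char> buffer(64 * 1024);  // 64 KiB chunk on the heap (size is arbitrary)
    ones = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // the last chunk may be short
        for (std::streamsize i = 0; i < got; ++i) {
            const unsigned char b =
                static_cast<unsigned char>(buffer[static_cast<std::size_t>(i)]);
            for (int bit = 0; bit < 8; ++bit) {
                ones += (b >> bit) & 1u;
            }
        }
    }
    return true;
}
```

Memory use stays at one chunk regardless of file size, so a 1GB file needs no more RAM than a 1KB one.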

> However, I have no idea why this is happening.

Oh, that's easy. You are trying to allocate a needlessly gigantic array on the stack instead of on the heap. The stack has very strict size limits: by default about 1MB on Windows, I believe.
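To put a number on "gigantic", here is a small sketch of the arithmetic. The helper names are hypothetical, `sizeof(bool)` is implementation-defined (typically 1), and the stack limit is passed in because it depends on how the program was linked:

```cpp
#include <cstddef>

// Bytes that `bool bin_arr[8*length];` requests on the stack for a file of
// `file_bytes` bytes.
std::size_t vla_stack_bytes(std::size_t file_bytes) {
    return 8 * file_bytes * sizeof(bool);
}

// True if that request is larger than the given stack size in bytes.
bool exceeds_stack(std::size_t file_bytes, std::size_t stack_bytes) {
    return vla_stack_bytes(file_bytes) > stack_bytes;
}
```

With a 1-byte `bool`, a 1 MiB file asks for 8 MiB of stack, several times a 1 MB limit; a 1 GiB file would ask for 8 GiB, the machine's entire RAM.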

Blindy
  • I was under the impression that a boolean would use just 1 bit; is it not a 0/1 value? – Sid May 21 '21 at 15:32
  • @Sid The size of objects in C++ is measured in bytes, the smallest an object can be in C++ is 1 byte, which is at least 8 bits. So `bool` is usually (and at least) 8 bits wide. – François Andrieux May 21 '21 at 15:35
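The point about object sizes can be checked at compile time. This sketch relies only on two guarantees: every complete object type has `sizeof` of at least 1, and `CHAR_BIT` is at least 8. `bits_per_bool` is an illustrative name:

```cpp
#include <climits>
#include <cstddef>

// Storage bits a standalone bool occupies: at least one whole byte.
constexpr std::size_t bits_per_bool = sizeof(bool) * CHAR_BIT;
static_assert(bits_per_bool >= 8, "a bool occupies at least one byte");
```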