Segmentation fault when my array is too big

Question

Edit: Seems like the error is simply 9,999,999,999,999 being too big a number for an array.

I get this error "program received signal sigsegv segmentation fault" on my code.

Basically my code is to do an integer factorisation, and the exercise can be seen on codeabbey here. Basically I will receive input like 1000 and output them as a product of their factors, 2*2*2*5*5*5 in this case.

I do the above by having a vector of prime numbers which I generate using the Sieve of Eratosthenes method.

According to the website, the number of digits of the input will not exceed 13, hence my highest number is 9,999,999,999,999. Below is my code.

#include <iostream>
#include <vector>
#include <cstring>

unsigned long long int MAX_LIMIT = 9999999999999;


std::vector<unsigned long long int> intFactorisation (unsigned long long int num) {
    std::vector<unsigned long long int> answers;
    static std::vector<unsigned long long int> primes;
    if (primes.empty()) {               // generate prime numbers using sieve method
        bool *arr = new bool[MAX_LIMIT];
        memset (arr, true, MAX_LIMIT);
        for (unsigned long long int x = 2; x*x < MAX_LIMIT; x++) {
            if (arr[x] == true) {
                for (unsigned long long int y = x*x; y < MAX_LIMIT; y += x) {

                    arr[y] = false;  // THIS LINE ALWAYS HAS AN ERROR!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                }
            }
        }
        for (unsigned long long int x = 2; x <= MAX_LIMIT; x++) {
            if (arr[x]) {
                primes.push_back(x);
            }
        }
    }
    std::vector<unsigned long long int>::iterator it = primes.begin();  // start the factorisation
    while(it != primes.end()) {
        if (num % *it == 0) {
            answers.push_back(*it);
            num/=*it;
        }
        else {
            it++;
        }
        if (num == 1) {
            break;
        }
    }

    return answers;
}


int main() {


    int maxi;
    std::cin >> maxi;
    int out[maxi];
    for (int x = 0; x < maxi; x++) {
        std::cin >> out[x];
    }

    for (auto it : out) {
        std::vector<unsigned long long int> temp = intFactorisation(it);
        for (std::vector<unsigned long long int>::iterator it = temp.begin();
            it != temp.end(); it++) {
            if (it == temp.end() - 1) {
                std::cout << *it << " ";
            }
            else {
                std::cout << *it << "*";
            }
        }
    }

}

However, for some reason the program will always terminate at arr[y] = false in the function intFactorisation. I will see a notification giving the segmentation fault message on the bottom left of my screen while using CodeBlocks.

I already used 'new' on my ridiculously large array, so memory should be on the heap. I have tried using a smaller MAX_LIMIT such as 100,000 and my function works. Anyone knows why?

Also, I am wondering why I don't need to dereference my pointer arr. For example, arr[y] = false works, but *arr[y] or (*arr)[y] doesn't. Hopefully this can be clarified too.

Thanks for reading this and I appreciate any help.

"_Also, I am wondering why I don't need to dereference my pointer arr._" You are already doing it. `arr[y]` is equivalent to `*(arr + y)`. — Algirdas Preidžius, Feb 24 '20 at 15:54
consider the case when `y == MAX_LIMIT`, what happens in your code? — Matt Messersmith, Feb 24 '20 at 15:54
`int out[maxi];` -- Use used `std::vector` in other places. Why didn't you use it here also? — PaulMcKenzie, Feb 24 '20 at 15:54
Unless `sizeof(bool)` is one, `memset (arr, true, MAX_LIMIT);` does not do what you think it does, but will lead to undefined behaviour. — molbdnilo, Feb 24 '20 at 15:55
Also `bool *arr = new bool[MAX_LIMIT];` -- This is a gigantic memory leak in your code. Again, you should use `std::vector` — PaulMcKenzie, Feb 24 '20 at 15:58
If `sizeof(bool)` *is* one, you're trying to allocate ten terabytes for that array. If it's four (which is likely) that's forty terabytes. Do you really have that much to spare? — molbdnilo, Feb 24 '20 at 16:00
@MattMessersmith Right, I decided to change it to y + x < MAX_LIMIT in my code, but it still doesn't work. Also, if that was the issue why does using a smaller MAX_LIMIT work? — hkcode, Feb 24 '20 at 16:02
Please note that you only need the primes up to the *square root* of the maximum value in input. — Bob__, Feb 24 '20 at 16:03
@user6088862 "_if that was the issue why does using a smaller MAX_LIMIT work?_" Undefined behavior is undefined. One possible manifestation of it is: appearing to work. — Algirdas Preidžius, Feb 24 '20 at 16:03
@PaulMcKenzie Hi, I was thinking of using vector too for bool, but the Sieve method I saw requires me to set everything to true, then set specific numbers to false, which I think array is better than vector at? How do I change a particular position of a vector from true to false? — hkcode, Feb 24 '20 at 16:07
*but the Sieve method I saw requires me to set everything to true* -- `std::vector arr(MAX_LIMIT, true);` — PaulMcKenzie, Feb 24 '20 at 16:09
@molbdnilo Oh, which means that an array of size 9,999,999,999,999 is simply not possible? — hkcode, Feb 24 '20 at 16:09
@PaulMcKenzie `std::vector arr(MAX_LIMIT, true);` But I still need to change specific positions from true to false, doesn't that mean I need to insert and then delete? Is that really better than just an array — hkcode, Feb 24 '20 at 16:15
@user6088862 -- [Please look at the documentation](https://en.cppreference.com/w/cpp/container/vector_bool). Access to a particular bit is done by simply using `[ ]`, no different than an array. — PaulMcKenzie, Feb 24 '20 at 16:16
@PaulMcKenzie Alright, I didn't realise that, thank you for the help — hkcode, Feb 24 '20 at 16:19
@user6088862 `std::vector` uses a dynamic array underneath so it's equivalent for most operations. It's just got a couple of integers more to track sizes and it offers you a ton of safety and extra features. — François Andrieux, Feb 24 '20 at 16:21
Does this answer your question? [Create a big array in C++](https://stackoverflow.com/questions/3137598/create-a-big-array-in-c) — phuclv, Feb 24 '20 at 16:30
Why you need to allocate such a big array in first place just to print list of factors? Solution could be just find a factor and insert in to a `vector` or print directly instead of storing? — TruthSeeker, Feb 24 '20 at 16:37
@phuclv not really because for my case it is simply not possible for most computers out there as the array is really too big, but turns out I didn't need such a big number, just the sqrt of it. — hkcode, Feb 24 '20 at 16:48
@TruthSeeker Initially I wanted to generate all the prime numbers first because they are the factors. But what you said makes sense. — hkcode, Feb 24 '20 at 16:49
@user6088862 That huge array is not possible, but you don't need all of it since you only need to search up to the square root. That makes just over three million elements, which is no problem on a normal machine. — molbdnilo, Feb 25 '20 at 06:39

score 1 · Answer 1 · answered Feb 27 '20 at 16:20

There are two types of issues in the posted code, the memory managment and the algorithms.

Resource acquisition

The program exhibits three kinds of memory allocation:

Variable length array. In main, out is declared as int out[maxi], maxi beeing a variable, not a compile time constant. This is a C99 feature, it's never been part of any C++ standard, even if it is offered as a feature by some compiler.
bool *arr = new bool[MAX_LIMIT];. Modern guidelines suggest to avoid the use of bare new to allocate memory and to prefer smart pointers, standard containers and in general to follow the RAII idiom, but that's not even the major problem here. MAX_LIMIT is way too big to not lead to a std::bad_alloc exception and also delete is never called.

In the same function there's also a loop which will end up accessing the (unlikely) allocated memory out of bounds: for (unsigned long long int x = 2; x <= MAX_LIMIT; x++) { if (arr[x]) {, x will eventually become MAX_LIMIT, but you'll run out of memory way before.
std::vector. That would be the way, if only the program wouldn't try to fill the vector primes with all the primes up to MAX_LIMIT, which is 10¹³-1 or almost 80TB assuming a 64-bit type.

Algorithm

The idea is to first calculate all the possible primes and then, for each inputted number, to check if any of them is a factor. The problem is that the maximum possible number is a really big one, but the good news is that you don't need to calculate and store all the primes up to that number, but just to the square root of that.

Imagine to have tried all the primes up to that square root (let's call it S) and consequently divided the original number by any factor found. If there's still a remainder, that value isn't divisible by any of the mentioned primes and is less or equal to the original number. It has to be a prime number itself. All the possible factors <= S have been already ruled out, and so any hypothetical factor > S (what would be the result of the division by it? another already tested prime less than S).

From a design point of view, I'd also address how the primes are calculated and stored in OP's code. The factorization function is basically written as the follows.

unsigned long long int MAX_LIMIT = 9999999999999;

std::vector<unsigned long long int> intFactorisation (unsigned long long int num)
{
    static std::vector<unsigned long long int> primes;
//  ^^^^^^                                           ^
    if (primes.empty())
    {
        // generate prime numbers up to MAX_LIMIT using sieve method
    }
    // Use the primes...
}

The static vector could have been initialized using a separete function and also declared const, but given the close relation with the factorization function it could be better to wrap those data and functionalities into a class responsible for the correct allocation and initialization of the resources.

Since the introduction of lambdas in the language, we can avoid most of the boilerplate associated with a normal functor class, so that the factorization function might be constructed as a stateful lambda returned by the following:

auto make_factorizer(uint64_t max_value)
{
    uint64_t sqrt_of_max = std::ceil(std::sqrt(max_value));
    // Assuming to have a function returning a vector of primes.
    // The returned lambda will accept the number to be factorized and
    // a reference to a vector where the factors will be stored
    return [primes = all_primes_up_to(sqrt_of_max)](uint64_t number,
                                                    std::vector<uint64_t>& factors)
    {
        uint64_t n{number};
        factors.clear();
        // Test against all the known primes, in ascending order
        for (auto prime : primes)
        {
            // You can stop earlier, if prime >= sqrt(n)
            if (n / prime <= prime)
                break;
            // Keep "consuming" the number with the same prime,
            // a factor can have a multiplicity greater than one.
            while (n % prime == 0)
            {
                n /= prime;
                factors.push_back(prime);
            }
        }
        // If n == 1, it has already found and stored all the factors. 
        // If it has run out of primes or breaked out from the loop,
        // either what remains is a prime or number was <= 1. 
        if (n != 1  ||  n == number)
        {
            factors.push_back(n);
        }
    };
}

Here a live implementation is tested.

score 0 · Answer 2 · answered Feb 24 '20 at 19:45

Simpler solution could be finding all prime factors on-the-fly rather than initially listing all prime numbers and checking whether it is a factor of a given number.

Here is the function to find the factor without allocating large memory,

std::vector<unsigned long long int> intFactorisation(unsigned long long int num) {
    std::vector<unsigned long long int> answers{};
    unsigned long long int Current_Prime = 2;
    bool  found;
    while (num!=1 && num!=0)
    {
        //push all the factor repeatedly until it is not divisible
        // for input 12 push 2,2 and exit
        while (num % Current_Prime == 0) {
            answers.push_back(Current_Prime);
            num /= Current_Prime;
        }
        //find Next Prime factor 
        while (Current_Prime <= num) {
            Current_Prime++;
            found = true;
            for (unsigned long long int x = 2; x*x < Current_Prime; x++)
            {
                if (Current_Prime%x == 0) {
                    found = false;
                    break;
                }
            }

            if (found == true)
                break;
        }
    }

    return answers;
}

Segmentation fault when my array is too big

2 Answers2