1

Disclaimer: I know bubble sorting this much data is a waste of time, but I need to do it in some sort of "investigation".

I have a .csv file with 100.000 ints. The program crashes during bubble sorting, and while debugging it throws a "Segmentation fault". This crash ONLY happens when you enter a num higher than approximately 35.000, below that it works perfectly.

This is the recursive function:

void swap(int *arr, int i, int j) {
    int temp_num = arr[i];
    arr[i] = arr[j];
    arr[j] = temp_num;
}

void recur_BubbleSort(int *arr, int len) {
    if (len == 1) {
        return;
    }
    for (int i = 0; i < len - 1; i++) {
        if (arr[i] > arr[i + 1]) {
            swap(arr, i, i + 1);
        }
    }
    recur_BubbleSort(arr, len - 1);
}

And this is the main:

int main() {
    srand(time(NULL));
    FILE *fpt;
    fpt = fopen("numbers_data.csv", "r");
    char data[MAX];
    int line = 0;
    int num = 40000;
    int *data_int;
    data_int = (int *)malloc(sizeof(int) * (num));

    // Get values from .csv and add them to the array
    while (!feof(fpt) && (line < num)) {
        if (fgets(data, MAX, fpt) != NULL) {
            data_int[line] = atoi(data);
            line++;
            if (num <= 100)
                printf("[%d] \n", atoi(data));
        }
    }
    fclose(fpt);
    
    recur_BubbleSort(data_int, num);

    printf("Closing...\n");
    system("pause");
    return 0;
}

I don't know if it's necessary, but this is the function used to create the .csv

void GenerateData(void) {
    int temp_num;
    FILE *fpt;
    fpt = fopen("numbers_data.csv", "w+");
    for (int i = 0; i < 100000; i++) {
        temp_num = rand();
        fprintf(fpt, "%d\n", temp_num);
    }
    fclose(fpt);
    printf("Created.\n");
    system("pause");
}

I'm about to throw in the towel, so any help would really be appreciated, thank you.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Cris
  • 11
  • 2
  • 2
    The stack has a much smaller size than the heap. 35,000+ levels of recursion could be exceeding that size. – Dave S Oct 24 '22 at 18:34
  • 2
    There is a practical limit to recursion depth. Your recursive Bubble sort is very likely surpassing it. Segmentation faults and similar memory errors are often the result. You might find that turning on compiler optimizations rescues your program (which it could do by tail-call optimization). Otherwise, and IMO better, would be to convert your recursive bubble sort to a conventional iterative one. – John Bollinger Oct 24 '22 at 18:35
  • 3
    See [`while (!feof(file)))` is always wrong](https://stackoverflow.com/q/5431941/15168) and don't use it. It doesn't cause the crash, but it doesn't help when you get extraneous data into your program. – Jonathan Leffler Oct 24 '22 at 18:36
  • You'll also run into stack size limits allocating massive arrays and structs as local (stack) variables instead of using malloc/free or new/delete. Modern compilers will warn you about that one. – Dave S Oct 24 '22 at 18:38
  • What is the highest number your environment represents in a `int`? – Yunnosch Oct 24 '22 at 18:39
  • The code does allocate the main data array, @DaveS. – Jonathan Leffler Oct 24 '22 at 18:40
  • @Yunnosch It's a random number generated by rand(), So any number between 0 and 32766, and now that I think about it, that's exactly around the number of recursions. – Cris Oct 24 '22 at 19:17
  • @Yunnosch That's just the range of the random generated number by rand(), which I supposed is based on short ints, the entirety of the program is based on normal ints, which should have capacity from 0 to 2 billions, am I wrong? – Cris Oct 24 '22 at 19:25
  • 1
    @hyde I see my mistake now. What made me think that was that OP anwered my question "What is the highest number your environment represents in a int?" with values and I missed the reference to random. – Yunnosch Oct 24 '22 at 19:28
  • @Yunnosch I misunderstood your question, the int should have a capability of 2 billions on my system, even so, the program still stores up the variable num, you can even throw printf on the recursive function as well, but it will still crash anywhere between 30.000-35.000 – Cris Oct 24 '22 at 19:28
  • Anyway, the code seems a bit funny. Like `int num = 40000;` vs `if (num <= 100)`, what's the purpose? – hyde Oct 24 '22 at 19:28
  • Also, if you have more than `num` lines, there'll be buffer overflow. – hyde Oct 24 '22 at 19:29
  • I work a lot in environment where keeping track of potentially 16bit ints is important. That's why I asked and it is why I misread the answer. Sorry if I seemed cheeky. – Yunnosch Oct 24 '22 at 19:30
  • @hyde I extracted just this part from the code for testing which is the one causing problems, did my best to delete everything else that was irrelevant, still; seems like I accidentally forgot to delete some lines. – Cris Oct 24 '22 at 19:30
  • You should create a [mcve], so you can be sure you show the right code. – hyde Oct 24 '22 at 19:31
  • @hyde You mean if the file has more lines than `num`? – Cris Oct 24 '22 at 19:34
  • I tried with bubble sorting (the same function )an array made by numbers from 0 to 40.000 and it still gets Segmentation fault around 32k – Cris Oct 24 '22 at 19:46
  • 2
    Recursion is not an appropriate technique for bubble sort. However, if you run on a machine with a big enough stack and code it correctly, it works without crashing. Since you're on Windows (judging from the use of `system("pause");`), your stack is limited to 1 MiB. Testing on my Mac, the code uses 16 bytes per level of recursion (so about 640,000 total bytes), but the code does define a local variable to give a stack address to the stack tracking code. That increases the amount of stack used. – Jonathan Leffler Oct 24 '22 at 21:11

1 Answers1

0

I solved it. I didn't know there's a predetermined stack size, which has a cap at 8 MB for GCC (the compiler which I was using). It was fixed by increasing the size of it through the terminal.

Cris
  • 11
  • 2
  • 1
    If this circumvention was necessary and successful, you must have a lot of other data on the stack, I think. Use dynamic memory allocation (`malloc()` etc) to avoid stack overflows — not forgetting to free the allocated memory when appropriate. – Jonathan Leffler Oct 24 '22 at 21:13
  • @JonathanLeffler The code above uses malloc for the array, but the problem resided within the number of recursions made by `recur_Bubblesort`, rather than the array itself. – Cris Oct 24 '22 at 21:32
  • I agree that the stack depth from too many levels of recursion was the problem. But for the code shown, I was able to recurse 40,000 times and use 640,000 bytes of stack — which fits even on Windows, and certainly well within the 8 MiB limit you mentioned (which is the normal maximum stack size on Unix-like boxes — Linux and macOS, etc). So, I think you must have other stuff stashed on the stack (big arrays in `main()`?), or more variables than you're letting on about, that cause more space to be used. Obviously, I can't be certain because I've not seen your whole program (and don't want to). – Jonathan Leffler Oct 24 '22 at 22:00