0

So I have included my successfully working code below, that takes in a .txt file with approximately 1.6 million lines with one word per line. This reads in successfully, qsort sorts the words alphabetically and then outputs to the out file. The problem however is from what I understand to be the size limitations of the array. Up to approximately 1 million lines in the array and it works and finishes in less than 2-3 seconds, but more than ~1 million and it errors with a segmentation fault.

Doing lots of reading and I have seen malloc, but that doesn't input into an array? And the file input can't be input into qsort? Any pointers how to get around this? The 16 in line array, specifies that there are no words in the txt that exceed 15 chars (chars+1).

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static void sort_a(void *array, unsigned n);
static int cmpr(const void *a, const void *b);

int main(void)
{
    static const char filename[] = "input.txt";
    static const char filename_out[] = "output.txt";

    int i = 0;
    int j = 0;
    FILE *file = fopen(filename, "r");
    FILE *file_write = fopen(filename_out, "w");

    char line[16];
    char *line_array[1000001]; //works sub 1 Million

    if (file != NULL){
        while (fgets(line, sizeof line, file) != NULL) {
            // Trim the newline character
            line[strcspn(line, "\n")] = '\0';

            // Stop processing if line_array is full
            if (i < sizeof line_array / sizeof *line_array) {
                line_array[i++] = strdup(line);
            }
            else {
                break;
            }
        }
        fclose(file);

        sort_a(line_array, i);

        for (j = 0; j < i; j++) {
            fprintf(file_write, "%s\n", line_array[j]);
        }
        fclose(file_write);

        // Clean up memory
        for (j = 0; j < i; j++) {
            free(line_array[j]);
        }
    }
    else {
        perror(filename);
    }
    return 0;
}

int cmpr(const void *a, const void *b) {
    return (strcmp(*(char **)a, *(char **)b));
}

void sort_a(void *array, unsigned n) {
    qsort(array, n, sizeof(const char *), cmpr);
}
brentm
  • 1
  • 1
  • If you are getting the input one by one, a better sorting might be insertion sort on the fly (better yet - binary insertion sort). – Eugene Sh. Aug 31 '22 at 13:35
  • You can replace any static object (in a static data section) or dynamic object (on the stack) with an object allocated via `malloc()` (on the heap). In your case, however, other data structures might be useful. Please [edit] your question and show some attempt to use `malloc()`, and make sure to ask specific question(s) on it. – the busybee Aug 31 '22 at 13:48
  • You are not supposed to store large variables on the stack. That is normally rather limited. You are probably facing some stack overflow exception. You can make that array `static` moving it out of the local stack. Or use `malloc` as suggested in comment above. – Gerhardh Aug 31 '22 at 13:53
  • Chances are that you're on a 64-bit Unix (Linux?) machine. You probably have 8 MiB of stack space. Your array of 1 million 8-byte pointers uses almost all of the stack. When your code exceeds the stack limit, it crashes unceremoniously. Don't place such big arrays on the stack (as local variables). Allocate it dynamically with `malloc()` or make it into a global variable. (On Windows, the stack size is much smaller by default; you would have been running into problems when the data reached 1 MiB.) – Jonathan Leffler Aug 31 '22 at 13:53
  • If you use `char **line_array;` (maybe with a change of name) and allocate some initially sized block with `line_array = malloc(cur_max_lines * sizeof *line_array);`, you could increase the size with `realloc` when it fills up. The maximum number of lines can be increased geometrically for each `realloc` (for example, by doubling the maximum number of lines). – Ian Abbott Aug 31 '22 at 14:57

1 Answers1

0

In this case. The array size is larger than the system limit, a segmentation fault occurred. You can change the system stack limit:

  1. ulimit -s // Get the current system stack limit
  2. ulimit -s 102400 // Change the limit to 100MB