0

I am reading a file of floating point numbers and then sorting them. When I use my following sort and swap functions with 1 million numbers, I am successfully able to sort the numbers. However, when I try to sort 100 million numbers, I get a segmentation fault. I am not sure why because I am dynamically allocating memory. How would I be able to handle more than 1 million numbers?

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void swap(float *a, float *b, size_t n) {
    size_t numbytes; 
    size_t sz = sizeof(float); 
    void *temp = NULL; 
    
    numbytes = n * sz; 
    if (numbytes == 0){
        exit(EXIT_FAILURE); 
    }
    temp = malloc(numbytes);
    
    memcpy(temp, a, numbytes);
    memcpy(a,b,numbytes); 
    memcpy(b,temp,numbytes);
    
    free(temp); 
}

void radixSort(float array[], size_t count) {
    int numOfZero = 0; 
    float a[count];
    float *b = a;
    
    for (uint32_t radix=1; radix; radix<<=1) { //unsigned int 32 bit
        uint32_t *arrayToInt = (uint32_t *)array;
        int zeroCount=0;
        int oneCount=0;
        numOfZero=0;
        
        for (int j=0; j < count; ++j)
        numOfZero += !(arrayToInt[j]&radix);
        oneCount=numOfZero;
        
        for (int j=0; j < count; ++j)
        if (arrayToInt[j]&radix){
            b[oneCount]=array[j];
            ++oneCount;
        }
        else{
            b[zeroCount]=array[j];
            ++zeroCount;
        }
        swap(b,array,count);
    }
    if (numOfZero < count){
        memcpy(b+(count-numOfZero), array, numOfZero*sizeof(float));
        
        for (int d=0,j=count-1;j>=numOfZero;j--,d++)
        b[d]=array[j];
        memcpy(array, b, count*sizeof(float));
    }
}
int main(int argc, char *argv[]) {
    int fd; 
    float num; 
    size_t nr; 
    int eleNum = 0; 
    
    fd = open(argv[1], O_RDONLY); 
    
    if (fd == -1){
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }
    
    struct stat st; 
    fstat(fd, &st); 
    off_t size = st.st_size; 
    for (int j = 0; j < size/4; j++){
        eleNum++; 
    } 
    
    float array[eleNum];
    for (int i = 0; i < eleNum; i++){
        nr = read(fd, &num, sizeof(float));
        if (nr == -1){
            perror("Error reading file"); 
            exit(EXIT_FAILURE); 
        }
        array[i] = num; 
    }
    
    radixSort(array, eleNum);
    
    close(fd); 

    return 0; 
}
  • 2
    A good place to start with be to check the return value of `malloc`. – kaylum Dec 07 '20 at 07:20
  • Why the error check in `swap`? If it's zero bytes, just do nothing. You can safely pass a size of zero to `memcpy`. – klutt Dec 07 '20 at 07:27
  • 3
    Wait. A *hundred million* numbers? Do you understand how a VLA (variable-length array) typically *works* ? This code is a *recipe* for a stack breach. – WhozCraig Dec 07 '20 at 07:28
  • Not related, but `for (int j = 0; j < size/4; j++) eleNum++;` is a *very* roundabout way to write `eleNum = size/4;`. Plus, you never check for integer overflow if `size/4` is larger than `INT_MAX`. – dxiv Dec 07 '20 at 07:31
  • @RetiredNinja I created a file with 1.2 as data. I used `read` to read data into float variable. When I printed it to the console, it prints 0.0000, not the number. – Krishna Kanth Yenumula Dec 07 '20 at 07:54
  • Just out of curiosity, why is `size_t sz = sizeof(float); ` needed if your parameter list includes `void swap(float *a, float *b, ...` You already know you have `float` parameters and `sizeof(float)` is only needed in `numbytes = n * sz;`?? Not wrong, just curious? And wouldn't' `eleNum = size / 4;` avoid the loop? – David C. Rankin Dec 07 '20 at 07:54
  • @RetiredNinja I was able to print, only with conversion using `strtof()` – Krishna Kanth Yenumula Dec 07 '20 at 07:54
  • @KrishnaKanthYenumula You created a text file, so you need `strtof` to convert the values back to `float`. The OP, however, is reading a binary file which contains binary `float` values, so they just need to `read` those in as they are. – dxiv Dec 07 '20 at 08:00

1 Answers1

6

These lines:

float a[count]; // In radixSort

float array[eleNum]; // In main

will never work for such big numbers. VLA:s are (typically and always in practice) allocated on the stack. On Windows systems, the stack is typically 1MB and 8MB on Linux. I have written an answer about VLA:s that would be good for you to read. In short, I would advice not using them. Do I really need malloc?

I'm not sure if changing to malloc will solve your problem, but you will not solve it without doing it.

Furthermore, you should check the return value from malloc to see if the allocation worked. Then, if your problems persist, I'd recommend compiling with -Wall -Wextra -pedantic -std=c11 -fsanitize=address -g. Use gdb or another debugger to find which line causes a segfault and investigate values. Use valgrind to detect memory leaks.

And this:

for (int j = 0; j < size/4; j++){
    eleNum++; 
} 

is just weird. It's equivalent to eleNum = size/4.

This, in swap:

if (numbytes == 0){
    exit(EXIT_FAILURE); 
}

is completely unnecessary. It's safe to pass 0 for the size argument to memcpy. It would just lead to that nothing happens. I could understand this for debugging purposes, but in that case you should print something useful, or even better, use assert(numbytes > 0)

klutt
  • 30,332
  • 17
  • 55
  • 95