2

I have a 80370-long list of 30-digit numbers in a text file, each in a separate line.
I want to sort them in C.

My idea was creating a new file, then adding the first number to it.
Then, each time we compare it with another number. If it's smaller, the other number is appended. Easy. However, if it's larger, the new number must be prepended.
I tried prepending like so:

void prepend(char line[], FILE* w, FILE* waux, char filename[], char auxname[]) {
    fprintf(waux, "%s\n", line); //print new number to new file (waux)
    char ch;
    while((ch = fgetc(w)) != EOF) {
        fputc(ch, waux); //read old file (w) and add to new file (waux)
    }
    remove(filename); //delete old file
    rename(auxname, filename); //rename new file to old file's name
}

But then I try to read the output and it's filled with NULL chars (in Notepad++).
Among the million NULL chars we can find the numbers, but they're not sorted at all.

Antonio AN
  • 135
  • 1
  • 9
  • 5
    why do you want to do it by writing partial output to a file? Just read all data, sort and write all data at once. Actually you don't even need to roll your own sorting function, you can use `` `qsort` function: http://en.cppreference.com/w/cpp/algorithm/qsort – Jack Aug 31 '16 at 21:31
  • 3
    That's a bit more than two megabyte (if you use pointers, perhaps 3MB). It can't be too hard to allocate (using malloc()) an array that can contain so many numbers. – Rudy Velthuis Aug 31 '16 at 21:34
  • Alright, I realized that my idea will not work to sort my numbers, so please just consider the problem of the prepending not working. Though, if you have ideas for how to sort them, that would be great. (: – Antonio AN Aug 31 '16 at 21:35
  • 3
    As mentioned above: read everything into dynamically allocated array, then use qsort to sort it. – Serge Aug 31 '16 at 21:37
  • Read all the numbers into an array, sort the array with `qsort`, write the array to the output file. – user3386109 Aug 31 '16 at 21:37
  • 1
    80370 30-digit numbers is not too big for an array on a modern general-purpose computer, even if the numbers are represented as strings of digits. – John Bollinger Aug 31 '16 at 21:37
  • Thanks everyone! I didn't think that finding 2mb in the memory was such an easy task. It's great to hear – Antonio AN Aug 31 '16 at 21:38
  • File-based sorting used to be a thing, but its importance has greatly diminished with the increase in computers' memory sizes. It's still important in databases, though. Efficient implementations are non-trivial. – John Bollinger Aug 31 '16 at 21:39
  • 1
    even if you are using 32 bit version of windows, you have at least 2 GB of virtual address space. Even if you don't have enough physical RAM let the OS solve this problem for you using swap space. But in you case 2MB is not a bottleneck nowadays – Serge Aug 31 '16 at 21:41
  • The array can be a global variable, a static variable, or dynamically allocated with `malloc`. The only thing you *can't* do is declare the array as a local variable. – user3386109 Aug 31 '16 at 21:46
  • I'd strongly suggest: read the whole file in to an array in memory, then use an existing sorting algorithm like `qsort()` – user3629249 Aug 31 '16 at 21:50
  • So I want a 2D array of size `[80370][30]`. How do I do that in `malloc`? – Antonio AN Aug 31 '16 at 21:53
  • 1
    http://stackoverflow.com/questions/1970698/using-malloc-for-allocation-of-multi-dimensional-arrays-with-different-row-lengt – yano Aug 31 '16 at 21:57
  • struct tagMyArray { char line[30]; }; struct tagMyArray * myArray = malloc( sizeof( struct tagMyArray ) * 80370); If( !myArray ) { // handle error } – user3629249 Aug 31 '16 at 21:59
  • Or just `char (*myarr)[30] = malloc (80370 * sizeof *myarr);` – David C. Rankin Aug 31 '16 at 22:36
  • The basic mechanics of an external sort where the data _is_ too big to fit in memory are: (1) Read as much data into memory as will fit; (2) sort it; (3) write it out to disk; (4) Repeat steps 1-3 until there's no more data to read; (5) merge the sorted files into sorted output files (if necessary, you may have to make multiple merge passes). The tricks are then optimizing this processing to minimize the amount of data written to disk, and the number of passes over the data in general. – Jonathan Leffler Aug 31 '16 at 23:10
  • I had an assembly language assignment (in the mid 90s) that asked us to do this using no extra memory, only registers on a 486 and the file had to be sorted in place. Needless to say it takes a long time to bubble sort a file 2 lines at a time while swapping 2 characters at a time, especially when the file is on a floppy disk, but it can be done. Though ~half the class dropped after that assignment. – technosaurus Sep 01 '16 at 02:40

1 Answers1

3

While I agree with the comments that you should allocate memory for all your data using malloc, sort the data in memory, and then write the data to the file, I also think that your approach should also work.

I'm guessing that the reason for your approach not working is that you have open file handles to the files at the time you are using

remove(filename); //delete old file
rename(auxname, filename); //rename new file to old file's name

My suggestion:

  1. Open the files in the function.
  2. Do the read and append.
  3. Close the files.
  4. Then do the remove and rename.

void prepend(char line[], char filename[], char auxname[]) {

   FILE* w = fopen(filename, "r");
   if ( w == NULL )
   {
      fprintf(stderr, "Unable to open %s for reading from.\n", filename);
      return;
   }

   FILE* waux = fopen(auxname, "w");
   if ( waux == NULL )
   {
      fprintf(stderr, "Unable to open %s for writing to.\n", auxname);
      fclose(w);
      return;
   }

   fprintf(waux, "%s\n", line); //print new number to new file (waux)
   int ch;
   while((ch = fgetc(w)) != EOF) {
      fputc(ch, waux); //read old file (w) and add to new file (waux)
   }

   fclose(waux);
   fclose(w);

   remove(filename); //delete old file
   rename(auxname, filename); //rename new file to old file's name
}
R Sahu
  • 204,454
  • 14
  • 159
  • 270