1

I have a .txt file and I want to sort it using C programming. I have this code for sorting a .txt file:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_LEN 100 // Length of each line in input file.

int main(void)
{
    char *strFileName = "/home/milad/Desktop/ddd.txt";
    char *strFileSummary = "/home/milad/Desktop/ddd2.txt";
    char strTempData[MAX_LEN];
    char **strData = NULL; // String List
    int i, j;
    int noOfLines = 0;

    FILE * ptrFileLog = NULL;
    FILE * ptrSummary = NULL;

    if ( (ptrFileLog = fopen(strFileName, "r")) == NULL ) {
        fprintf(stderr,"Error: Could not open %s\n",strFileName);
        return 1;
    }
    if ( (ptrSummary = fopen(strFileSummary, "a")) == NULL ) {
        fprintf(stderr,"Error: Could not open %s\n",strFileSummary);
        return 1;
    }

    // Read and store in a string list.
    while(fgets(strTempData, MAX_LEN, ptrFileLog) != NULL) {
        // Remove the trailing newline character
        if(strchr(strTempData,'\n'))
            strTempData[strlen(strTempData)-1] = '\0';
        strData = (char**)realloc(strData, sizeof(char**)*(noOfLines+1));
        strData[noOfLines] = (char*)calloc(MAX_LEN,sizeof(char));
        strcpy(strData[noOfLines], strTempData);
        noOfLines++;
    }
    // Sort the array.
    for(i= 0; i < (noOfLines - 1); ++i) {
        for(j = 0; j < ( noOfLines - i - 1); ++j) {
            if(strcmp(strData[j], strData[j+1]) > 0) {
                strcpy(strTempData, strData[j]);
                strcpy(strData[j], strData[j+1]);
                strcpy(strData[j+1], strTempData);
            }
        }
    }
    // Write it to outfile. file.
    for(i = 0; i < noOfLines; i++)
        fprintf(ptrSummary,"%s\n",strData[i]);
    // free each string
    for(i = 0; i < noOfLines; i++)
        free(strData[i]);
    // free string list.
    free(strData);
    fclose(ptrFileLog);
    fclose(ptrSummary);
    return 0;
} 

This code is case sensitive and it sorts the upper-case letters first, then the lower case letters and That is not what I want. I want it to sort letters alphabetically and without any sensitivity to the case of the letter. I know about ASCII codes and why I have this problem, but I can't find a way to fix it.

How can I change the code to make it case insensitive?

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
MMM
  • 43
  • 6
  • 2
    MSVC has the case-insensitive `strncmp` not sure if it is standard. Aside: it can be more efficient to sort *pointers* not the data they point to. In which case you don't have to worry about string lengths. – Weather Vane Sep 27 '17 at 17:06
  • 2
    Use `strcasecmp` instead of `strcmp`. – BLUEPIXY Sep 27 '17 at 17:07
  • 4
    [The `strcasecmp()` function is POSIX-standard.](http://pubs.opengroup.org/onlinepubs/9699919799/functions/strcasecmp.html) – Andrew Henle Sep 27 '17 at 17:07
  • 5
    Have you considered using `qsort` instead of rolling your own? – autistic Sep 27 '17 at 17:11
  • [Case Insensitive String comp in C](https://stackoverflow.com/q/5820810/995714), and [don't cast the result of `malloc` in C](https://stackoverflow.com/q/605845/995714) – phuclv Sep 27 '17 at 17:18
  • An important consideration: Must the sort be _stable_? With `"abc", "ABC"`, is it OK that the resultant sorted data is of a different order? @Sebivor `qsort()` does not provide that. OP's approach could. – chux - Reinstate Monica Sep 27 '17 at 18:58
  • @chux That which you're unaware of is not necessarily impossible. I suggest that you look up mechanisms to make `qsort` stable. – autistic Sep 28 '17 at 05:47
  • @Sebivor `qsort()` itself is not specified to be stable sort of a data set D. C11 §J.1 1 identifies the unspecified behavior of `qsort()` "The order of two elements that compare as equal in an array sorted by the qsort function". The various mechanisms for stable sort employ _auxiliary_ unique members, structures or arrays A so that a `qsort(A)` results in a stable sort of `D` (the goal). – chux - Reinstate Monica Sep 28 '17 at 13:41
  • @chux Can you find me an *unstable* sort, in a commonly used standard library? That ought to be a pretty difficult challenge... If you can't do that, then can we agree that a stable `qsort` is like a twos complement integer representation? It may not be a guarantee, but it's **EXTREMELY COMMON** and as a result, most likely... – autistic Sep 28 '17 at 18:14
  • @chux In fact, if your standard library doesn't give you a stable sort and you need one, it's time to get a new standard library. Again, there's that whole "choice" thing we discussed earlier... – autistic Sep 28 '17 at 18:16
  • @Sebivor I choose not rely on `qsort()` to providing a stable sort as C does not specify that functionally. For portably, if one needed a stable sort, I would employ another function. Lots of sources available for this task -- or a conditional call to the `qsort()` if that implementation provided stability. A wise programmer said "I would rarely settle with anything less than the most portable, standard-compliant solution possible." – chux - Reinstate Monica Sep 28 '17 at 18:41
  • @chux What kind of sort would be both **stable** and **suitable for use with large files**? – autistic Sep 29 '17 at 06:05
  • @WeatherVane I think you meant `stricmp`; as I'm sure will be made aware to you in a *d'oh* moment, `strncmp` is something entirely different... ;) – autistic Sep 29 '17 at 06:20

1 Answers1

5

If your system has POSIX strcasecmp function available, replace strcmp with strcasecmp for a drop-in replacement.

Otherwise, implement case-insensitive string comparison in your own code, and replace the call of strcmp with it. You can implement the required functionality by comparing strings character-by-character after converting characters on both sides to the same case (upper or lower).

Note 1: Your algorithm has an inefficiency: you allocate all your strings at max length to avoid undefined behavior when the strings are of unequal length. You can allocate strings at exact length and avoid undefined behavior, because you do not need copying the content at all: swapping string pointers will do the job, and it would be a lot faster.

if(strcasecmp(strData[j], strData[j+1]) > 0) {
    char *tmp = strData[j];
    strData[j] = strData[j+1];
    strData[j+1] = tmp;
}

Note 2: Bubble sort algorithm is pretty slow, unless the array is nearly sorted to begin with. For larger files, you can improve the speed considerably by using qsort.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • 1
    Nice. Another significant performance enhancement might be reducing the number of calls to `calloc` by combining multiple allocations into one, though that's likely out of the scope for this question. – autistic Sep 27 '17 at 17:23
  • 2
    `qsort()` is a good approach, yet may loose [sort stability](https://stackoverflow.com/questions/584683/stabilizing-the-standard-library-qsort). OP's code appears stable, if slow. Unknown if that is important to to OP. – chux - Reinstate Monica Sep 27 '17 at 19:06
  • I suppose `qsort` may [*lose*](https://www.google.com/search?q=define%3A+lose&oq=define%3A+lose) sort stability if you choose a standard library which doesn't have a stable sort. That's out of the scope for this question, though choice of standard library is nonetheless *a choice*, just like choice of stable sort function, and just like the *reinvent the wheel unnecessarily* choice. -drops mic- – autistic Sep 28 '17 at 05:52
  • Suggesting `qsort()` is a good idea, yet does impact OP's sort functionality. The stability of `qsort()` is unspecified behavior. To choose to rely on `qsort()` as stable, with a C library that provided it, reduces code portability. Portable code would assume `qsort()` is unstable, yet if that functionality is needed, use additional code/data to insure stability. – chux - Reinstate Monica Sep 28 '17 at 13:53
  • It's a good thing we can choose a different standard library, then, huh?! – autistic Sep 28 '17 at 18:18