I'm learning C and I have this implementation that lists the files and folders in sorted order, but the sort is not case-insensitive:

#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    struct dirent **namelist;
    int n;

    n = scandir(".", &namelist, NULL, alphasort);
    if (n < 0)
        perror("scandir");
    else {
        printf("Inside else, n = %d\n", n);
        while (n--) {
            printf("%s\n", namelist[n]->d_name);
            free(namelist[n]);
        }
        free(namelist);
    }
}

If I have a.txt, b.txt, C.txt and z.txt, it sorts them in this order: C.txt, a.txt, b.txt, z.txt. I want a case-insensitive sort, like this: a.txt, b.txt, C.txt, z.txt.

chqrlie
elvis
  • Then don't use `alphasort`. Write your own comparator function. – kaylum Dec 31 '21 at 11:18
  • elvis, "I want this to be sorted case insensitive" --> If the filename contains a `'_'` (an ASCII character between the upper- and lowercase letters), should that portion of the filename sort before letters or after letters? Wanting _case insensitive_ is OK to describe _equality_ (e.g. `a` should equate to `A`), but is insufficient to describe _order_ (is `_` before or after `A` and `a`?). – chux - Reinstate Monica Dec 31 '21 at 11:59
  • @chux-ReinstateMonica: this is indeed a real issue. I wonder if `strcasecmp` is specified as comparing the uppercase or lowercase conversions, or some other alternative giving it non-transitive behavior, which would be a problem for use as the `qsort` comparison function. – chqrlie Jan 01 '22 at 14:13
  • @chqrlie `strcasecmp()` is not in the standard library. As it is not in the standard library, a name beginning with `str` + lower case letter violates the reserved name space (7.31.12). Among *nix implementations in the wild, I have come across both a fold to lower and a fold to upper, which reveals this issue. Yes, I am certain the lack of a precise definition risks `qsort()` issues and portability. More [thoughts](https://stackoverflow.com/a/31128931/2410359) and a speedy [alternative](https://stackoverflow.com/a/51992138/2410359). – chux - Reinstate Monica Jan 01 '22 at 14:31
  • @chux-ReinstateMonica: `strcasecmp` is defined in POSIX and as you noted, the [manual page](https://pubs.opengroup.org/onlinepubs/009696799/functions/strcasecmp.html) specifies that *in the POSIX locale, `strcasecmp()` and `strncasecmp()` shall behave as if the strings had been converted to lowercase and then a byte comparison performed. The results are unspecified in other locales.* This seems compatible with `qsort` (as long as all pointers have the same representation and calling convention, which is another POSIX requirement). – chqrlie Jan 01 '22 at 14:40
  • @chqrlie note: `*nix` is not only POSIX. IAC, converting to lowercase is not sufficient for a caseless compare when upper/lower case letters, like `é`, lack a 1 to 1 mapping. `strcasecmp()` is not _bad_, just insufficient for high portability/functionality. – chux - Reinstate Monica Jan 01 '22 at 14:46
  • @chux-ReinstateMonica: Touché! I agree, but relying on C locales for anything but ASCII is bound to fail. The macros in `<ctype.h>` cannot handle UTF-8, which is now the de facto encoding standard. Anything else is of historical value only, along with non 2's complement representations and non 8-bit bytes. Many systems out there still use these curiosities... the world is imperfect. Let's work at improving it, one year at a time :) – chqrlie Jan 01 '22 at 14:55
  • Agree about the pros of UTF8, yet sadly C is not even close to properly embracing it. UTF8 is also why I prefer a non-locale str caseless compare function for `char *`. "with non 2's complement representations" is planned for obsolescence in C2x, so we are _almost_ there. "non 8-bit bytes", hmmm, forcing `CHAR_BIT==8` precludes C from some potential interesting future architectures. I suspect Darwin pressure will force `CHAR_BIT==8` though. – chux - Reinstate Monica Jan 01 '22 at 15:08
  • Neither am I, and I don't particularly like the direction taken in C2x, especially the huge number of new macros and functions to support decimal and larger floats. – chqrlie Jan 01 '22 at 15:14

1 Answer

scandir is defined with this prototype:

int scandir(const char *restrict dirp,
            struct dirent ***restrict namelist,
            int (*filter)(const struct dirent *),
            int (*compar)(const struct dirent **,
                          const struct dirent **));

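The function alphasort compares the names as if by strcoll(). In the default "C" locale this is plain byte order, so every uppercase ASCII letter sorts before every lowercase one, which is why C.txt comes first. If you want case-insensitive sorting, use a different comparison function: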

int alphasort_no_case(const struct dirent **a, const struct dirent **b) {
    return strcasecmp((*a)->d_name, (*b)->d_name);
}
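
To see the two orderings side by side without touching the filesystem, here is a minimal sketch that sorts plain strings with qsort. The helper names (cmp_bytes, cmp_no_case, sort_names_bytes, sort_names_no_case) are made up for this illustration, and it assumes a POSIX system where strcasecmp is available and the program runs in the default "C" locale:

```c
#include <stdlib.h>   /* qsort */
#include <string.h>   /* strcmp */
#include <strings.h>  /* strcasecmp (POSIX, not ISO C) */

/* qsort comparator: plain byte order, so 'C' (67) sorts before 'a' (97). */
static int cmp_bytes(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator: case-insensitive order via POSIX strcasecmp. */
static int cmp_no_case(const void *a, const void *b) {
    return strcasecmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort n names in place, byte order. */
static void sort_names_bytes(const char **names, size_t n) {
    qsort(names, n, sizeof *names, cmp_bytes);
}

/* Hypothetical helper: sort n names in place, ignoring case. */
static void sort_names_no_case(const char **names, size_t n) {
    qsort(names, n, sizeof *names, cmp_no_case);
}
```

Given the four names from the question, sort_names_bytes should produce C.txt, a.txt, b.txt, z.txt and sort_names_no_case should produce a.txt, b.txt, C.txt, z.txt.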

Both scandir and strcasecmp are POSIX functions, not ISO C. strcasecmp is declared in <strings.h> and is highly likely to be available on any system that supports scandir.
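
If strcasecmp is not available, or you want the fold direction pinned down rather than left to the implementation (the concern raised in the comments), a hand-written comparator is one option. This is only a sketch: str_cmp_lower and alphasort_fold_lower are hypothetical names, and it folds byte by byte with tolower, so it handles ASCII letters in the "C" locale and makes no attempt at UTF-8:

```c
#include <ctype.h>   /* tolower */
#include <dirent.h>  /* struct dirent */

/* Hypothetical helper: compare two strings after folding each byte to
   lowercase. Bytes are read as unsigned char, as tolower() requires. */
static int str_cmp_lower(const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    int c1, c2;

    do {
        c1 = tolower(*p1++);
        c2 = tolower(*p2++);
    } while (c1 == c2 && c1 != '\0');

    /* Yields -1, 0 or 1 without risking overflow from a subtraction. */
    return (c1 > c2) - (c1 < c2);
}

/* scandir comparator built on the helper above. */
int alphasort_fold_lower(const struct dirent **a, const struct dirent **b) {
    return str_cmp_lower((*a)->d_name, (*b)->d_name);
}
```

Because the fold is to lowercase, `_` (95) compares less than every letter (`a` is 97), so names starting with an underscore sort before names starting with a letter. Folding to uppercase instead would put `_` after the letters, so pick whichever order you actually want.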

Modified version:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* scandir comparator: order entries by name, ignoring case */
int alphasort_no_case(const struct dirent **a, const struct dirent **b) {
    return strcasecmp((*a)->d_name, (*b)->d_name);
}

int main(void) {
    struct dirent **namelist;
    int n;

    n = scandir(".", &namelist, NULL, alphasort_no_case);
    if (n < 0) {
        perror("scandir");
    } else {
        printf("Inside else, n = %d\n", n);
        while (n--) {
            printf("%s\n", namelist[n]->d_name);
            free(namelist[n]);
        }
        free(namelist);
    }
    return 0;
}
Jonathan Leffler
chqrlie