You have a list of words and you want to output only the unique ones, sorted, ignoring case. That breaks down into three steps.
- Get all the strings to the same case.
- Sort the list of strings.
- Don't output repeats.
C has no built-in function to lowercase a whole string, but it does have one for single characters: tolower, declared in <ctype.h>. So we write a function to lowercase a whole string by iterating through it and lowercasing each character.
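#include <ctype.h>

void str_lower(char *str) {
    /* Stop at the terminating '\0' (a character), not NULL (a pointer). */
    for( ; str[0] != '\0'; str++ ) {
        /* tolower() wants a value in unsigned char range, so cast first. */
        str[0] = (char)tolower((unsigned char)str[0]);
    }
}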
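To cover the first step for the whole list, you apply that to every string. Here's a minimal sketch, assuming the strings live in a writable array of char pointers; lower_all is a name I made up for illustration, not something from the standard library.

```c
#include <ctype.h>

/* Lowercase one string in place. */
void str_lower(char *str) {
    for( ; str[0] != '\0'; str++ ) {
        /* Cast first: tolower() expects a value in unsigned char range. */
        str[0] = (char)tolower((unsigned char)str[0]);
    }
}

/* Hypothetical helper: lowercase every string in the array in place. */
void lower_all(char **strings, int num_strings) {
    for( int i = 0; i < num_strings; i++ ) {
        str_lower(strings[i]);
    }
}
```

This modifies the strings in place, so they must be writable. The strings in argv are; string literals are not.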
Then we need to sort. That's handled by the built-in qsort function. To use it, you need to write a function that compares two strings and returns just like strcmp. In fact, your comparison function will just be a wrapper around strcmp to make qsort happy.
int compare_strings( const void *_a, const void *_b ) {
    /* The arguments come in as void pointers to the strings
       and must be cast. Best to do it early. Keeping const on
       both levels avoids casting away the qualifier. */
    const char * const *a = (const char * const *)_a;
    const char * const *b = (const char * const *)_b;
    /* Then because they're pointers to strings, they must
       be dereferenced before being used as strings. */
    return strcmp(*a, *b);
}
In order to handle any data type, the comparison function takes void pointers. They need to be cast back to the real pointer type. And qsort doesn't pass the elements themselves, it passes pointers to them, again so it can handle any data type. Here the elements are strings (char *), so the function receives a pointer to the string (char **). That means a and b need to be dereferenced before they can be used as strings. That's why it's strcmp(*a, *b).
Calling qsort means telling it the array you want to sort, the number of items, how big each element is, and the comparison function.
qsort( strings, (size_t)num_strings, sizeof(char*), compare_strings );
Get used to this sort of thing; you'll be using it a lot. It's how you work with generic lists in C.
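To see the comparison function and the qsort call working together, here's a small sketch. sort_strings is a hypothetical wrapper I'm adding only so the call has somewhere to live; the rest mirrors the pieces above.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands us pointers to the array elements. The elements are
   char pointers, so we receive pointers to char pointers. */
int compare_strings( const void *_a, const void *_b ) {
    const char * const *a = _a;
    const char * const *b = _b;
    return strcmp(*a, *b);
}

/* Hypothetical wrapper: sort an array of strings in place. */
void sort_strings(char **strings, int num_strings) {
    qsort( strings, (size_t)num_strings, sizeof(char*), compare_strings );
}
```

Only the pointers in the array get rearranged; the characters of each string stay where they are. The order is whatever strcmp gives you, which is why the strings have to be lowercased first.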
The final piece is to output only unique strings. Since you have them sorted, duplicates sit next to each other, so you can simply check if the previous string is the same as the current string. The previous string is strings[i-1], BUT be sure not to try to check strings[-1]. There are two ways to handle that. The first is to skip the comparison when i < 1.
for( int i = 0; i < num_strings; i++ ) {
    if( i < 1 || strcmp( strings[i], strings[i-1] ) != 0 ) {
        puts(strings[i]);
    }
}
Another way is to always output the first string and then start the loop from the second.
/* With an empty list there is no first string to print. */
if( num_strings > 0 ) {
    puts( strings[0] );
}
for( int i = 1; i < num_strings; i++ ) {
    if( strcmp( strings[i], strings[i-1] ) != 0 ) {
        puts(strings[i]);
    }
}
This means some repeated code, but it simplifies the loop logic. The trade-off is worth it: complicated loops mean bugs. I botched the check on the first loop myself by writing if( i > 0 && strcmp ... ), which is false when i is 0 and so never prints the first string.
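If you want to try the uniqueness check on its own, here's a sketch that wraps the first version of the loop in a function. print_unique and its return value (the number of lines printed) are my own additions for illustration, and it assumes the array is already sorted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print each distinct string of a sorted array
   once. Returns how many strings were printed. */
int print_unique(char **strings, int num_strings) {
    int printed = 0;
    for( int i = 0; i < num_strings; i++ ) {
        /* i < 1 short-circuits, so strings[-1] is never touched. */
        if( i < 1 || strcmp( strings[i], strings[i-1] ) != 0 ) {
            puts(strings[i]);
            printed++;
        }
    }
    return printed;
}
```

Given the sorted list apple, apple, banana, fig, fig, this prints apple, banana and fig, one per line, and returns 3. An empty list prints nothing because the loop body never runs.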
You'll notice I'm not working with argv... except I am. strings and num_strings are just a bit of bookkeeping so I didn't always have to remember to start with argv[1] or use argv+1 if I wanted to pass around the array of strings.
char **strings = argv + 1;
int num_strings = argc-1;
This avoids a whole host of off-by-one errors and reduces complexity.
I think you can put the pieces together from there.