
I've been reviewing a program that capitalises the first letter of every word in a string. For example, "every single day" becomes "Every Single Day".

I don't understand the part str[i - 1] == ' '. What does that do?

#include <stdio.h>

char    *ft_strcapitalize(char *str)
{
    int i;

    i = 0;
    while (str[i] != '\0')
    {
        if ((i == 0 || str[i - 1] == ' ') &&
                (str[i] <= 'z' && str[i] >= 'a'))
        {
            str[i] -= 32;
        }
        else if (!(i == 0 || str[i - 1] == ' ') &&
                (str[i] >= 'A' && str[i] <= 'Z'))
        {
            str[i] += 32;
        }
        i++;
    }
    return (str);
}

int   main(void)
{
  char str[] = "asdf qWeRtY ZXCV 100TIS";

  printf("\n%s", ft_strcapitalize(str));
  return (0);
}
Ryad Mney

6 Answers

1

i is the index in the string of the current character you are thinking about capitalising (remembering it starts at 0).

i-1 is the index in the string of the previous character to the one you are considering.

str[i-1] is the character in the position previous to the one you are considering.

== ' ' is comparing that character to a space character.

So str[i-1] == ' ' means "Is the character to the left of this one a space?"
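Applied to the question's example string "every single day", that test (together with the i == 0 guard from the question's code) picks out the first character of each word. The following is only a sketch: the helper name word_starts and its out/max parameters are hypothetical and do not appear in the original program.

```c
#include <stddef.h>

/* Hypothetical helper: stores up to max matching indices in out and
 * returns how many positions satisfy the question's condition. */
size_t word_starts(const char *str, size_t *out, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; str[i] != '\0'; i++) {
        /* Same condition as in the question: start of the string,
         * or the character to the left is a space. */
        if (i == 0 || str[i - 1] == ' ') {
            if (count < max)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

For "every single day" this reports the indices 0, 6 and 13, the positions of 'e', 's' and 'd'. Because the condition is copied unchanged, a second space in a run of spaces would also be reported.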

Oddthinking
  • @RyadMney Note that str[i-1] can be dangerous: when i == 0 it reads before the start of the string. That is why this test is placed after the OR clause "i == 0", so it is not evaluated when i is 0. – linuxfan says Reinstate Monica Jun 12 '20 at 07:47
  • @linuxfansaysReinstateMonica `str[i-1]` *can* be dangerous indeed, but in this case it isn't, since `str[i-1]` is only evaluated if `i == 0` fails, as you already pointed out yourself. So why the fuss? (No offense meant :-) ) – RobertS supports Monica Cellio Jun 12 '20 at 07:54
  • @RobertSsupportsMonicaCellio I didn't mean to stir things up. Your answer is right and concise, but since the OP doesn't understand "str[i-1]", adding that "str[i-1]" can be dangerous in other contexts seemed like an improvement to me. – linuxfan says Reinstate Monica Jun 12 '20 at 07:58
  • @linuxfansaysReinstateMonica That is not even my answer. But I am a little sneaky now and added that to [my answer](https://stackoverflow.com/a/62339704/12139179). ;-) – RobertS supports Monica Cellio Jun 12 '20 at 08:03
  • Sorry @RobertSsupportsMonicaCellio, I didn't notice that! You're welcome (and my little bit of text is free of copyrights). – linuxfan says Reinstate Monica Jun 12 '20 at 08:07
1

"What does str[i - 1] == ' ' mean?"

' ' is a character constant for the space character (ASCII value 32).

str is a pointer to char passed in by the caller. (In practice it should point to an array of char holding a string, not just a single char.)

i is a counter.


Note that C lets you use array notation with pointers: str[1] is equivalent to *(str + 1).

The [i - 1] in str[i - 1] means that you access the element just before str[i].

That element, str[i - 1], is compared with the space character, i.e. the test asks whether str[i - 1] contains a space.

The condition evaluates to true if it does, and to false otherwise.
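To make the array-notation point concrete, here is a minimal sketch of the same access written both ways. The two helper names are hypothetical (they are not part of the question's code), and both assume the caller guarantees i >= 1.

```c
#include <stddef.h>

/* Hypothetical helper: previous character, written with array notation.
 * Assumes i >= 1; with i == 0 it would read before the string. */
char char_before_index(const char *str, size_t i)
{
    return str[i - 1];
}

/* Hypothetical helper: the same access written with pointer arithmetic.
 * str[i - 1] and *(str + i - 1) name the same element. */
char char_before_pointer(const char *str, size_t i)
{
    return *(str + i - 1);
}
```

Both functions return the same character for any valid i, so choosing between the two notations is purely a matter of readability.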


Side Notes:

  • Note that str[i - 1] can be dangerous when i == 0: you would then read memory before the start of the array. In your code this is safe, because the logical OR || short-circuits: str[i - 1] == ' ' is evaluated only if i == 0 is false.

    (i == 0 || str[i - 1] == ' ')

    So this case is handled in your code.

  • On ASCII systems, str[i] -= 32; is equivalent to str[i] -= 'a' - 'A';. The latter form is more readable because it makes the case conversion explicit.
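The short-circuit behaviour of || described in the first side note can be observed directly. This is only an illustrative sketch: the counter prev_reads and the helpers read_prev and is_word_start are hypothetical names introduced here to count how often str[i - 1] is actually read.

```c
#include <stddef.h>

/* Hypothetical counter: how many times the previous character was read. */
int prev_reads = 0;

/* Hypothetical instrumented read of str[i - 1]; only valid for i >= 1. */
char read_prev(const char *str, size_t i)
{
    prev_reads++;
    return str[i - 1];
}

/* Same shape as the condition in the question: the right-hand side of ||
 * is evaluated only when i == 0 is false. */
int is_word_start(const char *str, size_t i)
{
    return i == 0 || read_prev(str, i) == ' ';
}
```

Calling is_word_start with i == 0 leaves prev_reads unchanged, which shows that the out-of-bounds read never happens in that case.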

1

It is checking for spaces. More precisely, the condition

(i == 0 || str[i - 1] == ' ')

checks whether we are at the beginning of the string or the previous character was a space, that is, whether a new word starts here. In the string "every single day", i = 0 at the first 'e' of "every"; later, at the 's' of "single", i = 6 and str[i - 1] is ' ', which marks the start of a new word.

0

Here you are comparing str[i-1] with the space character, whose ASCII code is 32.

e.g.

if(str[i-1] == ' ')
{
 printf("Hello, I'm space.\n");
}
else
{
 printf("You got here, into the false block.\n");
}

Run this snippet: if the comparison is true (it yields 1), the first message is printed; otherwise the second one is. Try it with char str[] = "Ryad Mney"; and values of i from 1 upward, and you will see what is happening.

Shubham
0

The C language provides a number of useful character-handling functions (commonly implemented as macros) in <ctype.h> that can make code both more portable and more readable. Although the sample code you are reviewing does not use them, please consider using them to make your code more portable, more robust, and easier for others to read.

Please use islower/isupper/isalpha and tolower/toupper; they make C string processing easier to read.

  • islower(ch) - check whether ch is lower case
  • isupper(ch) - check whether ch is upper case
  • isalpha(ch) - check whether ch is alphabetic (lower or upper case)
  • tolower(ch) - convert ch to lower case (if it is alphabetic)
  • toupper(ch) - convert ch to upper case (if it is alphabetic)
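As a small illustration of how these calls combine, here is a sketch of a case-swapping helper. The name flip_case is hypothetical and not part of the question's code. The cast to unsigned char is there because passing a negative char value to these functions is undefined behaviour.

```c
#include <ctype.h>

/* Hypothetical helper: swaps the case of a letter and leaves every
 * other character unchanged. */
char flip_case(char c)
{
    /* Convert first so a negative char is never passed to <ctype.h>. */
    unsigned char uc = (unsigned char)c;

    if (islower(uc))
        return (char)toupper(uc);
    if (isupper(uc))
        return (char)tolower(uc);
    return c;
}
```

Digits, spaces and punctuation pass through untouched, which matches the "if it is alphabetic" qualifier in the list above.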

Yes, implementations often provide them as macros; see "What is the macro definition of isupper in C?"

The C-language provides the 'for' control statement which provides a nice way to express string processing. Simple indexed loops are often written using 'for' rather than 'while'.

#include <ctype.h>

char*
ft_strcapitalize(char *str)
{
    for( int i=0; (str[i] != '\0'); i++ )
    {
        if ((i == 0 || isspace(str[i - 1])) && islower(str[i]) )
        {
            str[i] = toupper(str[i]);
        }
        else if (!(i == 0 || isspace(str[i - 1])) && isupper(str[i]) )
        {
            str[i] = tolower(str[i]);
        }
    }
    return (str);
}

A slight refactoring makes the code a bit more readable,

char*
ft_strcapitalize(char *str)
{
    for( int i=0; (str[i] != '\0'); i++ )
    {
        if( (i == 0 || isspace(str[i - 1])) )
        {
            if( islower(str[i]) ) str[i] = toupper(str[i]);
        }
        else if( !(i == 0 || isspace(str[i - 1])) )
        {
            if( isupper(str[i]) ) str[i] = tolower(str[i]);
        }
    }
    return(str);
}

Alternately, use isalpha(ch),

char*
ft_strcapitalize(char *str)
{
    for( int i=0; (str[i] != '\0'); i++ )
    {
        if( (i == 0 || isspace(str[i - 1])) )
        {
            if( isalpha(str[i]) ) str[i] = toupper(str[i]);
        }
        else if( !(i == 0 || isspace(str[i - 1])) )
        {
            if( isalpha(str[i]) ) str[i] = tolower(str[i]);
        }
    }
    return(str);
}

Simplify the conditional expression even further, by performing the special case (first character of string) first.

char*
ft_strcapitalize(char *str)
{
    if( islower(str[0]) ) str[0] = toupper(str[0]);

    for( int i=1; (str[i] != '\0'); i++ )
    {
        if( isspace(str[i - 1]) )
        {
            if( islower(str[i]) ) str[i] = toupper(str[i]);
        }
        else if( !isspace(str[i - 1]) )
        {
            if( isupper(str[i]) ) str[i] = tolower(str[i]);
        }
    }
    return(str);
}

Again, the alternate isalpha(ch) version,

char*
ft_strcapitalize(char *str)
{
    if( isalpha(str[0]) ) str[0] = toupper(str[0]);

    for( int i=1; (str[i] != '\0'); i++ )
    {
        if( isspace(str[i - 1]) )
        {
            if( isalpha(str[i]) ) str[i] = toupper(str[i]);
        }
        else if( !isspace(str[i - 1]) )
        {
            if( isalpha(str[i]) ) str[i] = tolower(str[i]);
        }
    }
    return(str);
}

Even more idiomatic, just use a 'state' flag that indicates whether we should fold to upper or lower case.

char*
ft_strcapitalize(char *str)
{
    int first=1;
    for( char* p=str; *p; p++ ) {
        if( isspace(*p) ) {
            first = 1;
        }
        else if( !isspace(*p) ) {
            if( first ) {
                if( isalpha(*p) ) *p = toupper(*p);
                first = 0;
            }
            else {
                if( isalpha(*p) ) *p = tolower(*p);
            }
        }
    }
    return(str);
}

And your main test driver,

int   main(void)
{
    char str[] = "asdf qWeRtY ZXCV 100TIS";

    printf("\n%s", ft_strcapitalize(str));
    return (0);
}
ChuckCottrill
  • This looks more like an answer that belongs on codereview.stackexchange.com – Oddthinking Jun 12 '20 at 06:43
  • `isspace(str[i])` is not the same as `str[i] == ' '`. It matches other characters too: TAB, newline, formfeed etc. `isblank(str[i])` would be closer but still not the same. Furthermore, all functions from `<ctype.h>` have undefined behavior for negative values different from `EOF`, so the `char` argument `str[i]` should be cast as `(unsigned char)str[i]` when passed to `isspace()`, `islower()`, `toupper()`, `tolower()` etc. – chqrlie Jun 12 '20 at 08:17
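The difference between the three tests mentioned in that comment can be sketched as follows. The wrapper names are hypothetical, and the comments describe the behaviour in the default "C" locale; other locales may classify more characters as whitespace.

```c
#include <ctype.h>

/* Matches only the space character, like str[i - 1] == ' '. */
int is_plain_space(char c)
{
    return c == ' ';
}

/* Matches space, TAB, newline, vertical tab, form feed and carriage
 * return in the "C" locale. The cast avoids undefined behaviour for
 * negative char values. */
int is_any_whitespace(char c)
{
    return isspace((unsigned char)c) != 0;
}

/* Matches space and TAB in the "C" locale (isblank needs C99 or later). */
int is_blank_char(char c)
{
    return isblank((unsigned char)c) != 0;
}
```

With these wrappers, a TAB is accepted by is_any_whitespace and is_blank_char but not by is_plain_space, and a newline is accepted only by is_any_whitespace.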
0

' ' is a character constant representing the value of the space character in the execution set. Using ' ' instead of 32 increases both readability and portability to systems where space might not have the same value as in the ASCII character set. (i == 0 || str[i - 1] == ' ') is true if i is the offset of the beginning of a word in a space separated list of words.

It is important to try to make the code as simple and readable as possible. Using magic constants like 32 is not recommended when a more expressive alternative is easy and cheap. For example, you convert lowercase characters to uppercase with str[i] -= 32: this magic value 32 (again!) happens to be the offset between the lowercase and the uppercase characters. It would be more readable to write:

    str[i] -= 'a' - 'A';

Similarly, you wrote the range tests for lower case and upper case in the opposite order: this is error prone and surprising for the reader.

You are also repeating the test for the start of word: testing for lower case only at the start of word and testing for upper case otherwise makes the code simpler.

Finally, using a for loop is more concise and less error prone than the while loop in your function, but I know that the local coding conventions at your school disallow for loops (!).

Here is a modified version:

#include <stdio.h>

char *ft_strcapitalize(char *str) {
    size_t i;

    i = 0;
    while (str[i] != '\0') {
        if (i == 0 || str[i - 1] == ' ') {
            if (str[i] >= 'a' && str[i] <= 'z') {
                str[i] -= 'a' - 'A';
            }
        } else {
            if (str[i] >= 'A' && str[i] <= 'Z') {
                str[i] += 'a' - 'A';
            }
        }
        i++;
    }
    return str;
}

int main(void) {
    char str[] = "asdf qWeRtY ZXCV 100TIS";

    printf("\n%s", ft_strcapitalize(str));
    return 0;
}

Note that the above code still assumes that the letters form two contiguous blocks in the same order from a to z. This assumption holds for the ASCII character set, which is almost universal today, but only partially so for the EBCDIC set still in use in some mainframe systems, where there is a constant offset between cases but the letters from a to z do not form a contiguous block.
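The assumption described above can be probed with a quick run-time check. This is only a sketch with hypothetical function names. It tests that each letter range spans exactly 26 code points, which holds on ASCII systems and fails on EBCDIC ones.

```c
/* Returns 1 when 'a'..'z' and 'A'..'Z' each span exactly 26 code points
 * in the execution character set, as the range tests above assume. */
int letters_are_contiguous(void)
{
    return ('z' - 'a' == 25) && ('Z' - 'A' == 25);
}

/* Returns the lowercase/uppercase offset that replaces the magic
 * number 32 in the modified version. */
int case_offset(void)
{
    return 'a' - 'A';
}
```

On an ASCII system letters_are_contiguous() returns 1 and case_offset() returns 32.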

A more generic approach would use functions and macros from <ctype.h> to test for whitespace (space and other whitespace characters), character case and to convert case:

#include <ctype.h>

char *ft_strcapitalize(char *str) {
    for (size_t i = 0; str[i] != '\0'; i++) {
        if (i == 0 || isspace((unsigned char)str[i - 1]))
            str[i] = toupper((unsigned char)str[i]);
        else
            str[i] = tolower((unsigned char)str[i]);
    }
    return str;
}
chqrlie