I'm trying to understand the black voodoo magic that is pointers, and I can't get my head around the following two cases. My understanding of the first parameter of getline() is shaky, so I guess it all comes down to its type, which differs from the type of word in the second example.

The following is an extract from a function that loads a file (a dictionary of words) and reads its content line by line. Why does tolower() work in this first example:

int l;
size_t len = 0;
char *word = NULL;

while ((l = getline(&word, &len, fp)) != -1)
{
    for (char *p = word; *p; ++p) *p = tolower(*p);
    // Irrelevant code below
}

But segfault in this second example, as soon as it tries to assign the return value of tolower() for the first char:

char *word = "POTATO";
for (char *p = word; *p; ++p) *p = tolower(*p);
Vlad from Moscow
Ramon Royo
    Incidentally, when the type of `p` is `char *`, use `tolower((unsigned char) *p)` rather than `tolower(*p)`. The C standard does not define the behavior of `tolower` when `*p` is negative but not `EOF`, and `char` values can be negative in some C implementations. – Eric Postpischil Nov 05 '20 at 00:18
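
A minimal sketch of the cast suggested in this comment. The helper name lower_in_place is hypothetical (it is not from the original post), and it assumes the string lives in writable storage such as an array or a heap buffer:

```c
#include <ctype.h>

/* Hypothetical helper: lowercases a writable, null-terminated string in place. */
static void lower_in_place(char *s)
{
    for (char *p = s; *p; ++p)
    {
        /* Cast to unsigned char so tolower() never receives a negative value. */
        *p = (char)tolower((unsigned char)*p);
    }
}
```

Calling it on an array such as char buf[] = "POTATO"; should be fine; calling it on a pointer to a string literal would still be undefined behavior.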

2 Answers

The answer is in the C Standard (6.4.5 String literals, paragraph 7):

7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

First, string literals are character arrays with static storage duration.

For example, in C the string literal "POTATO" has the type char[7]: six letters plus the terminating null character.

Here is a demonstrative program.

#include <stdio.h>

int main(void) 
{
    printf( "sizeof( \"POTATO\" ) = %zu\n", sizeof( "POTATO" ) );
    
    return 0;
}

The program output is

sizeof( "POTATO" ) = 7

When used in expressions, arrays are, with rare exceptions (for example, as the operand of the sizeof operator), converted to pointers to their first elements.

So in this declaration

char *word = "POTATO";

which (purely for demonstration) can be rewritten as

char *word = &"POTATO"[0];

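the string literal used as an initializer is converted to a pointer to its first character, 'P'. Writing through that pointer, as *p = tolower(*p) does in the question, is an attempt to modify the string literal, so the behavior is undefined; many implementations place string literals in read-only memory, which is why the program segfaults.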
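
The conversion can be observed by comparing sizeof on an array with sizeof on a pointer initialized from the same literal. This is only a sketch; the function name literal_sizes is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: reports the size seen through an array and through a pointer. */
static void literal_sizes(size_t *as_array, size_t *as_pointer)
{
    char word[] = "POTATO";    /* array: its own copy of the 7 characters */
    const char *p = "POTATO";  /* pointer: the literal is converted to &"POTATO"[0] */

    *as_array = sizeof word;   /* size of the whole array, terminator included */
    *as_pointer = sizeof p;    /* size of a pointer, which depends on the platform */
}
```

The array reports 7 bytes, while the pointer reports whatever size a pointer has on the target platform.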

To prevent the mistake of modifying string literals, C++, unlike C, gives string literals the types of constant character arrays.

Thus in C++ you have to write

const char *word = "POTATO";

In C, too, it is advisable to declare pointers to string literals with the const qualifier.
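
If a modified version of the text is needed, one option is to keep the pointer const and work on a copy. A minimal sketch, assuming a hypothetical helper named lowercase_copy:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated lowercase copy of s,
   or NULL if the allocation fails. The caller must free() the result. */
static char *lowercase_copy(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating null character */
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, n);         /* the copy lives in writable heap memory */
    for (char *p = copy; *p; ++p)
        *p = (char)tolower((unsigned char)*p);
    return copy;
}
```

With const char *word = "POTATO"; the compiler rejects an accidental *word = 'p', while lowercase_copy(word) leaves the literal untouched and returns a separate buffer.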

Vlad from Moscow

While writing my question, I decided to recheck the getline() documentation. According to it, the first parameter is a char **lineptr, meaning it's a pointer to a pointer to a char. More specifically:

getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr
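
Here is a rough sketch of what that means in practice, assuming a POSIX system. The helper name lowercase_lines and its out parameter are made up for illustration; the point is that word starts as NULL and getline() fills it with the address of a buffer it allocates:

```c
#define _POSIX_C_SOURCE 200809L  /* may be needed for getline() on some systems */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads every line of fp with getline(), lowercases it
   in place, writes it to out, and returns the number of lines read. */
static long lowercase_lines(FILE *fp, FILE *out)
{
    char *word = NULL;  /* NULL asks getline() to allocate the buffer */
    size_t len = 0;     /* getline() stores the buffer capacity here */
    long count = 0;

    while (getline(&word, &len, fp) != -1)
    {
        /* word now points to a writable heap buffer, so this is allowed */
        for (char *p = word; *p; ++p)
            *p = (char)tolower((unsigned char)*p);
        fputs(word, out);
        ++count;
    }

    free(word);         /* the buffer is reused across calls; free it once */
    return count;
}
```

Because getline() needs to change where word points, it takes &word (a char **) rather than word itself.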

Due to my still limited understanding of double pointers and pointers in general, I've decided to read the code for getline(), trying to understand what's going on:

https://dev.w3.org/libwww/Library/src/vms/getline.c

Here's what I've understood and my correction to the second example, so that it doesn't segfault.

char word[] = "POTATO";
for (char *p = word; *p; ++p) *p = tolower(*p);

Then, trying to really understand why the correction worked, I also searched for the differences between char arrays and char pointers, and found and read the following:

https://overiq.com/c-programming-101/character-array-and-character-pointer-in-c/

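What I've learned is that the elements of a char array can be modified individually, but a string literal cannot, even through a char pointer that points to it. char word[] = "POTATO" creates an array holding its own writable copy of the characters, whereas char *word = "POTATO" only points at the literal itself.

I thought the latter could be modified too, but it can't. Hence my mistake in the second example: I was trying to modify a string literal through a char pointer and got a segfault in return for my lack of Black Mojo understanding. The first example works because getline() stores in word the address of a buffer it allocates itself, and that buffer is writable.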

Now I understand a little better and the trip has been enjoyable.

Please feel free to edit or add your insights.

Ramon Royo