How does indirect manipulation of dynamically allocated string literals truly work in C?

Question

As it stands, I know that dynamically allocated string literals cannot be changed during run-time, otherwise you will hit a segmentation fault.

This is due to the fact that dynamically allocated string literals are stored, from what I saw in the assembly code, in the .rodata segment, placing these literals in what I interpret as read only memory.

So in theory this should never work on, I hope, any modern C compiler:

#include <stdio.h>
#include <stdlib.h>

int 
main( void )
{
  char *p = ( char * )malloc( 10 * sizeof( char ) );
  if ( p == NULL )
  {
    puts( "Bad memory allocation error!" );
    return( EXIT_FAILURE );
  }
  else
  {
    p = "literal";
    printf( "%s\n", p );
    p[0] = 't'; // <-- This should cause a segmentation fault!
    printf( "%s\n", p ); // <-- This should never reach execution!
  }
  return( EXIT_SUCCESS );
}

However, upon studying how tolower() and toupper() work, I find it rather difficult to understand how these two simple functions are able to do what I thought for a long while was impossible. Here's what I mean:

#include <stdio.h>

int
tolowercase( int c )
{
  return ( c >= 'A' && c <= 'Z' ) ? ( c + 32) : ( c );
}

int
retstrlen( char *str )
{
  int len = 0;
  while( *str != '\0' ) { len++; str++; }
  return( len );
}

int
main( int argc, char **argv )
{
  for( int i = 0; i < argc; i++ )
  {
    for( int j = 0; j < retstrlen( argv[i] ); j++ )
    {
      argv[i][j] = tolowercase( argv[i][j] );
      printf( "%c", argv[i][j] );
    }
    printf( "\n" );
  }
  return 0;
}

How does the source code defined in my custom tolower() function not cause a segmentation fault as it normally would through manipulating dynamically allocated string literals?

My only hypothesis that I can draw is that since tolowercase() has a parameter of int, and a return type of int, then the compiler performs a type conversion which indirectly manipulates **argv.

I am pretty sure I am on the right track about this, yet I could have gotten my whole terminology wrong here, so what is really happening to **argv?

Welcome to Stack Overflow! [Do I cast the result of malloc?](https://stackoverflow.com/q/605845/2173917) — Sourav Ghosh, May 26 '19 at 06:35
Oh, sorry about type casting malloc() within my source code example, I am used to using the coding practices I read from my text books a long while ago. I am mostly self taught up to this point! — , May 26 '19 at 06:43
You keep using that word. It doesn't mean what you think it does. — EOF, May 26 '19 at 06:50
There’s nothing dynamic about a string literal. It’s the opposite of dynamic. It’s statically defined at compile time. If it was dynamic (as in, modifiable, changing, determined at run time) it couldn’t be in read only memory. Do have a look at definitions so you’ll have the correct understanding of the term. — Sami Kuhmonen, May 26 '19 at 07:08
Thank you! I see what you mean, I now understand what went wrong with my thinking. It was foolish of me to think that pointer p in my first coding example was dynamic. I literally just said a while ago that it was stored in the .rodata segment, or, as I have read elsewhere, in the .text segment of the assembly code. Which means that pointer p cannot be dynamic, since it is technically in a read-only portion of memory in the program. — , May 26 '19 at 07:32
Multiplying by `sizeof(char)` also serves no purpose. `sizeof(char)` is always 1. `sizeof(*p)` may be a good idea as it allows the type of `p` to be changed without having to maintain the size expression but I would not insist on that style in this case where you are specifically dealing with string data. — Clifford, May 26 '19 at 08:16

score 2 · Answer 1 · answered May 26 '19 at 06:38

Two points:

`p[0] = 't'; // <-- This should cause a segmentation fault!` is not guaranteed. The only thing guaranteed is that it invokes undefined behavior.

For string literals, from C11, chapter §6.4.5

[...] If the program attempts to modify such an array, the behavior is undefined.

Regarding "How does the source code defined in my custom tolower() function not cause a segmentation fault as it normally would through manipulating dynamically allocated string literals?"

Quoting C11, chapter §5.1.2.2.1

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So, they are not string literals, they are perfectly modifiable.
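To make the distinction concrete, here is a minimal sketch. The helper name `lower_in_place` is hypothetical and not taken from the question. It writes through a `char *`, which is fine when the pointer refers to writable storage such as `argv[i]` or a `char` array initialized from a literal, but would be undefined behavior if it were handed a pointer to the literal itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lowercases a string in place and returns it.
 * The caller must pass writable storage (argv[i], a char array, or
 * malloc'd memory); passing a string literal is undefined behavior. */
char *lower_in_place(char *s)
{
    assert(s != NULL);
    for (char *c = s; *c != '\0'; c++) {
        /* The cast avoids undefined behavior in tolower() for negative char values. */
        *c = (char)tolower((unsigned char)*c);
    }
    return s;
}
```

With `char buf[] = "LITERAL";` the array holds its own writable copy of the characters, so `lower_in_place(buf)` is well defined. `lower_in_place("LITERAL")` also compiles, but it attempts to modify the literal and is therefore undefined.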

Actually IMO this answer full of C standard quotes does not answer the question at all. It does not say what is wrong and how to modify the code. — 0___________, May 26 '19 at 07:51
Thank you for telling me about undefined behavior, I literally forgot to think about that when this question came into my head. A segmentation fault on my computer may not happen on another system. I also realized, from the useful insight given by P__J__, that I should have copied the data over instead of assigning it like I did to pointer p or argv. — , May 26 '19 at 07:53

score 1 · Accepted Answer · answered May 26 '19 at 08:11

I know that dynamically allocated string literals cannot be changed during run-time, [...]

You are starting out with a misconception that, when corrected, makes the rest of your long question irrelevant. There is no such thing as a "dynamically allocated string literal"; it is an oxymoron.

When you call malloc and assign its return value to p, then p points to a block of memory on the heap:

char* p = malloc(10) ;

           Heap                      .rodata
         +-------------+             +------------+
         |             |             |            |
         |             |             |            |
         |             |             |            |
         +-------------+             |            |
p +----->+ Alloc block |             |            |
         +-------------+             |            |
         |             |             |            |
         |             |             |            |
         |             |             |            |
         |             |             |"literal"   |
         |             |             |            |
         +-------------+             +------------+

When you reassign p to the literal string, you change it to point to the string in the .rodata segment. It is no longer pointing to the heap, so you have lost any reference to that block and caused a memory leak; the alloc block can no longer be released back to the heap.

p = "literal" ;

            Heap                      .rodata
         +-------------+             +------------+
         |             |             |            |
         |             |             |            |
         |             |             |            |
         +-------------+             |            |
 p +-+   | Alloc block |             |            |
     |   +-------------+             |            |
     |   |             |             |            |
     |   |             |             |            |
     |   |             |             |            |
     |   |             |       +---->+"literal"   |
     |   |             |       |     |            |
     |   +-------------+       |     +------------+
     |                         |
     |                         |
     +-------------------------+

Moreover, calling free(p) (which you have omitted to do in any case) would be undefined behavior, because p is no longer a pointer to a dynamically allocated block.

What you should do instead is copy the string literal into the dynamically allocated memory:

char *p = malloc( MAX_STR_LEN + 1 ) ;   /* check p for NULL before using it */
strncpy( p, "literal", MAX_STR_LEN ) ;
p[MAX_STR_LEN] = '\0' ;                 /* strncpy() does not terminate a truncated copy */

Then the memory looks like this:

            Heap                      .rodata
          +-------------+             +------------+
          |             |             |            |
          |             |             |            |
          |             |             |            |
          +-------------+   strncpy() |            |
p +------>+ "literal"   +<---------+  |            |
          +-------------+          |  |            |
          |             |          |  |            |
          |             |          |  |            |
          |             |          |  |            |
          |             |          +--+"literal"   |
          |             |             |            |
          +-------------+             +------------+

Now p points to a copy of the literal string. The copy is no longer a literal string but _variable_ data, and is modifiable.

Critically, p has not changed; only the data pointed to by p has changed. You have maintained control of the alloc block and can release it back to the heap with `free(p)`.
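The copy-then-modify pattern described above can be sketched as a small helper. The name `copy_to_heap` is hypothetical and used only for illustration. It sizes the block from the source string instead of a fixed maximum, so the terminator always fits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of src,
 * or NULL if the allocation fails. The caller owns the result and
 * must release it with free(). */
char *copy_to_heap(const char *src)
{
    assert(src != NULL);
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, src, size);     /* copies the terminator as well */
    }
    return copy;
}
```

A caller would write something like `char *p = copy_to_heap("literal");`, check `p` against NULL, modify it freely (for example `p[0] = 't';`), and finish with `free(p);`. The pointer returned by the allocation is never reassigned, so the block can always be released.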

Ah! I see what you mean! I was just about to post my own answer to my question, where I fixed my examples up! I was going to add free in my answer, since I was very surprised that no other answer or response called out that I did not attempt to free pointer p at all. You are also the first to use a detailed ASCII drawing to help explain what I did wrong. Thanks! — , May 26 '19 at 08:20
Well, in your example `main()` runs to completion and any allocated memory will be returned to the OS in any case, so it is arguably not an issue, but perhaps not good practice. In a less contrived example it would be an issue. Credit to http://asciiflow.com/ — Clifford, May 26 '19 at 08:24

score 0 · Answer 3 · answered May 26 '19 at 07:15

There are no dynamically allocated string literals in C.

 p = "literal";

In this line of your code you overwrite the value stored in the pointer with the address of the string literal. The memory allocated by malloc is lost. Then you try to modify the string literal, and this is undefined behavior.

You need to copy it instead

strcpy(p, "literal");
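Note that `strcpy` is safe here only because "literal" needs 8 bytes including its terminator and the block from the question holds 10. When the source length is not known in advance, a bounded copy is one option. The following is a minimal sketch built on `snprintf`, with the hypothetical helper name `copy_checked`; it always terminates the destination and reports truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, which has room for cap bytes
 * (cap > 0). The result is always '\0'-terminated. Returns 1 if the whole
 * string fit, 0 if it was truncated or snprintf() reported an error. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    /* snprintf() returns the length src would need, excluding the '\0'. */
    int needed = snprintf(dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

For the buffer in the question this would be called as `copy_checked(p, 10, "literal")`, where 10 is the size that was passed to `malloc`.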

0___________

Yeah, I realize that now, my thinking was very jumbled up! Thank you for reminding me to **copy** the data with strcpy(). I have no idea why I kept saying that pointer p was _dynamic_ even though I stated a couple of sentences ago in my post that the pointer itself was in **read only memory**, thus it could not even be dynamic in the first place. Also thank you for telling me that dynamically allocated string literals do not exist in C, that is something new I did not know before! – May 26 '19 at 07:45

score 0 · Answer 4 · answered May 26 '19 at 09:03

Thank you all for helping me understand where I went wrong, let me fix up the examples so that they are finally correct!

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main( void )
{
  const int STR_MAX_LEN = 10;
  char *p = malloc( sizeof *p * STR_MAX_LEN );
  if ( p == NULL )
  {
    puts( "Bad memory allocation error!" );
    return EXIT_FAILURE;
  }
  else
  {
    strncpy( p, "literal", STR_MAX_LEN - 1 );
    p[STR_MAX_LEN - 1] = '\0'; // strncpy() does not terminate a truncated copy
    printf( "%s\n", p );
    strncpy( p, "test", STR_MAX_LEN - 1 );
    printf( "%s\n", p );
    free( p );
    p = NULL;
  }
  return EXIT_SUCCESS;
}

#include <stdio.h>
#include <ctype.h>

char
*strlower( char *str )
{
  char *temp = str;
  while ( *temp )
  {
    *temp = tolower( ( unsigned char )*temp );
    temp++;
  }
  return str;
}

int
main( int argc, char **argv )
{
  for( int i = 0; i < argc; i++ )
  {
    strlower( argv[i] );
    printf( "%s\n", argv[i] );
  }
  return 0;
}

If there are any other things I should consider from my answer, please let me know, and thank you all for such wonderful advice, and lessons about the C language!
