16

I'm wondering what's the difference between char s[] = "hello" and char *s = "hello".

After reading this and this, I'm still not very clear on this question.


As far as I know, there are five segments in memory: Text, BSS, Data, Stack and Heap.

From my understanding,

in case of char s[] = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.

  3. We also have a copy of "hello" where the s is stored, so we can modify the value of this string via s.

in case of char *s = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.
  3. s just points to "hello" in Text and we don't have a copy of it, therefore modifying the value of string via this pointer should cause "Segmentation Fault".

Am I right?

Community
Galaxy
  • I'm not sure but I would not call "data segments" what you called. Segments or segmentation, as far as I know, are a way of organising, logically, the memory, along with the other common variant of "paging"... – nbro Jun 18 '16 at 23:21
  • 3
    There typically are no stack or heap segments. C doesn't even mention "segments" or "sections". It depends on your implementation. – too honest for this site Jun 18 '16 at 23:23
  • 1
    @nbro: None is defined in the C standard. – too honest for this site Jun 18 '16 at 23:24
  • @Olaf I think the idea of data and code segments are important to understand what happens internally. But we should classify them as mutable and immutable values as well. – KRoy Jun 19 '16 at 02:19
  • @shuva In fact, it was an interview question I met at a famous company which uses C as its major language. I knew the char array is mutable and the other one is not. But I was not clear when the interviewer asked me where the string was stored. That's why I posted this question. Thank you for your answer. – Galaxy Jun 19 '16 at 02:35
  • @Galaxy As memory is an abstraction, you never know where it resides in the actual device. The code segment, data segment, stack and heap may be located in the processor cache, in RAM, or even on the hard drive (paging). – KRoy Jun 19 '16 at 02:56
  • @shuva Yes, what I need to know is in which segment the string is stored. Based on my interview experience, it may not be enough to just answer "in static memory" or "read-only memory". – Galaxy Jun 19 '16 at 03:08

2 Answers

5

You are right that in the first case "hello" ends up in a mutable array, while in the second case the string is immutable. Conceptually, the initial characters are kept in read-only memory before initialization.

In the first case, the mutable memory is initialized by copying from the immutable string. In the second case, the pointer refers directly to the immutable string.

For the first case, Wikipedia says:

The values for these variables are initially stored within the read-only memory (typically within .text) and are copied into the .data segment during the start-up routine of the program.

Let us examine a file, segment.c:

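char *s = "hello";      // global pointer to a string literal
char sar[] = "hello";   // global array initialized with a copy of the literal
char content[32];       // zero-initialized global buffer

int main(int argc, char *argv[]) {
        char psar[] = "parhello"; // local array, initialized on every call
        char *ps = "phello";      // local pointer to a string literal
        content[0] = 1;
        sar[3] = 1;     // OK: sar is a writable array
        // sar++;       // does not compile: an array cannot be incremented
        // s[2] = 1;    // undefined behavior, typically a segmentation fault
        s = sar;
        s[2] = 1;       // OK: s now points into the writable array sar
        psar[3] = 1;    // OK: psar is a writable array
        // ps[2] = 1;   // undefined behavior, typically a segmentation fault
        ps = psar;
        ps[2] = 1;      // OK: ps now points into the writable array psar
        return 0;
}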

Here is the assembly generated for segment.c. Note that both s and sar are globals placed in the .data section. sar is not a pointer at all: it is an array whose six bytes (five characters plus the terminating '\0') are stored directly in .data, so its contents are mutable but sar itself cannot be reassigned. One consequence is that sizeof(sar) = 6 differs from sizeof(s) = 8, the size of a pointer on this 64-bit target. The literals "hello" and "phello" are placed in the read-only .rodata section and are effectively immutable.

    .file   "segment.c"
    .globl  s
    .section    .rodata
.LC0:
    .string "hello"
    .data
    .align 8
    .type   s, @object
    .size   s, 8
s:
    .quad   .LC0
    .globl  sar
    .type   sar, @object
    .size   sar, 6
sar:
    .string "hello"
    .comm   content,32,32
    .section    .rodata
.LC1:
    .string "phello"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $64, %rsp
    movl    %edi, -52(%rbp)
    movq    %rsi, -64(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movl    $1752326512, -32(%rbp)
    movl    $1869376613, -28(%rbp)
    movb    $0, -24(%rbp)
    movq    $.LC1, -40(%rbp)
    movb    $1, content(%rip)
    movb    $1, sar+3(%rip)
    movq    $sar, s(%rip)
    movq    s(%rip), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movb    $1, -29(%rbp)
    leaq    -32(%rbp), %rax
    movq    %rax, -40(%rbp)
    movq    -40(%rbp), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movl    $0, %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

Again, for the local variables in main, the compiler does not bother to emit a symbol name. It may keep them in registers or in stack memory; here both psar and ps live on the stack, at offsets from %rbp.

Note that the initializer of the local array, "parhello", is encoded as the two integers 1752326512 and 1869376613 (the ASCII bytes of "parh" and "ello" in little-endian order). I discovered this by changing "parhello" to "parhellp". The diff of the assembly output is as follows:

39c39
<   movl    $1886153829, -28(%rbp)
---
>   movl    $1869376613, -28(%rbp)

So there is no separate immutable copy of the initializer for psar. It is turned into integer immediates embedded in the instructions in the code segment.

KRoy
  • 1
    I'm curious about the read-only memory. Is it Text, Data or something else? During the interview, I answered the string should be in the read-only memory. However, the interviewer wanted me to figure out it was in which specific segment. – Galaxy Jun 19 '16 at 03:02
  • @Galaxy As Wikipedia says, the read-only section is **typically** `.text`. – KRoy Jun 19 '16 at 03:10
  • I agree on this. It is similar to the third answer in this question [link](http://stackoverflow.com/questions/1704407/what-is-the-difference-between-char-s-and-char-s-in-c). So there should be a string on Text and a copy of it stored at the same place with the array, usually on Stack for local variable? – Galaxy Jun 19 '16 at 03:19
  • @Galaxy Yes, for a local variable it is usually copied onto the stack. For a global variable it is copied into the Data segment. – KRoy Jun 19 '16 at 04:27
  • @Galaxy Something interesting happened: the array's string is not stored as a separate string in the text segment but is set as integers in the code segment. You should take a look. – KRoy Jun 19 '16 at 04:56
4

Answer to your first question:

char s[] = "hello";

s is an array of char. An array is not a pointer, but its name behaves much like a constant pointer: you cannot change s itself with pointer arithmetic (i.e. s++ does not compile). The data aren't const, though, so you can change them.
See this example C code:

#include <stdio.h>

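// Reverse the NUL-terminated string p in place.
void reverse(char *p){
    char c;
    char* q = p;
    if (!*p) return; // empty string: nothing to reverse
    while (*q) q++;
    q--; // point to the last character
    while (p < q) {
        c = *p;
        *p++ = *q;
        *q-- = c;
    }
}

int main(void){
    char s[]  = "DCBA"; // writable array holding a copy of the literal
    reverse(s);
    printf("%s\n", s); // ABCD
    return 0;
}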

which reverses the text "DCBA" and produces "ABCD".

char *p = "hello";

p is a pointer to a char. You can do pointer arithmetic on it (p++ will compile), but the string literal it points to is typically placed in a read-only part of memory (const data).
Writing through it, for example p[0]='a';, is undefined behavior and usually results in a runtime error:

#include <stdio.h>
int main(void){
    char* s  = "DCBA";
    s[0]='D'; // compiles, but typically crashes at run time
    printf("%s\n", s); // usually never reached
}

This compiles, but typically crashes when run.

const char* const s = "DCBA";

With a const char* const, you can change neither s nor the data it points to (i.e. "DCBA"), so both the data and the pointer are const:

#include <stdio.h>
int main(void){
    const char* const s  = "DCBA";
    s[0]='D'; // compile error: assignment to read-only location
    printf("%s\n", s); // would print DCBA if the line above were removed
}

The Text segment is normally the segment where your code is stored, and it is read-only, i.e. unchangeable at run time. In embedded systems, this is the ROM, PROM, or flash memory; in a desktop computer, it can be in RAM.

The Stack is RAM memory used for local variables in functions.

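The Heap is RAM memory used for dynamically allocated data (for example, memory obtained with malloc). Global variables are not kept on the heap; they live in the Data or BSS segment.

BSS contains all global and static variables that are initialized to zero or not explicitly initialized.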

For more information, see the relevant Wikipedia article and this relevant Stack Overflow question.

With regard to s itself: when it is a local variable, the compiler decides where to put it (in stack space or in a CPU register).

For more information about memory protection and access violations or segmentation faults, see the relevant Wikipedia page.

This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.

Community
  • For `char*s = "hello"`, the `"hello"` is kept in the read-only section `.rodata` by the gcc assembler. It seems `"hello"` here is *immutable*, contrary to what is suggested in the explanation. And `s[0] = 1;` causes a segmentation fault when `s` refers to `"hello"`. Again, `s` is not `const`. It may refer to a *mutable* string at *run time*, so `s[0]` may NOT cause a segmentation fault as well. – KRoy Jun 19 '16 at 01:49
  • 2
    In the wiki page [link](https://en.wikipedia.org/wiki/Data_segment) you provided, there is an example of `char string[] = "Hello World";`. And it says "The values for these variables are initially stored within the read-only memory (typically within .text) and are copied into the .data segment during the start-up routine of the program." – Galaxy Jun 19 '16 at 02:07
  • 1
    Great example. `Const` is very useful to let the compiler detect potential "segmentation fault". The interviewer introduced the same method as you. – Galaxy Jun 19 '16 at 02:46