30

I heard that in C, if I do

char *s = "hello world";

the "hello world" is actually stored in read-only memory.

I am not clear about what read-only memory means here. What is the explanation? Is it like a flag that tells the compiler not to write into that section?

Peter Mortensen
  • 1
    Do you have a reference? I think you might mean: const char* s = "hello world". – James Black Nov 10 '09 at 00:29
  • 3
    It is not clear that all processor architectures support protected memory. – jldupont Nov 10 '09 at 00:30
  • 1
    @James Black: The OP is obviously talking about the string literal `"hello world"`, which indeed can be stored in read-only memory, regardless of how the pointer is declared. – AnT stands with Russia Nov 10 '09 at 00:36
  • my reference is: http://stackoverflow.com/questions/1704407/what-is-the-difference-between-char-s-and-char-s-in-c –  Nov 10 '09 at 00:40
  • I am also curious when we declare a constant, ex: const int a; does a get allocated into the read-only memory section too? (because it's constant, not modifiable so I assume so) –  Nov 10 '09 at 00:41

7 Answers

42

That's not a feature of the C language but a feature of the compiler/linker and the operating system working together.

When you compile such code the following happens:

  • The compiler will put the string into a read-only data-section.

  • The linker collects all the data in such read-only sections and puts them into a single segment. This segment resides in the executable file and is flagged with a "read only"-attribute.

  • Now comes the operating system executable loader. It loads the executable (or maps it into memory to be more exact). Once this is done, the loader walks the sections and sets access-permissions for each segment. For a read-only data segment it will most likely disable code-execute and write access. Code (for example, your functions) gets execute rights but no write access. Ordinary data like static variables gets read and write access and so on...

That's how modern operating systems do it.

As said, it's not a feature of the C language. If you compile the same program for DOS, for example, the program will run but no write protection would be possible, because the DOS loader does not know about read-only sections.

Inian
Nils Pipenbrinck
  • 2
    Is a constant variable also put in the same section as "hello world"? (e.g. const int a = 6) –  Nov 10 '09 at 01:12
  • "It depends". The only thing you can say for sure about a const int is that the compiler will produce a diagnostic message if you attempt to modify it. It might be stored in a read-only section, or it might never be stored at all, and directly encoded as a constant into the instructions that use it. – Mark Bessey Nov 10 '09 at 01:21
  • @tsubasa - probably not, and certainly not if `a` is a local. But whatever the answer, it will also depend on the OS and the loader. – Stephen C Nov 10 '09 at 01:24
  • Thank you for the answer. @Stephen C, is the loader part of the OS? –  Nov 10 '09 at 01:31
  • tsubasa, yes, the loader is generally a part of the OS. – Prof. Falken Nov 10 '09 at 09:31
  • 1
    Also, remember, the constant might even end up in a ROM. (In the case of an embedded program.) – Prof. Falken Nov 10 '09 at 09:31
  • 1
    @Nils Pipenbrinck Is there any way to force a compiler to put the whole program in *writeable* memory? – phimuemue Jul 28 '11 at 17:27
  • I should note that many platforms place the read only data sections in a read and execute segment to save on space but that some linkers like gold offer options (--rosegment) to change this or one can use linker scripts. – Molly Stewart-Gallus Dec 05 '15 at 17:21
7

Executables contain two parts: a .data section, containing global variables, and a .text section, containing the actual machine code.

Strings are placed into the .data section. When the C compiler sees "Hello world", it puts the string "Hello world" into the executable itself and replaces each instance of "Hello world" in the program with the address where that string ends up being loaded.

Having said that, I'm not sure why it's read-only. Theoretically, a program should be able to modify its own memory.

Claudiu
  • 3
    Not all processors and operating systems support self-modifying code. In fact, most modern operating systems contain protection against self-modifying code, as a security feature. – Crashworks Nov 10 '09 at 00:50
  • 1
    String literals, because they don't have to be modifiable, can and often are stored in the text section. – caf Nov 10 '09 at 01:09
6

True read-only memory is implemented by the memory subsystem of the OS. The OS can mark certain pages as read-only.

In the binary, the compiler can tell the OS which parts of the executable should be placed in read-only vs read-write memory pages.

R Samuel Klatchko
  • 2
    hmmm... true memory protection is implemented at the processor level. – jldupont Nov 10 '09 at 00:33
  • @jldupont: The memory protection is indeed implemented at the hardware level (in x86 at least), but the initial setup is done by the OS, i.e. it is the OS that *marks* read-only pages as such, and then the hardware enforces the read-only marks set up by the OS. – AnT stands with Russia Nov 10 '09 at 00:43
  • @AndreyT: of course... my point was with regards to @R Samuel. Without **hardware** assistance, there is only so much one can do at the software level. – jldupont Nov 10 '09 at 00:50
  • @R - actually, on embedded systems read-only memory might actually be implemented using ROM hardware; e.g. using EPROM chips. – Stephen C Nov 10 '09 at 01:28
5

When you write char s[10] = "sneha"; you are allocating 10 bytes of storage space in your object file (not memory; memory only comes into the picture when you execute your program). This is static allocation, done at compile time.

But when you write char *s = "sneha"; you are not allocating any writable storage for "sneha". The string gets stored in a read-only section, while the pointer s is stored in a different section depending on where it is declared. Since s points to the read-only data "sneha", trying to write through it will typically give you a segmentation fault.

For example:

char *s = "sneha";
s[1] = 'N'; 
printf("%s", s);  // you might expect the output sNeha,
                  // but you typically get a segfault because the literal is read-only data
GermanNerd
4

An example of how to do this in Linux is on page 179 of Advanced Linux Programming by Mark Mitchell, Jeffrey Oldham, and Alex Samuel.

Peter Mortensen
andrew cooke
1

You could try something like

s[7] = '0';

and see if it says "hello w0rld" when you call

puts(s);

If it causes an immediate Segmentation Fault or a Data Execution Prevention exception then it is probably read only. (If the system lets you get away with it, that doesn't make it a good idea.)

Jared Updike
1

As other folks have mentioned, whether the contents of constant strings are stored in read-only memory is determined by the operating system, compiler, and chip architecture.

More precisely, in C a string literal has type "char[N]" (an array of char, not const-qualified), while in C++ it has type "const char[N]" (I don't have the standards at hand for the exact wording).

Any code that attempts to modify the contents of such a string is invoking undefined behavior. That means that literally anything can happen at that point, and the provider of the compiler isn't even required to document what can happen.

In practice, this means that a C or C++ program that wants to be portable has to avoid modifying constant strings.

In general, the compiler will not allow you to modify the contents of "const" variables, so you can consider "const" to mean "read only" in most cases. Unfortunately, string literals are a special case, largely for historical reasons: they can be assigned to a plain char * even though they must not be modified. That means that code like this:

char *x = "Hello, World";
*x = 'h';

will typically compile without error or warning under default compiler settings, even though it invokes undefined behavior.

Mark Bessey