
I have the following program, which is meant to modify s so that "Hello World!" is printed at the end:

#include <stdio.h>

// modify this function
void function(char** c)
{
    *c = "Hello World!";
}

int main()
{
    char* s;
//    function(&s); 
    function(s);
    puts(s);
    return 0;
}

Normally, we would just call function(&s). However, my question is: why can't we just use function(s)? Of course, doing so raises a warning at compile time. But s contains some memory address, say 0xab. If we modify the content at 0xab from 0x00 to "hello world!", the address held by s won't change, and we should still see the "Hello World!" message.

Why doesn't function(s) work in this case? I compiled the program on a Mac.

xxks-kkk
  • look up "what is pass by value" – M.M Sep 26 '18 at 01:10
  • @M.M: That is not what they are asking about. They know `s` is passed by value. They are asking that, since it has (in their mind) **some** value, why isn’t the address of `"Hello World!"` written to the address specified by that value, after which `puts(s)` would write “Hello World!”. (Although they may still be off by a reference/dereference in that thinking.) – Eric Postpischil Sep 26 '18 at 01:12
  • When you pass the uninitialized `s` to `function`, `s` could contain any value. When `function` executes, it is therefore writing to some unknown address. There's no telling what damage that could cause. It could write to the stack, it could write to static storage, it could cause a segmentation fault. It's undefined behavior and should always be avoided. – Tom Karzes Sep 26 '18 at 01:16
  • @EricPostpischil there is a different level of indirection, even in this scenario (supposing you are right about what's going on in their mind) `puts(s)` would try to puts 4(?) bytes representing the address of `"Hello, world"` – M.M Sep 26 '18 at 01:17
  • @M.M: Yes. And I have updated my answer to say that. – Eric Postpischil Sep 26 '18 at 01:17
  • is the title meant to say "why can't I..." ? – M.M Sep 26 '18 at 01:22
  • @TomKarzes I agree it's a dangerous practice. I think there are two reasons we shouldn't pass the pointer this way. The first is the undefined behavior. But I think there should be another explanation: I can see that `*c` contains valid characters, but once `function` returns, `s`'s value looks unusual in gdb even though the address that `s` holds is unchanged. – xxks-kkk Sep 26 '18 at 03:24
  • @M.M Could you please elaborate on "4 bytes representing the address of "Hello World!""? Why would `puts` not show the message, which it successfully does if we pass `&s`? – xxks-kkk Sep 26 '18 at 03:27
  • @zack please see Eric's answer for elaboration. Also all of this is undefined behaviour so you should not talk about "successfully" and "would" – M.M Sep 26 '18 at 03:40
  • @M.M, if I understand your comment correctly, can I say that even if doing so prints the "Hello World!" message, it is still considered wrong because of the undefined behavior? – xxks-kkk Sep 26 '18 at 03:41
  • @zack yes this is all wrong – M.M Sep 26 '18 at 03:42
  • @zack there's no guarantee your function will even return. What if you end up clobbering the return address? The behavior could change upon any new compiler release, or any new library release. It's a waste of time to even try to predict what it might do. It's a bug. Just fix it and move on. – Tom Karzes Sep 26 '18 at 04:28

2 Answers

Since s is uninitialized, its contents are unknown (and using its value is undefined behavior according to the C standard). Suppose it did contain some value and the C implementation did pass that value to function for the parameter c. Then function attempts to write the address of "Hello World!" to the place where c points. But where is that place?

We supposed s contained some value. But it is quite likely an address that is not mapped in your address space. Your small program likely does not use much of even a 32-bit address space, so most of that space will not be mapped to real memory by the operating system. So, if you pick a random address and try to write there, it is probably an invalid address, and your process will crash.

Another likely possibility is that s happens to contain zero because this is early in your program and nothing has written anything else to the place where the compiler put s, so it just contains the zeros that the operating system initialized your memory with. In many systems, the address zero is deliberately left unmapped in address spaces just for this purpose, so that uses of uninitialized pointers will crash.

More than that, a good compiler will see that s is used without being initialized and will warn you about that. If you force it to generate code anyway, the optimizer may, as a result of its usual transformations, completely replace this broken code with something else.

If you are unlucky, then the uninitialized s will contain a value that happens to be a valid address in your address space, and then function may write the address of "Hello World!" into it. Now you are writing data into some place in your process that may be needed for another purpose, so it can break your program in a variety of ways.

Note that this does not give the result you seem to think it would, that puts would write “Hello World!”. If function did write the address of "Hello World!" into *c, the address would be in memory at the place s happens to point to. Then you are passing to puts the address of a place where there is an address. However, puts expects the address of a place where there are characters. It will read the bytes of the address of "Hello World!" and print them, until it reaches a zero byte. Most often, the result is unprintable or at least unusual characters.

Eric Postpischil
  • Thanks for the response. In my local environment, I can see that the `"Hello World!"` message is successfully written into the address that `*c` contains. However, once `function` returns, `s` contains unusual values. What happens at that point? – xxks-kkk Sep 26 '18 at 03:08
  • Also, I'm a little confused about "`puts` the address of a place where there is an address": `s` now contains an address which has characters, which seems to be fine logically. – xxks-kkk Sep 26 '18 at 03:21
  • @immibis I'm sorry, I cannot understand your comment (the first sentence and fail to understand the connection between your example and mine) – xxks-kkk Sep 26 '18 at 03:28
  • @immibis Also, in your example, `m` still holds the undefined value, which is unchanged before and after the function call. This is the same behavior as mine. However, in my example, I modify the content at the address that is held by `s`. After `function`, the address (say `0xab`) in `s` is unchanged but the content at `0xab` is modified to the string. I'm wondering why, after doing so, `s` shows weird values both under gdb and with `puts`. – xxks-kkk Sep 26 '18 at 03:36
  • I don't think your example is close to mine. If I were to modify your example, I would change `x="5"` to `*x = "5"`. – xxks-kkk Sep 26 '18 at 03:43
  • @immibis Yes. I'm asking exactly the code I provided in the question. – xxks-kkk Sep 26 '18 at 03:45

s is not initialized, so it holds some garbage address (probably an invalid one).

When you do *c = "Hello World!"; inside function, you are writing "Hello World!" (which is a pointer value) to some garbage address (probably an invalid one).

Let's say it doesn't crash though - then puts will read the bytes from that same garbage address (i.e. it will read the address of the string, not the string) and display them on the screen.

After running the incorrect code the memory might contain these values for example:

Address      Value (4 bytes at a time)
...
0x12345678   0x65401234      <- some important thing you just overwrote that is liable to make your program crash,
                                now it holds the address of the string literal
...
0x4000000C   0x12345678      <- variable 's' in main
0x40000010   0x12345678      <- parameter 'c' in function, copy of variable 's' in main
...
0x65401234   'H', 'e', 'l', 'l'  <- where the compiler decided to put the string literal
0x65401238   'o', ' ', 'W', 'o'
0x6540123C   'r', 'l', 'd', '!'
0x65401240   0

When you call puts(s); you would be calling puts(0x12345678); and it would print the bytes of the value 0x65401234 stored there. It wouldn't print the text "0x65401234"; it would try to print the characters whose codes are those bytes.

If you do it right, you end up with:

Address      Value (4 bytes at a time)
...
0x4000000C   0x65401234      <- variable 's' in main
0x40000010   0x4000000C      <- parameter 'c' in function, has address of variable 's' in main
...
0x65401234   'H', 'e', 'l', 'l'  <- where the compiler decided to put the string literal
0x65401238   'o', ' ', 'W', 'o'
0x6540123C   'r', 'l', 'd', '!'
0x65401240   0

Then puts(s) is puts(0x65401234) which prints the string.

user253751