4

I posted a question about some pointer issues I've been having earlier in this question: C int pointer segmentation fault several scenarios, can't explain behaviour

From some of the comments, I've been led to believe that the following:

#include <stdlib.h>
#include <stdio.h>
int main(){
   int *p;
   *p = 1;
   printf("%d\n", *p);
   return 0;
}

is undefined behaviour. Is this true? I do this all the time, and I've even seen it in my C course. However, when I do

#include <stdlib.h>
#include <stdio.h>
int main(){
   int *p=NULL;
   *p = 1;
   printf("%d\n", *p);
   return 0;
}

I get a seg fault right before printing the contents of p (after the line *p=1;). Does this mean I should have always been mallocing any time I actually assign a value for a pointer to point to?

If that's the case, then why does char *string = "this is a string" always work?

I'm quite confused, please help!

P. Gillich
  • 2
    The thing about pointers is that you should be sure to point them at something valid and known before dereferencing them. No initialization is no good. Null initialization is no good on most systems. 'I do this all the time', oh..I hope not. 'I've even seen it in my C course' - change course. – Martin James Mar 16 '21 at 02:16
  • 3
    "*I've even seen it in my C course*" You either missed or misread something, otherwise find yourself a better course. "*why does `char *string = "..."` always work*" Because that *is* initializing the pointer to a valid value, the address of a nul-terminated const char array containing `"...\0"`. – dxiv Mar 16 '21 at 02:21
  • @MartinJames If I'm being honest, it's more likely because I didn't understand the importance of first declaring an `int` to point to, rather than going in guns blazing like I did. The poor student strikes once again – P. Gillich Mar 16 '21 at 02:25
  • 2
    "Guns blazing" is a particularly bad strategy for C because those guns are by default pointed at your feet... – Nate Eldredge Mar 16 '21 at 02:46

5 Answers

7

This:

int *p;
*p = 1;

Is undefined behavior because p isn't pointing anywhere. It is uninitialized. So when you attempt to dereference p you're essentially writing to a random address.

What undefined behavior means is that there is no guarantee what the program will do. It might crash, it might output strange results, or it may appear to work properly.

This is also undefined behavior:

int *p=NULL;
*p = 1;

Because you're attempting to dereference a NULL pointer.

This works:

char *string = "this is a string" ;

Because you're initializing string with the address of a string constant. It's not the same as the other two cases. It's actually the same as this:

char *string;
string = "this is a string";

Note that here string isn't being dereferenced. The pointer variable itself is being assigned a value.

dbush
3

Yes, doing int *p; *p = 1; is undefined behavior. You are dereferencing an uninitialized pointer (accessing the memory to which it points). If it works, it is only because the garbage in p happened to be the address of some region of memory which is writable, and whose contents weren't critical enough to cause an immediate crash when you overwrote them. (But you still might have corrupted some important program data causing problems you won't notice until later...)

An example as blatant as this should trigger a compiler warning. If it doesn't, figure out how to adjust your compiler options so it does. (On gcc, try -Wall -O).

Pointers have to point to valid memory before they can be dereferenced. That could be memory allocated by malloc, or the address of an existing valid object (p = &x;).

char *string = "this is a string"; is perfectly fine because this pointer is not uninitialized; you initialized it! (The * in char *string is part of its declaration; you aren't dereferencing it.) Specifically, you initialized it with the address of some memory which you asked the compiler to reserve and fill in with the characters this is a string\0. Having done that, you can safely dereference that pointer (though only to read, since it is undefined behavior to write to a string literal).

Nate Eldredge
2

is undefined behaviour. Is this true?

Sure is. It just looks like it's working on your system with what you've tried, but you're performing an invalid write. The version where you set p to NULL first is segfaulting because of the invalid write, but it's still technically undefined behavior.

You can only write to memory that's been allocated. If you don't need the pointer, the easiest solution is to just use a regular int.

int p = 1;

In general, avoid pointers when you can, since automatic variables are much easier to work with.

Your char* example works because of the way strings work in C--there's a block of memory with the sequence "this is a string\0" somewhere in memory, and your pointer is pointing at that. This would be read-only memory though, and trying to change it (e.g., string[0] = 'T';) is undefined behavior.

Stephen Newell
1

With the line

char *string = "this is a string";

you are making the pointer string point to a place in read-only memory that contains the string "this is a string". The compiler/linker will ensure that this string will be placed in the proper location for you and that the pointer string will be pointing to the correct location. Therefore, it is guaranteed that the pointer string is pointing to a valid memory location without any further action on your part.

However, in the code

int *p;
*p = 1;

p is uninitialized, which means it is not pointing to a valid memory location. Dereferencing p will therefore result in undefined behavior.

It is not necessary to always use malloc to make p point to a valid memory location. It is one possible way, but there are many other possible ways, for example the following:

int i;
int *p;
p = &i;

Now p is also pointing to a valid memory location and can be safely dereferenced.

Andreas Wenzel
0

Consider the code:

#include <stdio.h>
int main(void)
{
  int i=1, j=2;
  int *p;
  /* ... some code goes here */
  *p = 3;
  printf("%d %d\n", i, j);
}

Would the statement *p = 3; write to i, j, or neither? It would write to i or j if p points to that object, but not if p points somewhere else. If the ... portion of the code doesn't do anything with p, then p might happen to point to i, or j, or something within the stdout object, or anything at all. If it happens to point to i or j, then the write *p = 3; might affect that object without any side effects, but if it points to information within stdout that controls where output goes, it might cause the following printf to behave in an unpredictable fashion. In a typical implementation, p might point anywhere, and there will be so many things to which p might point that it would be impossible to predict all of the possible effects of writing to them.

Note that the Standard classifies many actions as "Undefined Behavior" with the intention that many or even most implementations will extend the semantics of the language by documenting their behavior. Most implementations, for example, extend the meaning of the << operator to allow it to be used to multiply negative numbers by powers of two. Even on implementations that extend the language to specify that an assignment like *p = 3; will always perform a word-sized write of the value 3 to the indicated address, with whatever consequence results, there would be relatively few platforms(*) where it would be possible to fully characterize all possible effects of that action in cases where nothing is known about the value of p. In cases where pointers are read rather than written, some systems may be able to offer useful behavioral guarantees about the effect of arbitrary stray reads, but not all(**).

(*) Some freestanding platforms which keep code in read-only storage may be able to uphold some behavioral guarantees even if code writes to arbitrary pointer addresses. Such behavioral guarantees may be useful in systems whose state might be corrupted by electrical interference, but even when targeting such systems writing to a stray pointer would never be useful.

(**) On many platforms, stray reads will either yield a meaningless value without side effects or force an abnormal program termination, but on an Apple II with a Disk II card in the customary slot-6 location, if code reads from address 0xC0EF within a second of performing a disk access, the drive head will start overwriting whatever happens to be on the last track accessed. This is by design (software that needs to write to the disk does so by accessing address 0xC0EF, and having hardware respond to both reads and writes required one less logic gate--and thus one less chip--than would be required for hardware that only responded to writes) but does mean that code must be careful not to perform any stray reads.

supercat