96

I had been writing things like

char *x=NULL;

on the assumption that

 char *x=2;

would create a char pointer to address 2.

But, in The GNU C Programming Tutorial it says that int *my_int_ptr = 2; stores the integer value 2 to whatever random address is in my_int_ptr when it is allocated.

This would seem to imply that my own char *x=NULL is assigning whatever the value of NULL cast to a char is to some random address in memory.

While

#include <stdlib.h>
#include <stdio.h>

int main()
{
    char *x=NULL;

    if (x==NULL)
        printf("is NULL\n");

    return EXIT_SUCCESS;
}

does, in fact, print

is NULL

when I compile and run it, I am concerned that I am relying on undefined behavior, or at least under-specified behavior, and that I should write

char *x;
x=NULL;

instead.

fagricipni
  • 76
    There's a very confusing difference between what `int *x = whatever;` does and what `int *x; *x = whatever;` does. `int *x = whatever;` actually behaves like `int *x; x = whatever;`, not `*x = whatever;`. – user2357112 Apr 11 '17 at 06:19
  • 80
    This tutorial appears to have gotten that confusing distinction wrong. – user2357112 Apr 11 '17 at 06:23
  • 1
    Sorry, I drew conclusion too early. You are right, well spotted. +1, this is a useful question, thanks for asking. – Sourav Ghosh Apr 11 '17 at 06:24
  • 1
    Ah nvm, indeed this tutorial has got it completely backwards :D – Antti Haapala -- Слава Україні Apr 11 '17 at 06:52
  • 52
    So many shitty tutorials on the web! Stop reading immediately. We really need a SO blacklist where we can publicly shame crappy books... – Lundin Apr 11 '17 at 06:56
  • 1
    @Lundin Most publications have a few errors (in fact it would be unusual for one to be completely error free) , so I wouldn't judge based on one case; in fact this case is a good candidate for being some kind of typographical error , as the way the text is written suggests that the author intended to draw distinction between initialization and assignment – M.M Apr 11 '17 at 07:03
  • 3
    @M.M The whole linked page is complete crap, not only is the author missing the blatant constraint violation issue, but they also don't mention that the "garbage values" could as well be trap representations. And they don't mention that a pointer and `int` may have different sizes and representations. – Lundin Apr 11 '17 at 07:18
  • 2
    @Lundin yeah well, it's pretty clear that it's only meant to apply to using gcc on 1980s GNU/Linux systems – M.M Apr 11 '17 at 07:20
  • 9
    @M.M Which doesn't make it less crappy in the year 2017. Given the evolution of compilers and computers since the 80s, it's basically the same thing as if I were a doctor and reading medicine books written during the 18th century. – Lundin Apr 11 '17 at 08:12
  • 5
    @Lundin: Do a whitelist, its actually orders of magnitude smaller, and oh, we have it (http://stackoverflow.com/questions/562303/) which also contains some top blacklist contenders... – PlasmaHH Apr 11 '17 at 09:03
  • @PlasmaHH I regret contributing to that book list, it should be nuked from the site. Lots of crap there, starting with K&R. It's a bad list and nothing SO should be proud of. – Lundin Apr 11 '17 at 09:18
  • 13
    I don't think this tutorial qualifies as "_The_ GNU C Programming Tutorial"... – marcelm Apr 11 '17 at 10:28
  • 3
    This is a disaster, what kind of information people are putting online for new programmers? hmmm. – Seek Addo Apr 11 '17 at 16:41
  • 3
    @Lundin: I don’t see K&R as ‘crap’, even if it is not a good way to learn contemporary C. – PJTraill Apr 11 '17 at 19:23
  • 2
    @PJTraill If you read it and focus on their coding style and (lack of) program design, you'll realize it was never a good book. It has an undeserved good reputation because 1) for a long while it was the only book and also C canon until C90, and 2) because of nostalgia, Dennis Ritchie worship and other religious reasons. – Lundin Apr 12 '17 at 07:00
  • 2
    @marcelm I agree with the point you're making, but if you follow the "up" links in that page, you get to something clearly *titled* as ["The GNU C Programming Tutorial"](http://www.crasseux.com/books/ctutorial/index.html#Top). The same content can be found in http://markburgess.org/CTutorial/GNU-ctut.pdf, by Burgess and Hale-Evans, which says Copyright 2002 Free Software Foundation. – Joshua Taylor Apr 12 '17 at 18:20
  • 1
    Stop using random tutorials on the internet to learn C, and get yourself a proper, peer-reviewed book. Egads. – Lightness Races in Orbit Apr 13 '17 at 13:59
  • 1
    "crasseux.com" ("crasseux" ~ "filthy" In English) seems to be a junk website. It says "This is an archive of abandonware, mp3s, appz and other junk, enjoy." Best avoid it. – Bludzee Apr 24 '17 at 12:29

8 Answers

118

Is it possible to initialize a C pointer to NULL?

TL;DR Yes, very much.


The actual claim made in the guide reads:

On the other hand, if you use just the single initial assignment, int *my_int_ptr = 2;, the program will try to fill the contents of the memory location pointed to by my_int_ptr with the value 2. Since my_int_ptr is filled with garbage, it can be any address. [...]

Well, they are wrong, you are right.

For the statement (ignoring, for now, the fact that integer-to-pointer conversion is implementation-defined)

int * my_int_ptr = 2;

my_int_ptr is a variable (of type pointer to int) with an address of its own (of type pointer to pointer to int), and you are storing the value 2 in the variable that lives at that address.

Because my_int_ptr has pointer type, the value it holds is interpreted as the address of an int. So you are essentially assigning the value of the pointer variable itself, not the value at the memory location the pointer points to.

So, in conclusion,

 char *x=NULL;

initializes the pointer variable x to NULL, not the value at the memory address pointed to by the pointer.

This is the same as

 char *x;
 x = NULL;    
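
As a minimal sketch of the equivalence described above, the function below uses both spellings and checks that each one leaves the pointer itself null. The name both_forms_give_null is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks that both spellings leave the pointer null. */
bool both_forms_give_null(void)
{
    char *a = NULL;   /* initialization: sets the pointer a itself */

    char *b;          /* declaration only: b is indeterminate at this point */
    b = NULL;         /* assignment: sets the pointer b itself */

    assert(a == b);   /* two null pointers of the same type compare equal */
    return a == NULL && b == NULL;
}
```

Neither form dereferences anything, so no memory other than the two pointer variables is written.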

Expansion:

Now, being strictly conforming, a statement like

 int * my_int_ptr = 2;

is illegal, as it involves constraint violation. To be clear,

  • my_int_ptr is a pointer variable, type int *
  • an integer constant, 2 has type int, by definition.

and they are not "compatible" types, so this initialization is invalid because it's violating the rules of simple assignment, mentioned in chapter §6.5.16.1/P1, described in Lundin's answer.

In case anyone's interested how initialization is linked to simple assignment constraints, quoting C11, chapter §6.7.9, P11

The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.

Sourav Ghosh
  • @Random832n They ___are___ wrong. I've quoted the related part in my answer, please correct me if otherwise. Oh, and the emphasis is intentional. – Sourav Ghosh Apr 11 '17 at 12:36
  • "... is illegal, as it involves constraint violation. ... an integer literal, 2 has type int, by definition." is problematic. It sounds like because `2` is an `int`, the assignment is a problem. But it is more than that. `NULL` may also be an `int`, an `int 0`. It is just that `char *x = 0;` is well defined and `char *x = 2;` is not. 6.3.2.3 Pointers 3 (BTW: C does not define a _integer literal_, only _string literal_ and _compound literal_. `0` is an _integer constant_) – chux - Reinstate Monica Apr 11 '17 at 17:52
  • @chux You're very correct, but isn't it `char *x = (void *)0;`, to be conforming? or is it only with other expressions which yields the value `0`? – Sourav Ghosh Apr 11 '17 at 18:11
  • 10
    @SouravGhosh: integer constants with value `0` are special: they implicitly convert to null pointers separately from the usual rules for explicitly casting general integer expressions to pointer types. – Steve Jessop Apr 11 '17 at 18:15
  • 1
    The language described by the *1974 C Reference Manual* did not allow declarations to specify initialization expressions, and the lack of such expressions makes "declaration mirrors use" much more practical. The syntax `int *p = somePtrExpression` is IMHO rather horrid since it looks like it's setting the value of `*p` but it's actually setting the value of `p`. – supercat Apr 12 '17 at 22:07
  • I guess the confusion might arise from char *hello= "hello"; as this does initialize the char * to a block of memory with "hello" in it – Har Apr 13 '17 at 13:28
54

The tutorial is wrong. In ISO C, int *my_int_ptr = 2; is an error. In GNU C, it means the same as int *my_int_ptr = (int *)2; . This converts the integer 2 to a memory address, in some fashion as determined by the compiler.

It does not attempt to store anything in the location addressed by that address (if any). If you went on to write *my_int_ptr = 5;, then it would try to store the number 5 in the location addressed by that address.
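
The distinction between setting the pointer and storing through it can be sketched as follows. This example assumes a valid target (the address of a local int) instead of the integer 2, so that the behavior is well defined; the name store_through_pointer is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns the value left in the pointed-to int. */
int store_through_pointer(void)
{
    int target = 7;

    int *p = &target;      /* initialization: p now holds the address of target */
    assert(target == 7);   /* nothing has been stored through p yet */

    *p = 5;                /* dereference and assign: writes to target */
    return target;         /* target is now 5 */
}
```

Only the line that starts with *p writes to the pointed-to location; the declaration with an initializer only sets p.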

M.M
  • 1
    I didn't know that integer to pointer conversion is implementation defined. Thanks for the information. – taskinoor Apr 11 '17 at 06:36
  • 1
    @taskinoor Please note that there's a conversion only in the case you force it by a cast, as in this answer. If not for the cast, the code should not compile. – Lundin Apr 11 '17 at 07:01
  • 2
    @taskinoor: Yes, the various conversions in C are quite confusing. This Q has interesting information on conversions: [C: When is casting between pointer types not undefined behavior?](http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior). – sleske Apr 11 '17 at 08:53
18

To clarify why the tutorial is wrong: int *my_int_ptr = 2; is a "constraint violation". It is not valid C, and the compiler must give you a diagnostic upon encountering it.

As per 6.5.16.1 Simple assignment:

Constraints

One of the following shall hold:

  • the left operand has atomic, qualified, or unqualified arithmetic type, and the right has arithmetic type;
  • the left operand has an atomic, qualified, or unqualified version of a structure or union type compatible with the type of the right;
  • the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
  • the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
  • the left operand is an atomic, qualified, or unqualified pointer, and the right is a null pointer constant; or
  • the left operand has type atomic, qualified, or unqualified _Bool, and the right is a pointer.

In this case the left operand is an unqualified pointer. Nowhere does it mention that the right operand is allowed to be an integer (arithmetic type), except for a null pointer constant such as 0 (the fifth bullet). So the code violates the C standard.

GCC is known to behave poorly unless you explicitly tell it to be a standard C compiler. If you compile the code with -std=c11 -pedantic-errors, it will correctly give a diagnostic as it must do.
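
To illustrate which initializers satisfy the constraints quoted above, here is a small sketch that uses only null pointer constants. The function name count_null_initializers is hypothetical, and the rejected form is left in a comment so that the example still compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many of the pointers below are null. */
int count_null_initializers(void)
{
    int *a = 0;            /* integer constant expression with the value 0 */
    int *b = NULL;         /* the macro from <stddef.h> */
    int *c = (void *)0;    /* 0 cast to void * */

    /* int *bad = 2;          constraint violation: a diagnostic is required */

    assert(a == b && b == c);   /* all three are null pointers */
    return (a == NULL) + (b == NULL) + (c == NULL);
}
```

All three initializers fall under the "null pointer constant" bullet, which is why they are accepted while the commented-out line is not.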

Lundin
  • 4
    upvoted for suggesting -pedantic-errors. Though I'll likely use the related -Wpedantic . – fagricipni Apr 11 '17 at 09:09
  • 2
    One exception to your statement that the right operand is not allowed to be an integer: Section 6.3.2.3 says, “An integer constant expression with the value 0, or such an expression cast to type `void *`, is called a null pointer constant.” Notice the second-to-last bullet point in your quote. Therefore, `int* p = 0;` is a legal way to write `int* p = NULL;`. Although the latter is clearer and more conventional. – Davislor Apr 11 '17 at 16:23
  • 1
    Which makes the pathological obfuscation `int m = 1, n = 2 * 2, * p = 1 - 1, q = 2 - 1;` legal too. – Davislor Apr 11 '17 at 16:32
  • @Davislor that's covered by bullet point 5 in the standard quote in this answer (agree that the summary afterwards probably should mention it though) – M.M Apr 11 '17 at 21:15
  • 1
    @chux I believe a well-formed program would need to convert an `intptr_t` explicitly to one of the allowed types on the right-hand side. That is, `void* a = (void*)(intptr_t)b;` is legal by point 4, but `(intptr_t)b` is neither a compatible pointer type, nor a `void*`, nor a null pointer constant, and `void* a` is neither an arithmetic type nor `_Bool`. The standard says the conversion is legal, but not that it is implicit. – Davislor Apr 12 '17 at 05:20
  • @chux `void *a = (intptr_t) b;` is not valid C, try it in a standard-conforming C compiler. 7.20.1.4/1 only says that these types may be used to convert to/from a pointer type and back. But the code `void *a = (intptr_t) b;` does not cause a conversion from `intptr_t` to `void*`, because the right operand of `=` is only converted to the type of the assignment expression (6.5.16.1/2) if the above cited constraints (6.5.16.1/1) are fulfilled. You would have to type `void *a = (void*)(intptr_t)b;` to trigger an explicit conversion. – Lundin Apr 12 '17 at 07:36
  • @chux The standard says that `0` is a null pointer constant (bullet point 5). This doesn’t extend to integers in general. – Davislor Apr 12 '17 at 18:43
15

int *my_int_ptr = 2

stores the integer value 2 to whatever random address is in my_int_ptr when it is allocated.

This is completely wrong. If this is actually written then please get a better book or tutorial.

int *my_int_ptr = 2 defines an integer pointer which points to address 2. You will most likely get a crash if you try to access address 2.

*my_int_ptr = 2, i.e. without the int in the line, stores the value two to whatever random address my_int_ptr is pointing to. Having said this, you can assign NULL to a pointer when it is defined. char *x=NULL; is perfectly valid C.

Edit: While writing this I didn't know that integer to pointer conversion is implementation defined behavior. Please see the good answers by @M.M and @SouravGhosh for details.

taskinoor
  • 1
    It is completely wrong because it is a constraint violation, not for any other reason. In particular, this is incorrect: "int *my_int_ptr = 2 defines an integer pointer which points to address 2". – Lundin Apr 11 '17 at 07:02
  • @Lundin: Your phrase _"not for any other reason"_ is itself wrong and misleading. If you fix the type compatibility problem, you are still left with the fact that the tutorial's author is grossly misrepresenting how pointer initialisations and assignments work. – Lightness Races in Orbit Apr 13 '17 at 13:57
15

A lot of confusion about C pointers comes from a very bad choice that was originally made regarding coding style, corroborated by a very bad little choice in the syntax of the language.

int *x = NULL; is correct C, but it is very misleading, I would even say nonsensical, and it has hindered the understanding of the language for many a novice. It makes one think that later on we could do *x = NULL; which is of course impossible. You see, the type of the variable is not int, and the name of the variable is not *x, nor does the * in the declaration play any functional role in collaboration with the =. It is purely declarative. So, what makes a lot more sense is this:

int* x = NULL; which is also correct C, albeit it does not adhere to the original K&R coding style. It makes it perfectly clear that the type is int*, and the pointer variable is x, so it becomes plainly evident even to the uninitiated that the value NULL is being stored into x, which is a pointer to int.

Furthermore, it makes it easier to derive a rule: when the star is away from the variable name then it is a declaration, while the star being attached to the name is pointer dereferencing.

So, now it becomes a lot more understandable that further down we can either do x = NULL; or *x = 2; in other words it makes it easier for a novice to see how variable = expression leads to pointer-type variable = pointer-expression and dereferenced-pointer-variable = expression. (For the initiated, by 'expression' I mean 'rvalue'.)

The unfortunate choice in the syntax of the language is that when declaring local variables you can say int i, *p; which declares an integer and a pointer to an integer, so it leads one to believe that the * is a useful part of the name. But it is not, and this syntax is just a quirky special case, added for convenience, and in my opinion it should have never existed, because it invalidates the rule that I proposed above. As far as I know, nowhere else in the language is this syntax meaningful, but even if it is, it points to a discrepancy in the way pointer types are defined in C. Everywhere else, in single-variable declarations, in parameter lists, in struct members, etc. you can declare your pointers as type* pointer-variable instead of type *pointer-variable; it is perfectly legal and makes more sense.
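
The multi-variable declaration quirk can be checked directly. The sketch below assumes a C11 compiler (it relies on _Generic); the macro IS_INT_POINTER and the function star_binds_to_declarator are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Evaluates to 1 when the expression has type int *, otherwise 0 (C11 _Generic). */
#define IS_INT_POINTER(x) _Generic((x), int *: 1, default: 0)

/* Hypothetical helper: shows that the * belongs to one declarator only. */
bool star_binds_to_declarator(void)
{
    int* a = NULL, b = 0;   /* a is int *, but b is a plain int */

    assert(a == NULL && b == 0);
    return IS_INT_POINTER(a) && !IS_INT_POINTER(b);
}
```

Whichever side of the space the star is written on, it applies only to the name that follows it, which is why declaring one variable per line avoids the surprise.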

Mike Nakis
  • `int *x = NULL; is correct C, but it is very misleading, I would even say nonsensical,`... I have to agree to disagree. `It makes one think`....stop thinking, read a C book first, no offense. – Sourav Ghosh Apr 12 '17 at 06:13
  • 7
    @SouravGhosh As a matter of opinion I think that C **should** have been designed so that `int* somePtr, someotherPtr` declares two pointers, in fact, I used to write `int* somePtr` but that leads to the bug you describe. – fagricipni Apr 12 '17 at 10:55
  • 1
    @fagricipni I stopped using the multiple variable declaration syntax because of this. I declare my variables one by one. If I really want them on the same line, I separate them with semi-colons rather than commas. "If a place is bad, don't go to that place." – Mike Nakis Apr 12 '17 at 11:11
  • 2
    @fagricipni Well, if I could have designed linux from scratch, I would have used `create` instead of `creat`. :) The point is, it is how it is and we need to mould ourselves to adapt to that. It all boils down to personal choice at the end of the day, agree. – Sourav Ghosh Apr 12 '17 at 12:06
  • How do you explain `int* y[10];` ? This declares `y` to have type `int*[10]`. – M.M Dec 17 '17 at 20:41
  • 1
    @M.M I can't really explain it, I regard it as one more of those weird things about C. But at the same time, `int *y[10]` does not make things any clearer. – Mike Nakis Dec 17 '17 at 21:12
9

I would like to add something orthogonal to the many excellent answers. Actually, initializing to NULL is far from bad practice and may be handy if that pointer may or may not be used to store a dynamically allocated block of memory.

int * p = NULL;
...
if (...) {
    p = (int*) malloc(...);
    ...
}
...
free(p);

Since, according to the ISO/IEC 9899 standard, free is a no-op when the argument is NULL, the code above (or something more meaningful along the same lines) is legit.
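
A compilable variant of the stub above might look like the following sketch. The function name maybe_allocate, its want_buffer parameter, and the buffer size are all assumptions made for illustration; they stand in for the elided parts of the stub and are not taken from it. The malloc result is left uncast, which is sufficient in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: allocates only when asked, but always calls free. */
bool maybe_allocate(bool want_buffer)
{
    int *p = NULL;                 /* known state from the start */
    bool allocated = false;

    if (want_buffer) {
        p = malloc(16 * sizeof *p);
        if (p != NULL) {
            p[0] = 42;             /* use the buffer only if malloc succeeded */
            allocated = true;
        }
    }

    assert(allocated == (p != NULL));
    free(p);                       /* free(NULL) is defined to do nothing */
    return allocated;
}
```

Because p starts out as NULL, the single free call at the end is safe on every path, including the one where no allocation was attempted.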

Luca Citi
  • 6
    It's redundant to cast the result of malloc in C, unless that C code should also compile as C++. – cat Apr 11 '17 at 12:41
  • You are right, the `void*` is converted as needed. But having code that works with a C and a C++ compiler could have benefits. – Luca Citi Apr 11 '17 at 12:57
  • 1
    @LucaCiti C and C++ are different languages. There are only errors waiting for you if you try to compile a source file written for one using a compiler designed for the other. It's like trying to write C code that you can compile using Pascal tools. – Evil Dog Pie Apr 11 '17 at 15:18
  • 1
    @MikeofSST A common use case of code that must compile as either C or C++ is library header files. You can write two versions and use `#ifdef __cplusplus` to select the language, but that adds failure modes and violates DRY. It’s a lot simpler and safer in those cases to write code that works everywhere. Granted, a call to `malloc()` is unlikely to appear in a header file. Besides, the cast is benign and some people think it improves clarity. – Davislor Apr 11 '17 at 15:51
  • @Davislor I'll take your word for that: it's not a use case that I've seen since about 1990, but that doesn't mean it doesn't still exist. For new projects, I would strongly recommend that the two are treated as completely incompatible languages, as different as F# and APL. How do you guarantee that you pick the correct library? C and C++ have different calling conventions and linking the wrong one would cause all hell to break loose, even if you could get it to build. Sounds like too much risk for too little reward to me. – Evil Dog Pie Apr 11 '17 at 15:58
  • 1
    Good advice. I (try to) always initialize my pointer constants to something. In modern C, this can usually be their final value and they can be `const` pointers declared *in medias res*, but even when a pointer needs to be mutable (like one used in a loop or by `realloc()`), setting it to `NULL` catches bugs where it’s used before it’s set with its real value. On most systems, dereferencing `NULL` causes a segfault at the point of failure (although there are exceptions), whereas an uninitialized pointer contains garbage and writing to it corrupts arbitrary memory. – Davislor Apr 11 '17 at 16:04
  • 1
    Also, it’s very easy to see in the debugger that a pointer contains `NULL`, but can be very difficult to tell a garbage pointer from a valid one. So it’s helpful to ensure that all pointers are always either valid or `NULL`, from the moment of declaration. – Davislor Apr 11 '17 at 16:06
  • @MikeofSST Looking up the first example that comes to mind, the glibc version of ``, line 27 is `#ifdef __cplusplus`. By the C++ standard, all C++ compilers must support `#include `, for example. I rarely if ever see C libraries that *prohibit* calling them from C++ programs. I agree that it is very rare to see anything but the kind of function prototypes, `extern` variable definitions and macros that typically appear in C header files written for both languages, though. More likely, people who learned C++ first just got in the habit. – Davislor Apr 11 '17 at 16:15
  • @MikeofSST So, kind of like why I write `++i;` instead of `i++;` in both languages unless I actually care about the return value. In C++, with arbitrary classes, that saves the compiler from creating an unnecessary temporary object. I just didn’t spare the neurons to do it differently in C. – Davislor Apr 11 '17 at 16:20
  • @cat Exactly. In C, or in C++ with integral and pointer types, it doesn’t matter. But, in C++ with more complex classes and operator overloading, it can make a big difference! So I got in the habit of doing it the way that always works. I suspect that’s why many people cast `void*` return values explicitly. – Davislor Apr 11 '17 at 16:57
  • Can I be pedantic? (Ehm, as if...) The code in this answer has a potential memory leak, because it says `p = malloc(...` without checking if `p` is still `NULL`. Also, the question was about C, so why is this comment thread about C++? – Mr Lister Apr 12 '17 at 15:20
  • You can be pedantic, of course :-) But I think you seem to miss the point. I have written a stub just to show that by setting `p` to `NULL` you can free it even if you take a branch that ends up not touching `p`. If it was not initialized you would get a segfault. Davislor added a couple of points more, e.g. that a NULL is easier to spot during debug. This all applies to c as well. The only c++ specific point is a detail (not the main point) i.e. the cast. – Luca Citi Apr 12 '17 at 19:04
  • The code as it is has no leak simply because it doesn't specify what is in the ellipses :-D But you are right, depending on what you put in the ellipses you need to check `p` before assigning it to the result of the `malloc`. – Luca Citi Apr 12 '17 at 19:08
2

This is correct.

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char * x = NULL;

    if (x==NULL)
        printf("is NULL\n");

    return EXIT_SUCCESS;
}

This function is correct for what it does. It assigns the address of 0 to the char pointer x. That is, it points the pointer x to the memory address 0.

Alternative:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char* x = 0;

    if ( !x )
        printf(" x points to NULL\n");

    return EXIT_SUCCESS;
}

My guess as to what you wanted is:

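#include <stdio.h>
#include <stdlib.h>

int main()
{
    char* x = NULL;
    x = malloc( sizeof( char ));   /* malloc is the standard allocator; alloc does not exist */
    if ( x == NULL )
        return EXIT_FAILURE;       /* allocation failed */
    *x = '2';

    if ( *x == '2' )
        printf(" x points to an address/location that contains a '2' \n");

    free( x );
    return EXIT_SUCCESS;
}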

x is the street address of a house. *x examines the contents of that house.
Vanderdecken
  • "It assigns the address of 0 to the char pointer x." --> Maybe. C does not specify the _value_ of the pointer, only that `char* x = 0; if (x == 0)` will be true. Pointers are not necessarily integers. – chux - Reinstate Monica Apr 11 '17 at 18:09
  • It doesn't 'point the pointer x to the memory address 0'. It sets the pointer value to an *unspecified* invalid value that can be *tested* by comparing it to 0, or NULL. The actual operation is implementation-defined. There is nothing here that answers the actual question. – user207421 Apr 12 '17 at 10:07
1

This is a null pointer:

int * nullPtr = (void*) 0;