
The following code runs well under gcc 11.2.1:

// test.c
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char **argv){
    char *msg;
    unsigned int val;

    msg = "1024 2048 4096 13";
    while(*msg != '\0'){
        val = strtoul(msg, &msg, 10);
        printf("%u\n",val);
    }
    return 0;
}

$ gcc -Wall -o test.bin test.c
$ ./test.bin
1024
2048
4096
13

Regarding the function strtoul,

unsigned long int strtoul(const char *str, char **endptr, int base)

(See notes on Update below.)

Is it correct to pass references to the same object in str and endptr? Or is it just a lucky coincidence that it did not explode in my face?

The reference manuals for stdlib that I have checked do not mention this.

Update:

  • The comments and the answer reference the C standard. A draft of it is available as ISO N2310.
  • In that document, the first argument of the strtoul function is called nptr, as opposed to str, which is the name used in this question. The answer and the discussion below mostly use nptr.
  • Section 6.7.3.1 contains the formal definition of the type qualifier restrict.
  • Section 7.22.1.3 contains the description of strtoul()
onlycparra
  • The code is correct. There's no reference to the same object because `msg` is not any sort of reference to itself. – Paul Hankin Mar 12 '22 at 09:35
  • @PaulHankin how is it correct? The `restrict` (since C99: `unsigned long strtoul(const char *restrict str, char **restrict str_end, int base);`) qualifier is on the right, which applies to the `msg`, not a pointer to it. This code is UB, not only because modifying `char const*` is UB, but also because aliasing this at all is UB. –  Mar 12 '22 at 09:39
  • @Sahsahae I may be wrong, but isn't `restrict` there to tell the compiler that nobody **outside** of the function is doing things with that pointer? If that is the case, then the code would be compliant. No? – onlycparra Mar 12 '22 at 09:51
  • @Sahsahae Here's what I see: `msg` and `&msg` don't alias each other. One is a pointer into a static string (in this program), and the other is the address of a variable. There is no modification of a `char const*`, and the restricts apply to the two pointers (and not what they point at). The code of `strtoul` parses the number from the start of `str`, and then does `*endptr = ` (if `endptr` isn't NULL). Rules about restrict aside (but I think there's no problem there either), this code has no obvious problems. – Paul Hankin Mar 12 '22 at 09:54
  • @PaulHankin Does this argument stand if, say, msg pointed to some allocated memory? It would just be a different address, but again, nobody is moving outside of the function. – onlycparra Mar 12 '22 at 10:01
  • I think you can't argue that there's UB or no UB based only on the type signatures. `restrict` would mean that in this program it would be UB if `strtoul` wrote through `**endptr` (because that would modify the object referred to by `str`, which is `restrict`-qualified). However, `strtoul` doesn't do that: it overwrites `*endptr`, which is not UB for any reason in this program. In fact `strtoul` can never read `*endptr` because it's valid (and normal) to pass in the address of an uninitialized pointer. – Paul Hankin Mar 12 '22 at 10:09
  • @Sahsahae My understanding of the disagreement is that you think that `restrict` means that it's impossible to get a pointer to an object referred to by a restrict-qualified pointer except via that pointer. My reading of the standard is that all writes to the object referred to by a restrict-qualified pointer must go through that pointer. If that understanding is correct, that's why you think this is UB (because `**endptr` points to the same thing as `str`), and why I don't (because `strtoul` doesn't use `**endptr` at all). – Paul Hankin Mar 12 '22 at 10:14
  • @PaulHankin Just because a particular implementation doesn't use the pointer that is assumed to be unaliased by the very clear qualifier doesn't mean that it's not UB to alias them. And I have no clue what you're talking about, my implementation does use it, because it relies on guarantees written in the standard. And if you break them, you're the one who's wrong. I can argue about it and I will. Stop adding bugs to C programs. They have enough of them. Even if this wasn't UB, this is stupid, just provide NULL if you don't care about endptr which is an out parameter. –  Mar 12 '22 at 10:48
  • Maybe I'm not being clear enough, but it isn't really my fault if people don't read what `restrict` does, but in short: `&msg` points to `msg`, which is `char *restrict`, during the lifetime of this restricted pointer, ***no accesses to the object should be done through anywhere but this sole pointer***, what do you think `strtoul` does with that other pointer, `char const* msg` specifically? And yes, reading from that pointer is UB, because the restricted pointer is "alive" until it is replaced. –  Mar 12 '22 at 11:04
  • @Sahsahae: Re “what do you think `strtoul` does with that other pointer”: The first parameter of `strtoul`, `nptr`, receives the value of `msg` and uses that value to access the bytes in the string, but `strtoul` does not access the bytes in `msg` through `nptr`. The statement `strtoul(msg, &msg, 10);` is equivalent to `{ char *temporary = msg; strtoul(temporary, &msg, 10); }`. There is no violation of `restrict`. The fact `&msg` points to memory that contains the same address as `temporary` is irrelevant as the initial contents of that memory are not used by `strtoul` at all. – Eric Postpischil Mar 12 '22 at 11:42
  • @EricPostpischil `char *temporary` points to the object that is pointed to by a pointer with `*restrict` qualifier. Undefined behaviour to read or write to it. –  Mar 12 '22 at 13:32
  • @Sahsahae: What is pointed to by a pointer with `restrict` is irrelevant. `restrict` affects what the pointer points to, not what the pointer that the pointer points to points to. The formal definition of restrict in C 2018 6.7.3.1 says nothing about what `*endptr` points to. Further, the value in the pointer (the value of `*endptr`) is never used by `strtoul`; `strtoul` only writes to `*endptr`; it does not read it, so the value stored in it is irrelevant. – Eric Postpischil Mar 12 '22 at 13:35
  • @EricPostpischil Just so we're on the same page, you do know what `**restrict` means right? I'm telling you that because `*restrict endptr` points to the original string, ***reading it through any other pointer is UB***. I don't care how many temporary pointers to the original you make, neither does the C compiler. –  Mar 12 '22 at 13:37
  • @Sahsahae: `endptr` does not point to the original string. `*endptr` does. The rules for `restrict`, as applied to the parameter `char ** restrict endptr`, only concern what `endptr` points to. They do not concern what `*endptr` points to. – Eric Postpischil Mar 12 '22 at 13:39
  • @Sahsahae: In the formal definition of `restrict`, as relevant to this situation, `D` is `char ** restrict`, `P` is `endptr`, `T` is `char *`, `B` is the block associated with `strtoul`, `E` is `endptr` (or any value calculated from it, such as `endptr+1`, but `strtoul` has no reason to do that). Note particularly that `*endptr` is not based on `endptr` (per paragraph 3, as changing `endptr` to point to a copy of its stored value would not change `*endptr`). The only `L` of concern is the lvalue `*endptr`, and its `X` is the object `*endptr`… – Eric Postpischil Mar 12 '22 at 13:44
  • … Then paragraph 4 says `T` shall not be const-qualified. That is satisfied. It says every other lvalue used to access `X` shall have its address based on `P`. There is no other lvalue used to access `X`, so that is satisfied. It says if `P` is assigned the value of some expression based on another restricted pointer object, there are certain constraints. `P` is not assigned any value, so that is satisfied. All the requirements of the formal definition are satisfied, so there is no violation of the `restrict` requirements. – Eric Postpischil Mar 12 '22 at 13:58
  • @EricPostpischil Exactly, `*endptr` does. And `*endptr` lifetime is the same as that of strtoul, and thus reading from any other place is UB, that place being `char const* str`. If that's not how this works then C `restrict` keyword is fundamentally broken and should never be used, if it provides none of the guarantees that would make one want to use it in the first place, because any relevant compiler would see this memory dependency, even if you don't. `restrict` is impossible to use without mutual exclusivity, which is why you never alias pointers that are `restrict`ed. –  Mar 12 '22 at 13:59
  • @Sahsahae: `*endptr` is not `restrict` qualified, so it does not matter where it points. I have demonstrated how the formal rules of `restrict` are satisfied. If you think there is a violation, show us values of the variables in the formal definition, particularly `L` and `X`, that violate paragraph 4. – Eric Postpischil Mar 12 '22 at 14:23
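
The equivalence stated in the comments above (the argument `msg` is copied before `strtoul` starts running) can be checked with a small sketch. The helper names `parse_aliased` and `parse_with_temporary` are hypothetical and exist only for this illustration; the only library call used is `strtoul` itself.

```c
#include <stdlib.h>

/* Hypothetical helper: the same variable supplies both arguments,
   as in strtoul(msg, &msg, 10) with msg being *cursor. */
unsigned long parse_aliased(char **cursor)
{
    /* The value of *cursor is evaluated (copied into the parameter nptr)
       before the call begins. */
    return strtoul(*cursor, cursor, 10);
}

/* Hypothetical helper: the same call with the copy written out explicitly. */
unsigned long parse_with_temporary(char **cursor)
{
    char *temporary = *cursor;
    return strtoul(temporary, cursor, 10);
}
```

Under this reading, calling either helper on a pointer into the same string should return the same value and leave the pointer at the same position.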

1 Answer


Other than the restrict qualifiers, neither the documentation for strtoul, in C 2018 7.22.1.4, nor the documentation for using the standard library generally, in 7.1.4, states any prohibition on *endptr pointing to the same memory that nptr does (where endptr and nptr are the second and first arguments of strtoul, respectively), nor does either state any relaxation of the specification under this condition (for example, there is no assertion that the behavior would be undefined if such a condition holds).

So we just need to consider the restrict qualifiers. strtoul is declared as unsigned long int strtoul(const char * restrict nptr, char ** restrict endptr, int base). Using the formal definition of restrict in 6.7.3.1 and considering the restrict on nptr first, 6.7.3.1 4 tells us that, if the memory accessed through nptr (or through any pointer based on it) is modified by any means, then every other lvalue used to access it shall be based on nptr. This condition is satisfied because strtoul will not modify the string that nptr points to.

Considering the restrict on endptr, we observe the memory it points to is indeed modified, as strtoul stores a value to that memory. Then 6.7.3.1 4 requires that every other lvalue used to access that memory shall be based on endptr. This condition is satisfied because strtoul does not access the bytes representing msg other than through endptr.

Therefore, the routine should function as specified in 7.22.1.4 even if it is called with strtoul(msg, &msg, 10).
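
To make the two access patterns concrete, here is a minimal sketch of a strtoul-like routine. It is not how any real C library implements strtoul: `sketch_strtoul10` is a hypothetical name, and the sketch assumes base 10 only, with no sign handling, no overflow detection, and no errno. It shows only that the string is read through a pointer based on nptr and that the caller's pointer is written once, through endptr.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for strtoul (base 10 only). */
unsigned long sketch_strtoul10(const char *restrict nptr,
                               char **restrict endptr)
{
    const char *p = nptr;              /* p is based on nptr */
    unsigned long value = 0;

    /* Skip leading white space; the string is only read, never written. */
    while (isspace((unsigned char)*p))
        p++;

    const char *digits = p;
    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (unsigned long)(*p - '0');
        p++;
    }

    if (p == digits)
        p = nptr;                      /* no digits: report the original start */

    if (endptr != NULL)
        *endptr = (char *)p;           /* the only access to the object *endptr */

    return value;
}
```

Calling `sketch_strtoul10(msg, &msg)` in a loop over "1024 2048 4096 13" should walk through the four numbers the same way the question's program does, since the function never reads the old contents of `*endptr`.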

Eric Postpischil
  • One could think that the bytes of "the msg pointer" are read while `nptr` is dereferenced, and they are written to through `*endptr` at the end of strtoul. This sounds like a violation because the read of `nptr` is not based on `endptr`. From what I understand, the solution is that `nptr` is actually a different object than the `char *msg` pointer from the calling function (call by value). So _in the scope of the strtoul function_ there is no alias for `*endptr` (only for `**endptr`, `*nptr`, which is not written to) and hence there is no violation. Is this interpretation sound? – JayK Mar 12 '22 at 19:32
  • @JayK: Yes, `nptr` is a completely separate object from `msg`. `nptr` is the parameter of `strtoul`. Per C 2018 6.9.1 9 and 10 (executing a function), it is a separate object that is initialized with the value of the passed argument. The fact the value of the passed argument was taken from `msg` is irrelevant; when the function is called, there is no connection between `nptr` and `msg` other than that they have the same value. The value must have been evaluated from `msg` prior to the call, per C 2018 6.5.2.2 10 (there is a sequence point after argument evaluation and before the function call). – Eric Postpischil Mar 12 '22 at 21:33
  • Thanks for the enlightening discussion. I will make some updates and add some references to the original question in order to make it more useful for future visitors. – onlycparra Mar 13 '22 at 02:06
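
The call-by-value point from the last two comments can be illustrated with a small probe. `param_is_separate_object` is a hypothetical function written for this illustration; it has the same parameter shape as strtoul (minus base) but does no parsing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: checks that the parameter nptr is its own object,
   unaffected by a write to the caller's variable through endptr. */
bool param_is_separate_object(const char *nptr, char **endptr)
{
    const char *before = nptr;

    /* The parameter lives at a different address than the caller's pointer. */
    bool distinct = (const void *)&nptr != (const void *)endptr;

    *endptr = NULL;                    /* overwrite the caller's variable */

    /* nptr still holds the value copied in at the time of the call. */
    return distinct && nptr == before;
}
```

When called as `param_is_separate_object(msg, &msg)`, the caller's `msg` ends up NULL while the function's own `nptr` keeps the original address, which matches the description of `nptr` as a separate object initialized from the argument's value.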