The standard C library functions strtof and strtod have the following signatures:

float strtof(const char *str, char **endptr);
double strtod(const char *str, char **endptr); 

They each decompose the input string, str, into three parts:

  1. An initial, possibly-empty, sequence of whitespace
  2. A "subject sequence" of characters that represent a floating-point value
  3. A "trailing sequence" of characters that are unrecognized (and which do not affect the conversion).

If endptr is not NULL, then *endptr is set to a pointer to the character immediately following the last character that was part of the conversion (in other words, the start of the trailing sequence).
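
As a minimal sketch of that decomposition (the helper name parse_leading_double is hypothetical, not part of the standard library), the following C function calls strtod and hands back the start of the trailing sequence. Note that the local variable end has to be declared char *, even though it points into a const char string:

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only. */
double parse_leading_double(const char *str, const char **rest)
{
    char *end;                        /* strtod requires char **, so no const here */
    double value = strtod(str, &end); /* skips whitespace, converts the subject sequence */

    if (rest != NULL)
        *rest = end;                  /* char * converts implicitly to const char * */
    return value;
}
```

Called with "  3.5 apples" in the default "C" locale, this should return 3.5 and leave *rest pointing at " apples".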

I am wondering: why is endptr, then, a pointer to a non-const char pointer? Isn't *endptr a pointer into a const char string (the input string str)?

user5534993
Daniel Trebbien
  • It's basically the same issue as `strchr` and friends, except that here we have an out-param pointer rather than a return value. – Steve Jessop Oct 06 '10 at 15:45
  • @Steve: yes, but it's more problematic than `strchr` because you can't pass a `const`-qualified pointer's address to these functions without an explicit cast. – R.. GitHub STOP HELPING ICE Oct 06 '10 at 15:48
  • Interesting question. Basically this means that you may hide a cast from `char const*` to `char*` behind `strtoX` functions. Weird. – Jens Gustedt Oct 06 '10 at 15:50
  • @Jens: It's annoying, but in C it's unavoidable since there's no function overloading. In C++, there are two `strchr` functions, returning `char*` if the input is `char*`, and `const char*` if the input is `const char*`. There's no such overloading of `strtod` in the C++ standard, though, and I don't know the rationale for that. And before anyone asks, there's no `strtof` at all in C++, since it's not in C89. A search reveals that I've been ignorant of this rationale for a long time: http://stackoverflow.com/questions/993700/are-strtol-strtod-unsafe – Steve Jessop Oct 06 '10 at 16:08

2 Answers

The reason is simply usability. char * can automatically convert to const char *, but char ** cannot automatically convert to const char **, and the actual type of the pointer (whose address gets passed) used by the calling function is much more likely to be char * than const char *. The reason this automatic conversion is not possible is that there is a non-obvious way it can be used to remove the const qualification through several steps, where each step looks perfectly valid and correct in and of itself. Steve Jessop has provided an example in the comments:

if you could automatically convert char** to const char**, then you could do

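char *p;
char **pp = &p;
const char** cp = pp;           /* the conversion in question */
*cp = (const char*) "hello";    /* through cp, p now points at a string literal */
*p = 'j';                       /* writes to the literal without any cast */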

For const-safety, one of those lines must be illegal, and since the others are all perfectly normal operations, it has to be cp = pp;

An earlier revision of this answer suggested defining these functions to take void * in place of char **, on the grounds that both char ** and const char ** can automatically convert to void *. That was actually a very bad idea: not only does it prevent any type checking, but C actually forbids objects of type char * and const char * to alias. A sounder alternative would have been for these functions to take a ptrdiff_t * or size_t * argument in which to store the offset of the end, rather than a pointer to it. This is often more useful anyway.

If you like the latter approach, feel free to write such a wrapper around the standard library functions and call your wrapper, so as to keep the rest of your code const-clean and cast-free.
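
A minimal sketch of such a wrapper, assuming a size_t offset is what you want (the name strtod_off is made up for this example and is not a standard function):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: like strtod, but reports where parsing stopped
 * as an offset from str instead of as a char * into it. */
double strtod_off(const char *str, size_t *end_off)
{
    char *end;                          /* the only non-const pointer stays local */
    double value = strtod(str, &end);

    if (end_off != NULL)
        *end_off = (size_t)(end - str); /* 0 if no conversion was performed */
    return value;
}
```

A caller holding a const char *s can then continue from s + off without ever touching a non-const pointer. For the input "  2.5xyz" the offset should come back as 5 (two spaces plus "2.5").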

Community
R.. GitHub STOP HELPING ICE
  • "Both `char**` and `const char**` can automatically convert to void *." I see what you're saying, and using `void*` would allow the user to pass in a `const char**` without a cast. But it would also allow them to pass in a whole universe of wrong things without a cast. Given the limitations of C, I think it's better to preserve basic type-safety, even at the cost of losing const-safety. – Steve Jessop Oct 06 '10 at 15:48
  • It is not simply usability, these functions store result in that pointer, and you cannot write to a constant memory. –  Oct 06 '10 at 15:49
  • @Vlad: You're confusing `const char **` with `char *const *`. – R.. GitHub STOP HELPING ICE Oct 06 '10 at 15:53
  • "If anyone has it, please post." - if you could automatically convert `char**` to `const char**`, then you could do `char *p; char **pp = &p; const char** cp = pp; *cp = (const char*) "hello"; *p = 'j';`. For const-safety, one of those lines must be illegal, and since the others are all perfectly normal operations, it has to be `cp = pp;`. – Steve Jessop Oct 06 '10 at 15:56
  • @Steve: That is an extremely helpful example for why `char**` cannot convert implicitly to `const char**`! – Daniel Trebbien Oct 06 '10 at 17:16
  • @R.: I like your idea to pass in a pointer to `size_t`. It seems more correct to me than the standard library's solution. – Daniel Trebbien Oct 06 '10 at 17:20
  • An offset certainly addresses the const problem, although it's too late now to consistently use that convention throughout the libraries, and it might already have been too disruptive by the time `const` was invented, I'm not sure. If it was normal for string-handling functions to return offsets rather than pointers, would `strcpy` always return 0? ;-) – Steve Jessop Oct 06 '10 at 18:49
  • Ideally `strcpy` would return the length of the string, a very useful piece of information it automatically obtains (and then throws away) as a side effect of its operation.. :-) – R.. GitHub STOP HELPING ICE Oct 06 '10 at 19:11
  • It also means you can't do this: `const char *end; strtof(str, &end);` On the other hand, because of this you *can* do this: `char *end; strtof(str, &end);` and **now you have a non-const pointer to potentially const memory!** – Matt Mar 12 '15 at 21:22
  • @SteveJessop Why the `(const char*)` cast? Also, `cp = pp` compiles with a warning on two of three compilers I tested on rextester? (Guess which one was MSVC). Even more interestingly, MSVC doesn't SIGSEGV when I run it in rextester... Is it catching the crash or something? – Alexander Riccio Aug 09 '17 at 18:20
  • @SteveJessop one more thing: – Alexander Riccio Aug 09 '17 at 18:47
  • @SteveJessop And one more thing: `const char** const endptr` seems to work fine as a second parameter? – Alexander Riccio Aug 09 '17 at 19:06

Usability. The str argument is marked as const because the input argument will not be modified. If endptr were declared as const char **, that would tell the caller not to modify the data that *endptr points to on output, but often the caller wants to do just that. For example, I may want to null-terminate a string after getting the float out of it:

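#include <stdlib.h>  /* strtof */

/* Parse a leading float from Text, then cut the string off where parsing stopped. */
float StrToFAndTerminate(char *Text) {
    float Num;

    Num = strtof(Text, &Text);  /* Text now points at the first unparsed character */
    *Text = '\0';               /* allowed because Text is a non-const char * */
    return Num;
}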

Perfectly reasonable thing to want to do, in some circumstances. Doesn't work if endptr is of type const char **.

Ideally, endptr should be of const-ness matching the actual input const-ness of str, but C provides no way of indicating this through its syntax. (Anders Hejlsberg talks about this when describing why const was left out of C#.)

Aidan Cully
  • The same effect could easily be achieved with `Text[FloatEnd-Text] = '\0'` so for me this isn't really a good excuse. The method that you are indicating will produce UB if `Text` would be declared with `const` and you couldn't easily read that from the location where you do the assignment. – Jens Gustedt Oct 06 '10 at 16:14
  • @Jens: I've updated the code to better reflect the rationale. And I agree about the undefined behavior problem. The question is whether the cure is worse than the disease... – Aidan Cully Oct 06 '10 at 16:46
  • "could easily be achieved" - but what's premature optimisation to us (preferring to use the pointer directly, instead of relying on the compiler to sort out that expression, and/or the hardware to be so fast we don't care), wasn't premature optimisation to the C89 standards committee when they specified `strtod`. Also, that's a shameful idiom to have to learn, so we're just left wondering whether it's more or less shameful than const-incorrectness ;-) – Steve Jessop Oct 06 '10 at 16:46
  • @Steve: hm, premature optimization, in terms of arithmetic this is just `Text+FloatEnd-Text`, this is not really difficult to optimize by a compiler, no ?-) – Jens Gustedt Oct 06 '10 at 18:09
  • @Aidan: hope you took no offense, I believe that the motivation might have been as you are suggesting, but I would blame that on you ;-) – Jens Gustedt Oct 06 '10 at 18:10
  • @Jens: it's a different thing for you or me to say, "oh, it's a bit ugly, but I can live with that and I expect the compiler will optimize", than for the standards committee to say, "if people think it's ugly they can either live with it or cast the pointer to non-const once they have it back, and if their compiler doesn't optimize they can complain to the compiler-writer". We make our own decisions, but the standards committee makes decisions for (and is criticised by) everyone. There were probably still some pretty rudimentary compilers around in 1989, not much more than assemblers. – Steve Jessop Oct 06 '10 at 18:46
  • @Steve: sure, again I am a bit worried about your reaction. Reasons in 1989 were probably different from what they are today, and perhaps they had not been aware of the importance that the little keyword `const` would gain. The standard has been revised since, and the bogus interface stuck for `strtof`, well. Then, this interface has been transposed to the new interfaces in C99. It is just a pity, don't want to say more than that. – Jens Gustedt Oct 06 '10 at 20:22