2
#include <ctype.h>
#include <stdio.h>

int atoi(char *s);

int main()
{
    printf("%d\n", atoi("123"));
}



int atoi(char *s)
{
    int i;

    while (isspace(*s))
        s++;

    int sign = (*s == '-') ? -1 : 1;

    /* same mistake for passing pointer to isdigit, but will not cause CORE DUMP */ 
    // isdigit(s), s++;// this will not lead to core dump
    // return -1;
    /* */

    /* I know s is a pointer, but I don't quite understand why code above will not and code below will */
    if (!isdigit(s))
        s++;
    return -1;
    /* code here will cause CORE DUMP instead of an comile-time error */

    for (i = 0; *s && isdigit(s); s++)
        i = i * 10 + (*s - '0');

    return i * sign;
}

I got "Segmentation fault (core dumped)" when I accidentally made mistake about missing * operator before 's' then I got this confusing error. Why "(!isdigit(s))" lead to core dump while "isdigit(s), s++;" will not.

Lyon Ma
  • 21
  • 2
  • I think you should include a minimal `main` here that demonstrates it. A so called [mre]. I cannot see why this would segfault unless you pass a string without termination. – klutt Sep 27 '21 at 07:12
  • So the question is why a syntax error does or does not cause a core dump? Just correct the syntax error (and any other warning). – Weather Vane Sep 27 '21 at 07:21
  • @WeatherVane Since it compiles it cannot be a syntax error – klutt Sep 27 '21 at 07:22
  • @klutt: would you prefer me to say: the syntax is incorrect? – Weather Vane Sep 27 '21 at 07:23
  • 3
    @WeatherVane It's a logical error. Not a syntax error. Using a pointer address as an integer is allowed afik, even if it does not make sense in most cases. – klutt Sep 27 '21 at 07:25
  • @klutt thanks for your advice, I'll post it – Lyon Ma Sep 27 '21 at 07:25
  • @LyonMa Make something simple like `int main(void) { char *s = "1234567"; atoi(s); }` – klutt Sep 27 '21 at 07:28
  • Just updated the question as the advice proposed by @klutt new here so if I didn't make it clear enough plz kindly let me know, thanks – Lyon Ma Sep 27 '21 at 07:34
  • @klutt You should use [`intptr_t`](https://stackoverflow.com/questions/6326338/why-when-to-use-intptr-t-for-type-casting-in-c) (or `uintptr_t`) in that case. – Bob__ Sep 27 '21 at 07:47
  • The root cause is a mis-configured compiler since `(!isdigit(s))` shouldn't compile cleanly. See [What compiler options are recommended for beginners learning C?](https://software.codidact.com/posts/282565) – Lundin Sep 27 '21 at 08:17

5 Answers5

2

From isdigit [emphasis added]

The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.

From isdigit [emphasis added]

The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined.

https://godbolt.org/z/PEnc8cW6T


An undefined behaviour includes it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

H.S.
  • 11,654
  • 2
  • 15
  • 32
  • This isn't why the code has UB. Implicit conversions from `char*` to `int` are not valid C. Parameter passing uses the rules of assignment and the code contains a constraint violation there. The compiler must issue a diagnostic message. – Lundin Sep 27 '21 at 08:20
  • @Lundin I agree but one point, quoting from your [post](https://stackoverflow.com/questions/52186834/pointer-from-integer-integer-from-pointer-without-a-cast-issues) - "_Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type_". Isn't the undefined behaviour conditional here - "_**If the result cannot be represented in the integer type**, the behavior is undefined._"? – H.S. Sep 27 '21 at 12:02
  • That refers to scenarios such as the integer value being too large, or of the wrong signedness. So in case you have a 64 bit pointer and a 32 bit int, then it cannot be represented. But all of this is rather irrelevant, since the conversion only happens in case the expression is a valid form of assignment, which it is not in this case. – Lundin Sep 27 '21 at 12:27
  • @Lundin I got your point. I will edit my post and include it. But now this has raised another question - Can a constraint violation lead to segmentation fault or UB? I have found one discussion [here](https://stackoverflow.com/questions/41149194/is-running-a-binary-generated-from-a-code-with-constraint-violation-actually-u) but not very convincing. Assume that, in an environment, both pointer and integer is of same size. Could this `char *s; int param_to_isdigit = s;` lead to UB/segmentation fault? (assume, result can be represented in the integer type). – H.S. Sep 27 '21 at 17:46
  • A compiler deciding to generate an executable even in the presence of constraint or syntax violations has left the C language and produced a something-else-language executable. So it's definitely UB as it's beyond the scope of the language. – Lundin Sep 28 '21 at 06:32
1

You are invoking undefined behavior. isdigit() is supposed to receive an int argument, but you pass in a pointer. This is effectively attempting to assign a pointer to an int (xref: Language / Expressions / Assignment operators / Simple assignment, ¶1).

Furthermore, there is a constraint that the argument to isdigit() be representable as an unsigned char or equal to EOF. (xref: Library / Character handling <ctype.h>, ¶1).

As a guess, the isdigit() function may be performing some kind of table lookup, and the input value may cause the function to access a pointer value beyond the table.

jxh
  • 69,070
  • 8
  • 110
  • 193
  • Can you cite the standard where it says that it's UB to implicitly convert a pointer to `int`? – klutt Sep 27 '21 at 07:27
  • C11, Section 6.3.2.3, Paragraph 6. – jxh Sep 27 '21 at 07:34
  • // isdigit(s), s++;// but why this will not ? – Lyon Ma Sep 27 '21 at 07:43
  • the compiler will just warn about this https://godbolt.org/z/e4zWbfqoG – phuclv Sep 27 '21 at 07:46
  • @LyonMa Because undefined behavior is a range of behaviors, which may include not crashing. – jxh Sep 27 '21 at 07:46
  • It's the wrong source. 6.3.2.3 is about the actual conversion and it _doesn't_ say that converting from pointer to integer is UB, but rather implementation-defined (unless misaligned or trap, then it is indeed UB). Specifically: "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type." The correct standard source to quote is the rules of simple assignment. – Lundin Sep 27 '21 at 08:22
  • @Lundin You are correct, the assignment constraint is most applicable here. – jxh Sep 27 '21 at 17:54
1

All answers so far has failed to point out the actual problem, which is that implicit pointer to integer conversions are not allowed during assignment in C. Details here: "Pointer from integer/integer from pointer without a cast" issues

Specifically C17 6.5.2.2/7

If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters

Where "as if by assignment" sends us to check the rules of assignment 6.5.16.1, which are quoted in the above link. So isdigit(s) is equivalent to something like this:

char* s;
...
int param_to_isdigit = s; // constraint violation of 6.5.16.1

Here the compiler must issue a diagnostic message. If you didn't spot it or in case you are using a tool chain giving warnings instead of errors, check out What compiler options are recommended for beginners learning C? so that you prevent code like this from compiling, so that you don't have to spend time troubleshooting bugs that the compiler already spotted for you.


Furthermore, the ctype.h functions require that the passed integer must be representable as unsigned char, but that's another story. C17 7.4 Character handling <ctype.h>:

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF

Lundin
  • 195,001
  • 40
  • 254
  • 396
0

Why no segfault from isdigit(s), s++;?

First of all. Undefined behavior can manifest itself in a lot of ways, including the program working as intended. That's what undefined means.

But that line is not equivalent to your if statement. What this does is that it executes isdigit(s), throws away the result, increments s and also throw away the result of that operation.

However, isdigit does not have side effects, so it's quite probable that the compiler simply removes the call to that function, and replace this line with an unconditional s++. That would explain why it does not segfault. But you would have to study the generated assembly to make sure, but it's a possibility.

You can read about the comma operator here What does the comma operator , do?

klutt
  • 30,332
  • 17
  • 55
  • 95
0

I wasn't able to repeat this behaviour in MacOS/Darwin, but I was able to in Debian Linux.

To investigate a bit further, I wrote the following program:

#include <ctype.h>
#include <stdio.h>

int main()
{
    printf("isalnum('a'):  %d\n", isalnum('a'));
    printf("isalpha('a'):  %d\n", isalpha('a'));
    printf("iscntrl('\n'): %d\n", iscntrl('\n'));
    printf("isdigit('1'):  %d\n", isdigit('1'));
    printf("isgraph('a'):  %d\n", isgraph('a'));
    printf("islower('a'):  %d\n", islower('a'));
    printf("isprint('a'):  %d\n", isprint('a'));
    printf("ispunct('.'):  %d\n", ispunct('.'));
    printf("isspace(' '):  %d\n", isspace(' '));
    printf("isupper('A'):  %d\n", isupper('A'));
    printf("isxdigit('a'): %d\n", isxdigit('a'));
    printf("isdigit(0x7fffffff): %d\n", isdigit(0x7fffffff));
    return 0;
}

In MacOS, this just prints out 1 for every result except the last one, implying that these functions are simply returning the result of a logical comparison.

The results are a bit different in Linux:

isalnum('a'):  8
isalpha('a'):  1024
iscntrl('\n'): 2
isdigit('1'):  2048
isgraph('a'):  32768
islower('a'):  512
isprint('a'):  16384
ispunct('.'):  4
isspace(' '):  8192
isupper('A'):  256
isxdigit('a'): 4096
Segmentation fault

This suggests to me that the library used in Linux is fetching values from a lookup table and masking them with a bit pattern corresponding to the argument provided. For example, '1' (ASCII 49) is an alphanumeric character, a digit, a printable character and a hex digit, so entry 49 in this lookup table is probably equal to 8+2018+32768+16384+4096, which is 55274.

The documentation for these functions does mention that the argument must have either the value of an unsigned char (0-255) or EOF (-1), so any value outside this range is causing this table to be read out of bounds, resulting in a segmentation error.

Since I'm only calling the isdigit() function with an integer argument, this can hardly be described as undefined behaviour. I really think the library functions should be hardened against this sort of problem.

r3mainer
  • 23,981
  • 3
  • 51
  • 88
  • `isdigit(0x7fffffff)` is UB because of 7.4, which is what the man page you linked to corresponds with. `0x7fffffff` cannot be expressed as unsigned char and the value of EOF is usually -1, which in raw binary 2's complement is `0xffffffff`. – Lundin Sep 27 '21 at 08:46
  • @Lundin My understanding was that UB is defined in the C standard. Providing an integer argument to a function that requires an integer argument doesn't violate anything in the C standard. In my view, the problem here is sloppy implementation of library functions that could *quite easily* have been coded [without causing this sort of issue](https://searchcode.com/codesearch/view/10200599/). – r3mainer Sep 27 '21 at 09:25
  • 1
    It is UB since it violates the part in 7.4 that I quoted in my own answer. Indeed good implementations include a cast to unsigned char internally, but they aren't required to. – Lundin Sep 27 '21 at 09:29
  • OK, I just tracked down your reference to 7.4 and you were correct. In that case I think the C standard is at fault here. This is an issue that could have been fixed easily, but they chose not to :( – r3mainer Sep 27 '21 at 09:31
  • It's a design flaw in the ctype library since the dawn of time. At some point where dinosaurs walked the earth, C only had one type and that was int. Then over time and upon standardization it got more types, but they didn't change the API of some functions because that could have broken existing dinosaur code. The same reason as why `getchar` unintuitively doesn't return a `char` but an `int`. One of the biggest design flaws in C was that they included existing Unix slop as the standard library, instead of writing a completely new library - preferably designed by professionals. – Lundin Sep 27 '21 at 09:34
  • But surely even prehistoric code isn't going to rely on `isdigit()` et al. behaving in this way. What useful purpose is served by segfaulting when the integer argument is outside the prescribed range? – r3mainer Sep 27 '21 at 09:37
  • Likely performance. Compiler/library creators don't think like application programmers. Everything is performance to them and usability of their tools in the real world is low priority. You might be able to shave a few CPU ticks somewhere by using an aligned int instead of a single byte. Though I could swear that I saw a cast to `unsigned char` inside some ctype.h function in one of the glibc flavours, so I'm a bit surprised that this crashes in Linux. – Lundin Sep 27 '21 at 09:41
  • 1
    I took a peek at https://github.com/lattera/glibc/blob/master/ctype/ctype.h and there's an interesting comment about supported values at line 63. The code is calling various look-up tables left and right, I don't quite follow it, but likely the seg fault comes from one of those table checks. – Lundin Sep 27 '21 at 09:58