4

I want to convert a number given as a string to an integer. I've found several ways to do this, such as atoi() and strtol(). However, when the input cannot be converted to an integer, as in strtol("junk", &endptr, 10), I just get back the integer 0. That is indistinguishable from the case where the input actually is the number "0".

Is there a function that handles this edge case by returning a pointer to an integer instead of an integer, so that in a case like the one above I'd just get NULL?

dedpunkrz
  • 3
    Use `strtol`, and check its return values — both of them — carefully. It gives you back a pointer to the remainder of the string, that it didn't parse. If that's equal to the pointer you gave it, it didn't parse anything. If that points to anything other than whitespace (or `'\0'`), there was trailing nonnumeric garbage. You can also check `errno` to distinguish overflow cases. – Steve Summit May 26 '23 at 17:22
  • I believe the integer data type cannot have a NULL value. – JustaNobody May 26 '23 at 17:22
  • 1
    `strtol` gives you the end pointer, which you can use to check for errors. If the end pointer is the same as the start pointer, then there was no number. If the end pointer is not the end of the string, then the number was followed by non-numeric characters. – Tom Karzes May 26 '23 at 17:22
  • 1
    See also [Correct usage of `strtol`](https://stackoverflow.com/questions/14176123/correct-usage-of-strtol). – Steve Summit May 26 '23 at 17:25

3 Answers

3

If a conversion is successfully performed (e.g. if the value "0" is parsed), then the pointer that strtol stores through its second parameter (endptr) will be greater than the first argument.

If no conversion could be performed, then endptr will be set equal to the first argument, i.e. it will point to the start of the string.

Demonstrated in this program:

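#include <stdio.h>
#include <stdlib.h>   // declares strtol

int main(void) {
    char* text = "junk";
    char* endptr = NULL;
    
    // strtol returns a long, so store the result in a long
    long answer = strtol(text, &endptr, 10);
    
    // endptr equals text when no digits could be converted
    if ( endptr > text )
    {
        printf("Number was converted: %ld\n", answer);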
    }
    else
    {
        printf("No Number could be found\n");
    }

    return 0;
}
abelenky
  • 1
    The function `strtol` requires `#include <stdlib.h>`. – Andreas Wenzel May 26 '23 at 22:27
  • 2
    If you change the string from `"junk"` to `"6junk"`, then your program will report that the conversion was successful, even though only the first character was converted. Depending on the situation, this may be undesirable. See my answer for a solution which ensures that the entire string was converted. – Andreas Wenzel May 26 '23 at 22:29
  • [I'm not so sure the posted code is correct](https://port70.net/~nsz/c/c11/n1570.html#7.22.1.4p8): "If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno." The assignment of the end pointer occurs in paragraph 5. The exact point in the sequence where the conversion occurs doesn't appear to be clearly specified, thus a string for a `long` greater than `LONG_MAX` might still set the end ptr – Andrew Henle May 27 '23 at 01:18
  • (cont) You might also have to set `errno` to zero prior to the call and check its value along with the returned value to be sure a legitimate, converted value was returned. – Andrew Henle May 27 '23 at 01:19
  • 1
    @AndrewHenle: there is no ambiguity in ¶5: *If the subject sequence has the expected form and the value of `base` is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type). A pointer to the final string is stored in the object pointed to by `endptr`, provided that `endptr` is not a null pointer.* The end pointer must be set past the last digit in the subject sequence in all cases, including for out-of-range values (¶8). – chqrlie May 27 '23 at 08:12
  • @chqrlie *there is no ambiguity* Oh? Then you should be able to quote the text from the standard that specifies *exactly* when in that entire sequence the actual conversion is performed. Note also that [p7](https://port70.net/~nsz/c/c11/n1570.html#7.22.1.4p7) does not specify that if the result is out-of-range that no conversion is performed. You'd also need to explain how an out-of-range conversion that returned an error while also setting the end pointer to the first non-digit character past the string that was out of range violates the standard. – Andrew Henle May 27 '23 at 12:14
  • 1
    @AndrewHenle: It does not matter *when* the conversion is attempted, ¶5 has 3 sentences, each one is a statement that explains what must happen, the first 2 start with a condition, not the third: *A pointer to the final string is stored in the object pointed to by `endptr`, provided that `endptr` is not a null pointer*. *final string* is defined in ¶2 as the remainder of the string after the subject sequence. ¶7 covers the cases where *the subject sequence is empty or does not have the expected form*, so all cases are covered. Out of range conversions only affect the return value and `errno`. – chqrlie May 27 '23 at 17:48
  • 1
    @chqrlie *Out of range conversions only affect the return value and `errno`* That was my entire point - the code in this question doesn't check either of those and thus can't handle an out-of-range conversion. – Andrew Henle May 27 '23 at 22:09
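The point settled in the comments above (an out-of-range value changes only the return value and `errno`, while the end pointer is still set past the digits) can be checked with a small sketch. The helper name `is_out_of_range` and its parameters are hypothetical and are not part of the answer's code.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: runs strtol on str and reports whether it
// signalled a range error. The converted (or clamped) value is stored
// in *value and the end pointer in *end.
bool is_out_of_range( const char *str, long *value, const char **end )
{
    char *endptr;

    // library functions never reset errno to 0, so clear it first
    errno = 0;
    *value = strtol( str, &endptr, 10 );

    // endptr is set past the digits even when the value is out of range
    *end = endptr;

    return errno == ERANGE;
}
```

For an input such as "99999999999999999999999999999999junk", this should report a range error, store LONG_MAX in *value, and leave *end pointing at the j. A check of the form endptr > text alone would therefore treat that input as a successful conversion.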
2

The function strtol will return 0 if

  • the string was successfully converted to 0, or
  • the conversion failed.

In order to distinguish these two cases, you will have to inspect the value of endptr, which points to the first character that was not converted.

If endptr points to the start of the string specified in the first argument of strtol, then no characters were successfully converted, which means that the conversion failed.

If endptr does not point to the start of the string, then at least one character was successfully converted. Therefore, one could consider the conversion to have been successful.

On the other hand, if you call

strtol( "6junk", &endptr, 10 );

then the function will return 6 and endptr will point to the character j, which is the first character that was not converted. Depending on the situation, you may want to consider the conversion successful, or you may want to consider it a failure, because the string was not converted in its entirety. For example, if you ask the user to enter an integer and the user enters 6junk, then you will probably want to reject the input as invalid, even though the first character was successfully converted.
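To make the three cases concrete, here is a minimal sketch that reports how many characters strtol consumed. The helper name `consumed_by_strtol` is made up for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: stores the value returned by strtol in *value
// and returns the number of characters of str that were consumed
// (including any leading whitespace that strtol skipped).
size_t consumed_by_strtol( const char *str, long *value )
{
    char *endptr;

    *value = strtol( str, &endptr, 10 );

    // endptr == str means that nothing was converted
    return (size_t)( endptr - str );
}
```

With this helper, "junk" gives a count of 0 and a value of 0, "0" gives a count of 1 and a value of 0, and "6junk" gives a count of 1 and a value of 6. The count, not the value, is what tells "junk" and "0" apart.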

For this reason, you may want to test whether endptr points to the end of the string (i.e. to the terminating null character), in order to determine whether the entire string was successfully converted. Here is an example:

#include <stdlib.h>
#include <stdbool.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //test whether the entire string was converted
    if ( *endptr != '\0' )
    {
        return false;
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

However, the code above is inconsistent in that it will accept leading whitespace characters, but reject trailing whitespace characters. Therefore, instead of testing whether endptr points to the terminating null character, it would probably be better to inspect the remaining characters, and to only reject the string if at least one remaining character is not a whitespace character, like this:

#include <stdlib.h>
#include <ctype.h>
#include <stdbool.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

Another issue is that it is possible that the user enters a value that is outside the range of representable values of a long int (i.e. too high or too low). In that case, the function strtol will set errno to ERANGE. We can detect whether strtol set errno to ERANGE by setting errno to 0 before calling strtol, and after the function call, we check whether errno has been changed to ERANGE. Here is an example:

#include <stdlib.h>
#include <stdbool.h>
#include <errno.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    errno = 0;
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that no range error occurred
    if ( errno == ERANGE )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

Here is a complete working example program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <errno.h>

//forward declarations
bool convert_string_to_long( char str[], long *num );
void get_line_from_user( const char prompt[], char buffer[], int buffer_size );

int main( void )
{
    //repeat forever
    for (;;)
    {
        char line[200];
        long num;

        //get input from user
        get_line_from_user(
            "Please enter an integer: ",
            line, sizeof line
        );

        //attempt to convert input to a number
        if ( convert_string_to_long( line, &num ) )
        {
            printf( "Input was successfully converted to: %ld\n", num );
        }
        else
        {
            printf( "Input was invalid!\n" );
        }
    }
}

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    errno = 0;
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that no range error occurred
    if ( errno == ERANGE )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

//This function will read exactly one line of input from the
//user. It will remove the newline character, if it exists. If
//the line is too long to fit in the buffer, then the function
//will automatically reprompt the user for input. On failure,
//the function will never return, but will print an error
//message and call "exit" instead.
void get_line_from_user( const char prompt[], char buffer[], int buffer_size )
{
    for (;;) //infinite loop, equivalent to while(1)
    {
        char *p;

        //prompt user for input
        fputs( prompt, stdout );

        //attempt to read one line of input
        if ( fgets( buffer, buffer_size, stdin ) == NULL )
        {
            printf( "Error reading from input!\n" );
            exit( EXIT_FAILURE );
        }

        //attempt to find newline character
        p = strchr( buffer, '\n' );

        //make sure that entire line was read in (i.e. that
        //the buffer was not too small to store the entire line)
        if ( p == NULL )
        {
            int c;

            //a missing newline character is ok if the next
            //character is a newline character or if we have
            //reached end-of-file (for example if the input is
            //being piped from a file or if the user enters
            //end-of-file in the terminal itself)
            if ( (c=getchar()) != '\n' && !feof(stdin) )
            {
                if ( ferror(stdin) )
                {
                    printf( "Error reading from input!\n" );
                    exit( EXIT_FAILURE );
                }

                printf( "Input was too long to fit in buffer!\n" );

                //discard remainder of line
                do
                {
                    c = getchar();

                    if ( ferror(stdin) )
                    {
                        printf( "Error reading from input!\n" );
                        exit( EXIT_FAILURE );
                    }

                } while ( c != '\n' && c != EOF );

                //reprompt user for input by restarting loop
                continue;
            }
        }
        else
        {
            //remove newline character by overwriting it with
            //null character
            *p = '\0';
        }

        //input was ok, so break out of loop
        break;
    }
}

This program has the following behavior:

Please enter an integer: junk
Input was invalid!
Please enter an integer: 6junk
Input was invalid!
Please enter an integer: 60000000000000000000
Input was invalid!
Please enter an integer: 6
Input was successfully converted to: 6

As you can see, the number 60000000000000000000, which is too large to be representable as a long int on most platforms, was correctly rejected.
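The question asked for a function that returns a pointer, so that a failed conversion yields NULL. The same checks can be arranged in that shape. The following is only a sketch: the name `parse_int` and the convention of returning the caller's own pointer are assumptions made for this example, and the extra range check is there because the question asks for an int rather than a long.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: converts str to an int. On success it stores
// the value in *out and returns out. On any failure it returns NULL
// and leaves *out untouched.
int *parse_int( const char *str, int *out )
{
    char *endptr;
    long result;

    errno = 0;
    result = strtol( str, &endptr, 10 );

    //no digits were found, or the value does not fit in a long
    if ( endptr == str || errno == ERANGE )
    {
        return NULL;
    }

    //the value fits in a long, but not in an int
    if ( result < INT_MIN || result > INT_MAX )
    {
        return NULL;
    }

    //accept trailing whitespace, but nothing else
    while ( isspace( (unsigned char)*endptr ) )
    {
        endptr++;
    }
    if ( *endptr != '\0' )
    {
        return NULL;
    }

    *out = (int)result;
    return out;
}
```

A caller could then write if ( parse_int( line, &n ) != NULL ) and use n only inside that branch. Returning the caller's pointer avoids allocating memory just to signal success or failure.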

Andreas Wenzel
0
#include <stdio.h>

int main(void){
    char x[] = "0";
    int y;
    // sscanf returns the number of items successfully converted
    int t=sscanf(x,"%d",&y);
    if(t==1){
        printf("Converted");
    }
    else{
        printf("Not converted");
    }
}

Here, sscanf returns 1 if the conversion succeeded. Any other return value means that the input was not converted.

JustaNobody
  • 1
    You'd be better testing the only "passed" value `1` since your code won't detect when `sscanf` returns `EOF` or `-1`. So `if(t==1) printf("Converted"); else printf("Not converted");` – Weather Vane May 26 '23 at 17:28
  • K will make the update – JustaNobody May 26 '23 at 17:28
  • *Other values will be not converted* [It's worse than that](https://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10): "... if the result of the conversion cannot be represented in the object, the behavior is undefined." Feeding the string `"99999999999999999999999999999999999999"` into your code invokes undefined behavior for every C implementation I'm aware of. – Andrew Henle May 27 '23 at 01:08
  • @AndrewHenle: not really: what you are suggesting is undefined behavior in the general case where the input string may contain overlong number representations, but in this specific case, there is no undefined behavior on any conforming C implementation. – chqrlie May 27 '23 at 07:48
  • The *undefined behavior* in question allows implementations to store various values into the destination object, either clamped to the destination object range or the result modulo its bitwidth. Anything else is unexpected but breaking into the debugger would be welcome when tracking software problems. Aborting the program is also possible albeit rarely implemented. – chqrlie May 27 '23 at 08:04
  • @chqrlie The text of the standard is pretty clear: "if the result of the conversion cannot be represented in the object, the behavior is undefined" *The undefined behavior in question allows implementations to store various values into the destination object ...* So there's no way to safely use any of the `scanf()` functions as there's no way to be sure your results accurately represent the input. *Anything else is unexpected* When has that ever gotten in the way of GCC developers? – Andrew Henle May 27 '23 at 12:07
  • 1
    Yeah `strtol` or `strtoul` is generally recommended in the forums I checked due to these reasons – JustaNobody May 27 '23 at 17:26
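For completeness, the sscanf approach can at least be made to reject trailing garbage such as "6junk" by using the standard `%n` directive. This is a sketch only: the helper name `scan_int` is hypothetical, and it does nothing about the overflow problem discussed in the comments above, so it assumes that the number fits in an int.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: returns true if str contains an int and nothing
// else except whitespace. Assumption: the number fits in an int. An
// out-of-range number is still undefined behavior with sscanf.
bool scan_int( const char *str, int *out )
{
    int value;
    int consumed = 0;

    //%n stores the number of characters read so far; the space
    //before it skips any trailing whitespace
    if ( sscanf( str, "%d %n", &value, &consumed ) != 1 )
    {
        return false;
    }

    //reject the input if anything is left after the number
    if ( str[consumed] != '\0' )
    {
        return false;
    }

    *out = value;
    return true;
}
```

Because the overflow case stays undefined, a strtol-based check remains the safer choice for untrusted input.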