48

I've been doing a fairly easy program of converting a string of Characters (assuming numbers are entered) to an Integer.

After I was done, I noticed some very peculiar "bugs" that I can't answer, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature though.)

So without writing too much text, here's the code of the program:

#include <stdio.h>

#define MAX 100

int CharToInt(const char *);

int main()
{
    char str[MAX];

    printf(" Enter some numbers (no spaces): ");
    gets(str);
//  fgets(str, sizeof(str), stdin);
//  scanf("%s", str);

    printf(" Entered number is: %d\n", CharToInt(str));

    return 0;
}

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        temp = *(s+i) & 15;
        result = (temp + result) * 10;
        i++;
    }

    return result / 10;
}

So here's the problem I've been having. First, when using gets() function, the program works perfectly.

Second, when using fgets(), the result is slightly wrong because apparently fgets() function reads newline (ASCII value 10) character last which screws up the result.

Third, when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value. For this, I have no explanation.

Now I know that gets() is discouraged to use, so I would like to know if I can use fgets() here so it doesn't read (or ignores) newline character. Also, what's the deal with the scanf() function in this program?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Marko
  • 491
  • 1
  • 5
  • 6
  • You may wan to replace your `CharToInt()` function with a call to `atoi()` (they do the same thing). Also, the `char` datatype is implicitly `signed`, which may explain the "-52 ASCII value" you were seeing. http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/ – sigint Jul 21 '10 at 18:07
  • Yes, I could use atoi(), but the very point of this program was to use Bitwise Operators. Also, thank you very much for reminding me about signed value of char. Using unsigned char solved the problem, even though I'm still unsure how and why. – Marko Jul 21 '10 at 18:18
  • 3
    @sigint: In C, char can be signed char or unsigned char at the discretion of the compiler. – ninjalj Jul 21 '10 at 18:26
  • 1
    I figured you probably had to write-your-own. As for why `unsigned char`(s) are solving your problem; A normal (`signed`) `char` has a value range of –128 to 127, whereas an `unsigned char` has a range of 0 to 255. The bit-twiddling was probably doing weird things with the negative values. – sigint Jul 21 '10 at 18:28
  • 5
    By the way, `*(s+i)` is usually written in C as `s[i]` (it has exactly the same semantics). – caf Jul 22 '10 at 00:12
  • `atoi` and this `CharToInt` both have the issue of causing undefined behaviour if the number you're trying to convert is bigger than `INT_MAX`. To fix this you can use a function from the `strtol` family, or modify `CharToInt` so that it breaks out instead of overflowing. (Actually `CharToInt` could do with further modification; as written it can only read up to `INT_MAX / 10`; and it does weird things if non-digits are entered) – M.M Sep 04 '14 at 03:47
  • You should test the input functions for success and only use the data if the function indicates success. If you get something peculiar in the input, the first step is to print what you got clearly: `printf("Input: [[%s]]\n", str);`. It is common for conversion functions to skip leading white space and stop at the first character that can't be part of a number. If there's only trailing white space (especially just a newline), you would not normally generate an error. You might or might not generate an error if there was some other non-numeric character after the converted string. – Jonathan Leffler Jul 10 '15 at 03:27
  • Related: [Why is the gets function so dangerous that it should not be used?](https://stackoverflow.com/questions/1694036/why-is-the-gets-function-so-dangerous-that-it-should-not-be-used) – RobertS supports Monica Cellio Mar 04 '20 at 17:23

7 Answers7

32
  • Never use gets. It offers no protections against a buffer overflow vulnerability (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).

  • Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.

  • Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long–do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.

jamesdlin
  • 81,374
  • 13
  • 159
  • 204
20

Yes, you want to avoid gets. fgets will always read the new-line if the buffer was big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer) you can use fscanf with a scan-set conversion like: "%N[^\n]", where the 'N' is replaced by the buffer size - 1.

One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid).

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
12

There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:

  • First, CharToInt() should be renamed to the proper StringToInt() since it operates on an string not a single character.

  • The function CharToInt() [sic.] is unsafe. It doesn't check if the user accidentally passes in a NULL pointer.

  • It doesn't validate input, or more correctly, skip invalid input. If the user enters in a non-digit the result will contain a bogus value. i.e. If you enter in N the code *(s+i) & 15 will produce 14 !?

  • Next, the nondescript temp in CharToInt() [sic.] should be called digit since that is what it really is.

  • Also, the kludge return result / 10; is just that -- a bad hack to work around a buggy implementation.

  • Likewise MAX is badly named since it may appear to conflict with the standard usage. i.e. #define MAX(X,y) ((x)>(y))?(x):(y)

  • The verbose *(s+i) is not as readable as simply *s. There is no need to use and clutter up the code with yet another temporary index i.

gets()

This is bad because it can overflow the input string buffer. For example, if the buffer size is 2, and you enter in 16 characters, you will overflow str.

scanf()

This is equally bad because it can overflow the input string buffer.

You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."

That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.

fgets()

This is safe because you can guarantee you never overflow the input string buffer by passing in the buffer size (which includes room for the NULL.)

getline()

A few people have suggested the C POSIX standard getline() as a replacement. Unfortunately this is not a practical portable solution as Microsoft does not implement a C version; only the standard C++ string template function as this SO #27755191 question answers. Microsoft's C++ getline() was available at least far back as Visual Studio 6 but since the OP is strictly asking about C and not C++ this isn't an option.

Misc.

Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number the number may become negative! i.e. 9876543210 will become -18815698?! Let's fix that too.

This is trivial to fix for an unsigned int. If the previous partial number is less then the current partial number then we have overflowed and we return the previous partial number.

For a signed int this is a little more work. In assembly we could inspect the carry-flag, but in C there is no standard built-in way to detect overflow with signed int math. Fortunately, since we are multiplying by a constant, * 10, we can easily detect this if we use an equivalent equation:

n = x*10 = x*8 + x*2

If x*8 overflows then logically x*10 will as well. For a 32-bit int overflow will happen when x*8 = 0x100000000 thus all we need to do is detect when x >= 0x20000000. Since we don't want to assume how many bits an int has we only need to test if the top 3 msb's (Most Significant Bits) are set.

Additionally, a second overflow test is needed. If the msb is set (sign bit) after the digit concatenation then we also know the number overflowed.

Code

Here is a fixed safe version along with code that you can play with to detect overflow in the unsafe versions. I've also included both a signed and unsigned versions via #define SIGNED 1

#include <stdio.h>
#include <ctype.h> // isdigit()

// 1 fgets
// 2 gets
// 3 scanf
#define INPUT 1

#define SIGNED 1

// re-implementation of atoi()
// Test Case: 2147483647 -- valid    32-bit
// Test Case: 2147483648 -- overflow 32-bit
int StringToInt( const char * s )
{
    int result = 0, prev, msb = (sizeof(int)*8)-1, overflow;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( *s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
        {
            prev     = result;
            overflow = result >> (msb-2); // test if top 3 MSBs will overflow on x*8
            result  *= 10;
            result  += *s++ & 0xF;// OPTIMIZATION: *s - '0'

            if( (result < prev) || overflow ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

// Test case: 4294967295 -- valid    32-bit
// Test case: 4294967296 -- overflow 32-bit
unsigned int StringToUnsignedInt( const char * s )
{
    unsigned int result = 0, prev;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( *s ) ) // Alt.: if (*s >= '0' && *s <= '9')
        {
            prev    = result;
            result *= 10;
            result += *s++ & 0xF; // OPTIMIZATION: += (*s - '0')

            if( result < prev ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

int main()
{
    int  detect_buffer_overrun = 0;

    #define   BUFFER_SIZE 2    // set to small size to easily test overflow
    char str[ BUFFER_SIZE+1 ]; // C idiom is to reserve space for the NULL terminator

    printf(" Enter some numbers (no spaces): ");

#if   INPUT == 1
    fgets(str, sizeof(str), stdin);
#elif INPUT == 2
    gets(str); // can overflows
#elif INPUT == 3
    scanf("%s", str); // can also overflow
#endif

#if SIGNED
    printf(" Entered number is: %d\n", StringToInt(str));
#else
    printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
    if( detect_buffer_overrun )
        printf( "Input buffer overflow!\n" );

    return 0;
}
Community
  • 1
  • 1
Michaelangel007
  • 2,798
  • 1
  • 25
  • 23
  • The `strlen()` function doesn't check whether you passed a null pointer. The standard C library specification explicitly says (§7.1.4 Use of library functions): _If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined._ It is reasonable to require a non-null pointer. – Jonathan Leffler Jul 10 '15 at 03:33
  • It is better to add a one-line safety check and catch careless mistakes then to assume a caller won't make them, but thanks for the Chapter Verse of the spec! – Michaelangel007 Jul 11 '15 at 01:14
5

You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline.

char *result = fgets(str, sizeof(str), stdin);
char len = strlen(str);
if(result != NULL && str[len - 1] == '\n')
{
  str[len - 1] = '\0';
}
else
{
  // handle error
}

This does assume there are no embedded NULLs. Another option is POSIX getline:

char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count >= 1 && line[count - 1] == '\n')
{
  line[count - 1] = '\0';
}
else
{
  // Handle error
}

The advantage to getline is it does allocation and reallocation for you, it handles possible embedded NULLs, and it returns the count so you don't have to waste time with strlen. Note that you can't use an array with getline. The pointer must be NULL or free-able.

I'm not sure what issue you're having with scanf.

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
3

never use gets(), it can lead to unprdictable overflows. If your string array is of size 1000 and i enter 1001 characters, i can buffer overflow your program.

Peter Miehle
  • 5,984
  • 2
  • 38
  • 55
  • Thank you all for your answers. They were very much helpful. But I'd also like to know why scanf() doesn't work in this program? Thank you. – Marko Jul 21 '10 at 18:13
1

Try using fgets() with this modified version of your CharToInt():

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        if (isdigit(*(s+i)))
        {
            temp = *(s+i) & 15;
            result = (temp + result) * 10;
        }
        i++;
    }

    return result / 10;
}

It essentially validates the input digits and ignores anything else. This is very crude so modify it and salt to taste.

Amardeep AC9MF
  • 18,464
  • 5
  • 40
  • 50
  • To clean things up, consider replacing `strchr()` with `isdigit()`. Even better, replace the entire `CharToInt()` function with a call to `atoi()`. http://www.cplusplus.com/reference/clibrary/cctype/isdigit/ http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/ – sigint Jul 21 '10 at 18:14
-3

So I am not much of a programmer but let me try to answer your question about the scanf();. I think the scanf is pretty fine and use it for mostly everything without having any issues. But you have taken a not completely correct structure. It should be:

char str[MAX];
printf("Enter some text: ");
scanf("%s", &str);
fflush(stdin);

The "&" in front of the variable is important. It tells the program where (in which variable) to save the scanned value. the fflush(stdin); clears the buffer from the standard input (keyboard) so you're less likely to get a buffer overflow.

And the difference between gets/scanf and fgets is that gets(); and scanf(); only scan until the first space ' ' while fgets(); scans the whole input. (but be sure to clean the buffer afterwards so you wont get an overflow later on)

fhdrsdg
  • 10,297
  • 2
  • 41
  • 62
  • Omitting the & in front of str is perfectly fine since in C arrays are passed by pointer. That is, `scanf( "%s", str );` is exactly equivalent to `scanf( "%s", &str[0] );` – Michaelangel007 Jul 08 '15 at 20:10
  • This answer is wrong on several counts and dangerous. – siride Jul 08 '15 at 21:15
  • To be precise: (1) the `&` in front of `str` is not required and can generate warnings from an educated compiler; (2) you should test what `scanf()` returns to ensure you got the data you expected; (3) using `fflush(stdin)` is not supported by Standard C — it only works on some platforms, notably Microsoft; (4) `gets()` reads to the end of line (without any protection against overflows); (5) `fgets()` does not scan the whole input — it reads to the end of line or until there is no space left in the buffer; (6) `scanf()` can overflow the buffer — use `scanf("%99s", str)` if `MAX==100`. – Jonathan Leffler Jul 10 '15 at 03:12