2

Hi I am very new to C and I am having an issue where I am confused on how to parse a string line to a integer. The way I have it so far is just to parse the first string into integer. so If my input is 10 20 30 it will only take the first string and parse it to integer. I am looking for a idea on how to come up with a solution that can read all of the line and parse it all to integer values using getline().

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *line = NULL;
    size_t len = 0; 
    int val =0;
    int sum = 0;

    while (getline(&line, &len, stdin) != EOF) {
    
        printf("Line input : %s\n", line);
        //printf("Test %d", val);

        //parse char into integer 
        val = atoi(line);

        printf("Parsed integer: %d\n", val);
    }
    free(line); 
    return 0;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
Laith
  • 81
  • 5
  • 2
    ⟼Remember, it's always important, *especially* when learning and asking questions on Stack Overflow, to keep your code as organized as possible. [Consistent indentation](https://en.wikipedia.org/wiki/Indentation_style) helps communicate structure and, importantly, intent, which helps us navigate quickly to the root of the problem without spending a lot of time trying to decode what's going on. – tadman Oct 01 '20 at 21:50
  • Also, in my experience anyway, `fgets` is used instead of `getline` in C code. – tadman Oct 01 '20 at 21:53
  • 3
    @tadman — No; if `getline()` is passed a NULL pointer, it can be handed off to `free()` (it's a no-op) and it is perfectly safe. The code also sets `len` to zero. The POSIX specification for [`getline()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) is quite clear that this code is kosher. – Jonathan Leffler Oct 01 '20 at 21:55
  • 1
    @tadman: Here `getline` is being passed a *pointer* to a NULL pointer, and in this case it does indeed allocate memory for you. Again, explained in Jonathan's link. – Nate Eldredge Oct 01 '20 at 21:56
  • Use `strtol()` or parse the numbers, with a non-null pointer for the second argument to tell you where the parse stopped (and therefore where the next one should start). If you're expecting more than one number on a line, wrap the parsing code in a loop. – Jonathan Leffler Oct 01 '20 at 21:56
  • 1
    @tadman: I think perhaps you are thinking of a different `getline` function than the rest of us. – Nate Eldredge Oct 01 '20 at 21:57
  • @NateEldredge Maybe confusing it with `std::getline`. Thanks for pointing that out. – tadman Oct 01 '20 at 22:08
  • See also [Correct usage of `strtol()`](https://stackoverflow.com/q/14176123/15168) and [Using `sscanf()` in a loop](https://stackoverflow.com/q/3975236/15168). – Jonathan Leffler Oct 01 '20 at 22:11
  • @JonathanLeffler this is kinda stupid question but how can I loop the parse wouldn't that just loop the parse it self? like if I input 10 20 30 it will only loop the first input which is 10? – Laith Oct 01 '20 at 22:16
  • `char *start = line; char *eon; long value; errno = 0; value = strtol(start, &eon, 0); while (eon != start && !((value == 0 && errno == EINVAL) || (value == LONG_MIN && errno == EINVAL) || (value == LONG_MAX && errno == EINVAL))) { printf("%ld\n", value); start = eon; value = strtol(start, &eon, 0); }` — where you should be able to get rid of the duplicate call to `strtol()` but I'm currently drawing a blank. – Jonathan Leffler Oct 01 '20 at 22:23
  • 2
    Note that `getline()` is explicitly defined to return `-1` on EOF, not `EOF`. Usually, `EOF == -1`, but that is not guaranteed. It's worth testing for the documented return value. – Jonathan Leffler Oct 01 '20 at 22:41
  • @JonathanLeffler ill give it a try – Laith Oct 01 '20 at 22:54

3 Answers3

2

As I noted in comments, it is probably best to use strtol() (or one of the other members of the strtoX() family of functions) to convert the string to integers. Here is code that pays attention to the Correct usage of strtol().

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;
    size_t len = 0;

    while (getline(&line, &len, stdin) != -1)
    {
        printf("Line input : [%s]\n", line);
        int val = atoi(line);
        printf("Parsed integer: %d\n", val);

        char *start = line;
        char *eon;
        long value;
        errno = 0;
        while ((value = strtol(start, &eon, 0)),
               eon != start &&
               !((errno == EINVAL && value == 0) ||
                 (errno == ERANGE && (value == LONG_MIN || value == LONG_MAX))))
        {
            printf("%ld\n", value);
            start = eon;
            errno = 0;
        }
        putchar('\n');
    }
    free(line);
    return 0;
}

The code in the question to read lines using POSIX getline() is almost correct; it is legitimate to pass a pointer to a null pointer to the function, and to pass a pointer to 0. However, technically, getline() returns -1 rather than EOF, though there are very few (if any) systems where there is a difference. Nevertheless, standard C allows EOF to be any negative value — it is not required to be -1.

For the extreme nitpickers, although the Linux and macOS man pages for strtol() state "returns 0 and sets errno to EINVAL" when it fails to convert the string, the C standard doesn't require errno is set for that. However, when the conversion fails, eon will be set to start — that is guaranteed by the standard. So, there is room to argue that the part of the test for EINVAL is superfluous.

The while loop uses a comma operator to call strtol() for its side-effects (assigning to value and eon), and ignores the result — and ignoring it is necessary because all possible return values are valid. The other three lines of the condition (the RHS of the comma operator) evaluate whether the conversion was successful. This avoids writing the call to strtol() twice. It's possibly an extreme case of DRY (don't repeat yourself) programming.

Small sample of running the code (program name rn89):

$ rn89
   1  2    4  5       5  6
Line input : [   1  2    4  5       5  6
]
Parsed integer: 1
1
2
4
5
5
6

232443 432435423 12312 1232413r2  
Line input : [232443 432435423 12312 1232413r2
]
Parsed integer: 232443
232443
432435423
12312
1232413

324d
Line input : [324d
]
Parsed integer: 324
324

$
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Thank you very much for this great explanation. – Laith Oct 02 '20 at 18:47
  • 1
    YW. The code could eliminate the repeated `errno = 0` statements by putting the assignment before the call to `strtol()` as the LHS of another comma operator. Whether that sort of extreme compression is worth it is perhaps debatable. I'd probably wrap the code analyzing the result of `strtol()` into a function. In fact, I already did. The code is available in my [SOQ](https://github.com/jleffler/soq) (Stack Overflow Questions) repository on GitHub as files `chkstrint.c` and `chkstrint.h` in the [src/libsoq](https://github.com/jleffler/soq/tree/master/src/libsoq) sub-directory. – Jonathan Leffler Oct 02 '20 at 18:59
  • 1
    `(errno == EINVAL && value == 0)` seems redundant and beyond the specification of the `strtoX` functions. Same for `(errno == ERANGE && (value == LONG_MIN || value == LONG_MAX))`: testing if `errno` is set a non zero value seems sufficient. – chqrlie Oct 03 '20 at 12:28
  • @chqrlie — I commented in the answer about the `EINVAL` test. You can decide whether to test for `ERANGE` specifically or to go for a generic `errno != 0` test; I choose to be specific. – Jonathan Leffler Oct 03 '20 at 12:35
  • OK, I had skipped that part... so the `(errno == EINVAL && value == 0)` test is indeed redundant. Also building on the dubious packing of the `strtol` call and the test for errors, you can further combine the `errno = 0;` and `start = eon;` into this: `while ((value = strtol(start = eon, &eon, errno = 0)), eon != start && errno == 0) { ...` `:)` – chqrlie Oct 03 '20 at 12:42
  • @chqrlie — I addressed that, more or less, in my comment that starts ”YW”. And, as I said there, the call to `strtol()` and its error checking should be encapsulated in a function, and gave a link to where such code can be found. – Jonathan Leffler Oct 03 '20 at 12:49
  • What you try to do here and in another your answer is to create universal usage of strtol. But there is no silver bullet for all cases and C is about performance (therefore library functions are designed in such a way and usually skip boundary checks, etc) - the best error check is that check that can be avoided without lost of correctness. Both tests for LONG_MIN and LONG_MAX are equivalent to errno == ERANGE and this information are not relevant for summation for instance. So there is no single "correct usage". We need to modify behavior in each particular case. – Vlad Oct 03 '20 at 22:55
0

There is tradeoff between correctness and comprehensiveness. Therefore I created two versions of program:

  • first with extensive error handling
  • second is simple - assumes only positive scenario (input doesn't contain errors) to stay comprehensive.

Sequence of integers contained in C-string may be parsed invoking in loop standard C function strtol from <stdlib.h>:

long int strtol (const char* str, char** endptr, int base);

that parses the C-string str interpreting its content as an integral number of the specified base. strtol skips white spaces, interprets integer and set pointer *endptr to the first character following the integer.

Since author has variable sum in his code let's demonstrate parsing of sequence of integers as summation of this sequence. I took function sum_ints_from_string() from GNU Manual 20.11.1 Parsing of Integers Code in manual assumes positive scenario. Therefore I changed it for first version.

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
sum_ints_from_string (char *string)
{
    int sum = 0;

    while (1) {
        char *tail;
        int next;

        /* Skip whitespace by hand, to detect the end.  */
        while (string && isspace (*string)) string++;
        if (!string || *string == 0)
            break;

        /* There is more nonwhitespace,  */
        /* so it ought to be another number.  */
        errno = 0;
        /* Parse it.  */
        next = strtol (string, &tail, 0);
        /* Add it in, if possible.  */
        if (string == tail)
        {
            while (tail && !isspace (*tail)) tail++;
            printf("error: %s\n", strerror(errno));
            printf ("does not have the expected form: %s\n", string);
        }
        else if(errno == 0)
        {
            printf("%d\n", next);
            sum += next;
        }
        else
        {
            printf("error: %s\n", strerror(errno));
            printf ("error: %s\n", string);
        }
        /* Advance past it.  */
        string = tail;
    }

    return sum;
}

int main ()
{
    int sum = 0;
    size_t len = 0;
    char * line;
    FILE *f = fopen("file.txt", "w+");
    assert(f != NULL && "Error opening file");

    const char *text =  "010 0x10 -10 1111111111111111111111111111 0 30 A 10 +5 + 10 30\n"
                        "20 20B 6 ABC - 20 10 0";

    assert(fputs(text, f) > 0 && "error writing to file");
    rewind(f);
    errno = 0;
    while (getline(&line, &len, f) != -1)
    {
        sum += sum_ints_from_string(line);
        printf("%d\n", sum);

        free(line);
        line = NULL;
        len = 0;
    }
    assert(sum == 175);
    return 0;
}

Second version - positive scenario:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
sum_ints_from_string (char *string)
{
    int sum = 0;

    while (1) {
        char *tail;
        int next;

        /* Skip whitespace by hand, to detect the end.  */
        while (isspace (*string)) string++;
        if (*string == 0)
            break;

        /* There is more nonwhitespace,  */
        /* so it ought to be another number.  */
        errno = 0;
        /* Parse it.  */
        next = strtol (string, &tail, 0);
        /* Add it in, if not overflow.  */
        if (errno) // returned value is not tested in GNU original
            printf ("Overflow\n");
        else
            sum += next;
        /* Advance past it.  */
        string = tail;
    }

    return sum;
}

int main ()
{
    int sum = 0;
    size_t len = 0;
    char * line;
    while (getline(&line, &len, stdin) != -1)
    {
        sum += sum_ints_from_string(line);
        /*
    `   If line is set to NULL and len is set 0 before the call, then
        getline() will allocate a buffer for storing the line.  This buffer
        should be freed by the user program even if getline() failed.
         */
        free(line);
        line = NULL;
        len = 0;
    }
    return 0;
}

Error checking in version from GNU manual is almost skipped. According to CppReference.com strtol:

Return value

  • If successful, an integer value corresponding to the contents of str is returned.
  • If the converted value falls out of range of corresponding return type, a range error occurs (setting errno to ERANGE) and LONG_MAX, LONG_MIN, LLONG_MAX or LLONG_MIN is returned.
  • If no conversion can be performed, ​0​ is returned.

So for our purpose of summation we are interested only: whether we can add next val or not - we don't need granular and complex error checking here. We have nothing for summation and print error in case: of out of range OR strtol returns 0 (zero return value means: integer equals 0 or conversion cannot be performed). Otherwise we add next.

Vlad
  • 1,977
  • 19
  • 44
  • You should cast `char` values passed to `isspace()` to avoid undefined behavior on negative `char` values. Use `isspace((unsigned char)*string)`. – chqrlie Oct 02 '20 at 23:32
  • You didn't pay attention to preamble-I took snippet from manual and fixed only rough inconsistency. I believe that authors of GNU snippet are capable to perform ALL error handling-but they leave the code simple to avoid distraction. If one need complete information about strtol in reference style - the best reference is cppreference.com or standard. Even cppreference.com doesn't handle all cases for readability purpose-they propose simplified code in examples. Stackoverflow is not reference-usually people expect from stackoverflow even more comprehensive style. I'll not complicate the answer – Vlad Oct 03 '20 at 11:18
  • And you are wrong - if conversion is not possible then strtol will return 0 and this token should be skipped. – Vlad Oct 03 '20 at 11:22
  • *if conversion is not possible then strtol will return 0 and this token should be skipped.* in this case `strtol()` will not skip the *token*, it will just set `tail` to the initial value of `string` and return `0`, no skipping... The caller is responsible for the proper action, skip its own definition of a token, report the error, stop the parse loop... – chqrlie Oct 03 '20 at 11:58
  • You are again wrong and waist our time. Please read answer carefully - I did not "flag 0 as error". "printf ("Zero OR out of range OR does not have the expected form, \n");". Do you see? "Zero OR...". I've read only your first comment because I see this is not constructive dispute. This answer is not intended for copy-paste. It gives direction and assume that coder will change it for his input – Vlad Oct 03 '20 at 14:51
  • Not reading criticism must be a way of life. `/* Advance past it. */ string = tail;` will **not** advance past an invalid token, hence cause an infinite loop. – chqrlie Oct 03 '20 at 15:03
  • I reviewed your answer. I didn't check it but it looks good. At least you do not test what should not be tested. As in another "correct usage of strtol". I agree that my answer do not handle all incorrect inputs. The reason is the same as in the references: they try to stay comprehensive. And do not make extensive error handling - they assume only positive scenario. – Vlad Oct 03 '20 at 15:05
  • @chqrlie Ok, thanks - I downloaded VM with linux, run program and it really didn't skip large token so I added version with error handling but leaved simple version for the sake of comprehensiveness - checking hides the idea for those who new to C. Simplicity is the ultimate sophistication :) – Vlad Oct 03 '20 at 23:31
  • One final comment: `while (tail && !isspace (*tail)) tail++;` -> `while (*tail && !isspace (*tail)) tail++;` – chqrlie Oct 04 '20 at 11:16
0

As you noticed, atoi() can only be used to parse the first value on the line read by getline(), and it has other shortcomings too: if the string does not convert to an integer, the return value will be 0, which is indistinguishable from the case where the string starts with a valid representation of 0.

There are more elaborate functions in <stdlib.h> to convert integers from their representation in different bases (from 2 to 36), detect conversion errors and provide a pointer to the rest of the string: strtol, strtoul, strtoll, strtoull etc.

As noted in comments, getline() is specified as returning the number of bytes read from the file or -1 on error. Do not compare to EOF.

Here is a modified version of your code using the function strtol():

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *line = NULL;
    size_t len = 0; 

    while (getline(&line, &len, stdin) >= 0) {
        char *p, *end;
        long sum = 0;
    
        printf("Line input: %s\n", line);

        printf("Parsed integers:");
        for (p = line; *p != '\0'; p = end) {
            long val = strtol(p, &end, 10);
            if (end == p)
               break;
            printf(" %ld", val);
            sum += val;
        }
        printf("\nSum: %ld\n", sum);
        /* check if loop stopped on conversion error or end of string */
        p += strspn(p, " \t\r\n");  /* skip white space */
        if (*p) {
            printf("Invalid input: %s", p);
        }
    }
    free(line); 
    return 0;
}

Notes:

  • getline is not part of the C Standard, it is a POSIX extension, it might not be available on all systems or might have different semantics.
  • strtol() performs range checking: if the converted value exceeds the range of type long, the value returned is either LONG_MIN or LONG_MAX depending on the direction of the overflow and errno is set to ERANGE.
  • sum += val; can also cause an arithmetic overflow.
chqrlie
  • 131,814
  • 10
  • 121
  • 189