
I have a text file whose lines have the following format:

[7 chars string][whitespace][5 chars string][whitespace][integer]

I want to use fscanf() to read all of these into memory, and I'm confused about which format string I should use.

Here's an example of such line:

hello   box   94324

Note the padding whitespace inside each string field, in addition to the separating whitespace.

Edit: I know about the recommendation to use fgets() first, but I cannot use it here.

Edit: here's my code

typedef struct Product {
    char* id;   //Product ID number. This is the key of the search tree.
    char* productName;  //Name of the product.
    int currentQuantity;    //How many items are there in stock, currently. 
} Product;

int main()
{
    FILE *initial_inventory_file = NULL;
    Product product = { NULL, NULL, 0 };

    //open file 
    initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");

    product.id = malloc(sizeof(char) * 10); //- Product ID: 9 digits exactly. (10 for null character)
    product.productName = malloc(sizeof(char) * 11); //- Product name: 10 chars exactly.

    //go through each line in initial inventory
    while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) != EOF)
    {
        printf("%9c %10c %i\n", product.id, product.productName, product.currentQuantity);
    }

    //cleanup...
    ...
}

Here's an example file (the fields are actually 9 characters, 10 characters, and an int):

022456789 box-large  1234
023356789 cart-small 1234
023456789 box        1234
985477321 dog food   2
987644421 cat food   5555
987654320 snaks      4444
987654321 crate      9999
987654322 pillows    44
McLovin
  • To read _lines_, better to first read the line with `fgets()` and then parse it with `sscanf(), strtol(), strtok()` etc. – chux - Reinstate Monica Jun 30 '17 at 18:48
  • I know, I have to use scanf() though. That's the assignment :/ – McLovin Jun 30 '17 at 18:49
  • "have to use scanf() " yet title is "fscanf with ...". So which is it? – chux - Reinstate Monica Jun 30 '17 at 18:49
  • yes I meant fscanf(). isn't fscanf() the same as scanf() just for file streams? – McLovin Jun 30 '17 at 18:50
  • The "7 chars string" do you mean up to 7 characters? – chux - Reinstate Monica Jun 30 '17 at 18:50
  • @chux yes. . . . – McLovin Jun 30 '17 at 18:51
  • There are subtle differences between using `fscanf()` with a file as "I have a txt file" and `scanf()` with `stdin` interactively. But that is a side issue. – chux - Reinstate Monica Jun 30 '17 at 18:52
  • `char first[8] = {0}, second[6] = {0}; int integer = 0; fscanf(fp, "%7c %5c %i", first, second, &integer);` – BLUEPIXY Jun 30 '17 at 18:53
  • @BLUEPIXY has a good [comment](https://stackoverflow.com/questions/44853055/fscanf-with-whitespaces-as-separators-what-format-should-i-use#comment76687070_44853055) - be sure to check the return value of `fscanf()`. – chux - Reinstate Monica Jun 30 '17 at 18:54
  • When you say you have to use f/scanf, that you can't use fgets, I assume that this is an assignment in a class you're taking. If so, do yourself a favor and forget this lesson just as soon as it's over. In real-world C programming, essentially *nobody* uses `scanf` or `fscanf` for anything. They're practically useless. I'd say learning them is a complete waste of time, but I can't influence your instructor. – Steve Summit Jun 30 '17 at 18:54
  • @SteveSummit thanks for the tip – McLovin Jun 30 '17 at 18:55
  • @BLUEPIXY Doesn't work.. I get invalid values – McLovin Jun 30 '17 at 19:02
  • A key issue about `fgets()`, troubles with `fscanf()` and post like these is one of error handling. What should code do when the input is _unexpected_? This post centers on how to read expected input, but does not address if the input is _maybe_ ok with `"hello \n box 94324\n"`, `"hello (many space) box 94324\n"`, `"hello \n box\n"` or `"hello \n box XYZ\n"`. Good luck. – chux - Reinstate Monica Jun 30 '17 at 19:02
  • "I get invalid values" lacks clarity and information which should have been originally in the post.. @BLUEPIXY [code](https://stackoverflow.com/questions/44853055/fscanf-with-whitespaces-as-separators-what-format-should-i-use#comment76687070_44853055) would certainly work with `"hello box 94324\n"`. – chux - Reinstate Monica Jun 30 '17 at 19:04
  • @Pilpel Are you initializing arguments? Or have you set a null terminator? Or read more than one row? provide [mcve] – BLUEPIXY Jun 30 '17 at 19:05
  • `malloc` doesn't initialize the contents. Also, use `%s` instead of `%c` in `printf`; see the fix [here](http://ideone.com/z5XaCk). – BLUEPIXY Jun 30 '17 at 19:19
  • @SteveSummit-- I would suggest that it is worth learning `fscanf()` if only to be able to use `sscanf()`, which can be useful when used with `fgets()`. Or, does your (not unjustified) dislike of the `scanf()` family extend to `sscanf()` too? – ad absurdum Jun 30 '17 at 19:20
  • @DavidBowling Aside: "worth learning fscanf() if only to be able to use sscanf()" is less direct than "learning the useful function `sscanf()`" and then _maybe_ learning `fscanf()` or skipping the latter. `(f)scanf()` need not be considered a `sscanf()` prerequisite. – chux - Reinstate Monica Jun 30 '17 at 19:38
  • the expression: `sizeof(char)` is defined in the standard as 1. Multiplying anything by 1 has absolutely no effect. Using that expression in the parameter list to `malloc()` just clutters the code, making it more difficult to understand, debug, etc. – user3629249 Jul 02 '17 at 04:32
  • this format string, for `fscanf()`, "%9c %10c %i" will input exactly 9 characters into the first array, skip any white space, input exactly 10 characters into the second array, skip any white space, then input/convert an integer. Note: the array values will NOT be NUL terminated. Strongly suggest using the '%s' input format specifier, which will NUL terminate the array. I.E. `fscanf(initial_inventory_file, "%9s %10s %i", product.id, product.productName, &product.currentQuantity) != EOF)` – user3629249 Jul 02 '17 at 04:38
  • when calling any of the heap allocation functions: (malloc, calloc, realloc), always check (!=NULL) the returned value to assure the operation was successful. – user3629249 Jul 02 '17 at 04:42
  • when calling any of the `scanf()` family of functions, always check the returned value (not the parameter values) to assure the operation was successful. In the current scenario, the returned value must be 3, otherwise some error occurred. – user3629249 Jul 02 '17 at 04:44
  • @DavidBowling No, `sscanf` is not as bad as `scanf` and `fscanf`. `sscanf` is occasionally useful, and certainly worth learning. At least 90% of the time, though, I'll break a line up using the moral equivalent of `strtok`, and then work with the pieces; that usually strikes a much better balance (for me, at least) between convenience and robustness than does `sscanf`. – Steve Summit Jul 04 '17 at 13:33

4 Answers


Assuming your input file is well-formed, this is the most straightforward version:

char str1[8] = {0};
char str2[6] = {0};
int  val;
...
int result = fscanf( input, "%7s %5s %d", str1, str2, &val );

If result is equal to 3, you successfully read all three inputs. If it's less than 3 but not EOF, then you had a matching failure on one or more of your inputs. If it's EOF, you've either hit the end of the file or there was an input error; use feof( input ) to test for EOF at that point.
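The three outcomes described above can be folded into one helper. This is a sketch under the same well-formed-input assumption; `read_simple` and its return codes are hypothetical, and it checks ferror() to tell a read error apart from plain end of file:

```c
#include <stdio.h>

/* Hypothetical helper: one fscanf() call on a "[7 chars] [5 chars] [int]"
 * record, classified by its return value.
 * Returns  1 if all three fields were read,
 *          0 on a matching failure or incomplete record,
 *         -1 at end of file,
 *         -2 on a read error. */
int read_simple(FILE *input, char str1[8], char str2[6], int *val)
{
    int result = fscanf(input, "%7s %5s %d", str1, str2, val);

    if (result == 3)
        return 1;            /* all three conversions succeeded */
    if (result != EOF)
        return 0;            /* 0, 1 or 2 conversions: bad or short record */
    if (ferror(input))
        return -2;           /* EOF was returned because of a read error */
    return -1;               /* EOF was returned because input ran out */
}
```

A caller would typically loop `while (read_simple(in, s1, s2, &v) == 1)` and then inspect the last return value to decide whether the loop ended cleanly.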

If you can't guarantee your input file is well-formed (which most of us can't), you're better off reading in the entire line as text and parsing it yourself. You said you can't use fgets, but there's a way to do it with fscanf:

char buffer[128]; // or whatever size you think would be appropriate to read a line at a time

/**
 * " %127[^\n]" tells scanf to skip over leading whitespace, then read
 * up to 127 characters or until it sees a newline character, whichever
 * comes first; the newline character is left in the input stream.
 */
if ( fscanf( input, " %127[^\n]", buffer ) == 1 )
{
  // process buffer
}

You can then parse the input buffer using sscanf:

int result = sscanf( buffer, "%7s %5s %d", str1, str2, &val );
if ( result == 3 )
{
  // process inputs
}
else
{
  // handle input error
}

or by some other method.
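Putting the two steps together, a minimal sketch of "read a line with fscanf(), then parse it with sscanf()" could look like this. The function name `read_line_record` and its return convention are hypothetical; lines longer than 127 characters would be split across calls, as noted in the format-string comment:

```c
#include <stdio.h>

/* Hypothetical helper: read the next non-blank line with fscanf() and
 * parse it with sscanf().
 * Returns 1 on success, 0 if the line did not parse, EOF when no more
 * lines are available. */
int read_line_record(FILE *input, char str1[8], char str2[6], int *val)
{
    char buffer[128];

    /* Skip leading whitespace (including the newline left by the previous
     * call), then read up to 127 characters that are not '\n'. */
    if (fscanf(input, " %127[^\n]", buffer) != 1)
        return EOF;

    /* Parse the fields out of the buffered line; a bad line is rejected
     * as a whole and the next call starts on the following line. */
    if (sscanf(buffer, "%7s %5s %d", str1, str2, val) != 3)
        return 0;
    return 1;
}
```

Because each call consumes exactly one line, a malformed line no longer shifts the fields of the lines after it.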

EDIT

Edge cases to watch out for:

  1. Missing one or more inputs per line
  2. Malformed input (such as non-numeric text in the integer field)
  3. More than one set of inputs per line
  4. Strings that are longer than 7 or 5 characters
  5. Value too large to store in an int
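Several of these edge cases can be caught once the line is in a string, by also recording where parsing stopped. The sketch below (hypothetical name `parse_strict`) uses the `%n` conversion, which stores the number of characters consumed so far and is not counted in the return value. It rejects missing fields, non-numeric text, and anything left over after the integer; over-long strings are only caught when they push the remaining fields out of alignment, and it does not handle case 5, because `%d` on an out-of-range value is undefined behaviour (strtol() would be needed for that):

```c
#include <stdio.h>

/* Hypothetical strict parser for one buffered line.
 * Returns 1 only if all three fields convert AND nothing but whitespace
 * follows the integer; returns 0 otherwise. */
int parse_strict(const char *line, char str1[8], char str2[6], int *val)
{
    int used = -1;   /* stays -1 if %n is never reached */

    /* The space before %n skips trailing whitespace; %n then records the
     * offset of the first unparsed character. */
    if (sscanf(line, "%7s %5s %d %n", str1, str2, val, &used) != 3)
        return 0;            /* missing or non-numeric field */
    if (used < 0 || line[used] != '\0')
        return 0;            /* extra text such as "r4" or a second record */
    return 1;
}
```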

EDIT 2

The reason most of us don't recommend fscanf is because it sometimes makes error detection and recovery difficult. For example, suppose you have the input records

foo     bar    123r4
blurga  blah   5678

and you read it with fscanf( input, "%7s %5s %d", str1, str2, &val ). fscanf will read 123 and assign it to val, leaving r4 in the input stream. On the next call, r4 will get assigned to str1, the first five characters of blurga (blurg, because of the %5s width) will get assigned to str2, and you'll get a matching failure when %d sees the leftover a. Ideally you'd like to reject the whole first record, but by the time you know there's a problem it's too late.

If you read it as a string first, you can parse and check each field, and if any of them are bad, you can reject the whole thing.
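The desynchronisation can be reproduced with a small sketch that writes the two sample records to a temporary stream (tmpfile()) and repeats the fscanf() call. `demo_desync` is a hypothetical name; it returns the result of the second call. Note that `%5s` stops after five characters, so the failing `%d` actually sees the leftover `a` of blurga:

```c
#include <stdio.h>

/* Hypothetical demo: returns the result of the second fscanf() call on the
 * two sample records and stores what that call converted. *first_val gets
 * the integer read by the first call. Returns -1 if tmpfile() fails. */
int demo_desync(char str1[8], char str2[6], int *first_val)
{
    FILE *input = tmpfile();          /* scratch stream for the demo */
    int val = 0, result;

    if (input == NULL)
        return -1;
    fputs("foo     bar    123r4\nblurga  blah   5678\n", input);
    rewind(input);

    /* First call: succeeds with val == 123 and leaves "r4" unread. */
    fscanf(input, "%7s %5s %d", str1, str2, &val);
    *first_val = val;

    /* Second call: "r4" goes to str1, "blurg" to str2 (width 5), then %d
     * sees the leftover 'a' and fails, so two fields were converted. */
    result = fscanf(input, "%7s %5s %d", str1, str2, &val);
    fclose(input);
    return result;
}
```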

John Bode
  • From time to time, I would like to double UV an answer like this. Of course there is the bounty thing.... (only weakness is `" %127[^\n]"` can consume more than 1 line and leaves the \n in the stream - yet a good quick fix for irrational `fgets()`-less code requirement.) – chux - Reinstate Monica Jun 30 '17 at 19:43
  • @chux: Yeah, that's an issue. The thing is, the previous call leaves the newline in the stream, so it has to be dealt with somehow. I suppose I could add a `%c` to consume the newline itself. But that just drives the point home even more that `scanf` is the wrong tool. – John Bode Jun 30 '17 at 20:36

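The issue in your code using the "%9c ..." format is that %9c does not write the string-terminating character. Because malloc does not initialise the memory, your strings are probably filled with garbage and not terminated at all, which leads to undefined behaviour when printing them with printf. (Printing them with %9c, as the question does, is also wrong: %c expects a single character, not a pointer.)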

If you set the complete content of the strings to 0 before the first scan, it should work as intended. To achieve this, you can use calloc instead of malloc; this will initialise the memory with 0.

Note that the code also has to consume the newline character somehow, which is solved by an additional fscanf(f,"%*c") statement (the * indicates that the value is read but not stored in a variable). This works only if there are no other whitespace characters between the last digit and the newline character:

int main()
{
    FILE *initial_inventory_file = NULL;
    Product product = { NULL, NULL, 0 };

    //open file
    initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");

    product.id = calloc(10, sizeof(char)); //- Product ID: 9 digits exactly. (10 for null character)
    product.productName = calloc(11, sizeof(char)); //- Product name: 10 chars exactly.

    //go through each line in initial inventory
    while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) == 3)
    {
        printf("%9s %10s %i\n", product.id, product.productName, product.currentQuantity);
        fscanf(initial_inventory_file,"%*c");
    }

    //cleanup...
}
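A self-contained variant of the same idea, folding the newline skip into one call as in the `"%i%*c"` suggestion from the comments. It is a sketch under stated assumptions: the hypothetical `FixedProduct` uses fixed arrays instead of the question's `char*` fields, the terminators are written by hand instead of relying on calloc, and there must be no trailing whitespace after the integer (otherwise `%*c` eats only one character and the next ID is misaligned). It uses `%d` rather than `%i`, because `%i` would read a quantity such as `0123` as octal:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-width record: 9-char ID, 10-char padded name, int. */
typedef struct {
    char id[10];    /* 9 chars + '\0' */
    char name[11];  /* 10 chars + '\0' */
    int  quantity;
} FixedProduct;

/* Reads one record; returns 1 on success, 0 otherwise.
 * %9c and %10c copy exactly that many characters (spaces included) but do
 * not add '\0', so the terminators are written by hand. The trailing %*c
 * discards the newline; if the last line has no newline the call still
 * returns 3, because the three conversions have already happened. */
int read_fixed(FILE *in, FixedProduct *p)
{
    if (fscanf(in, "%9c %10c %d%*c", p->id, p->name, &p->quantity) != 3)
        return 0;
    p->id[9] = '\0';
    p->name[10] = '\0';

    /* Strip the padding spaces from the end of the name. */
    size_t len = strlen(p->name);
    while (len > 0 && p->name[len - 1] == ' ')
        p->name[--len] = '\0';
    return 1;
}
```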
Stephan Lechner
  • Why not use leading whitespace in the `fscanf()` format string instead of trailing whitespace? The trailing whitespace will not work as expected in `scanf()` (blocking input), and may confuse learners, while the leading whitespace works for both. – ad absurdum Jun 30 '17 at 19:52
  • @David Bowling: Its because I assumed that a string like `" llo .... "` should be read in as `"__llo"`, and not as `"llo"`. That's why I cannot use a leading WS. – Stephan Lechner Jun 30 '17 at 19:57
  • Hmmm. I understand your point, and this is one of the few times I have seen trailing whitespace in a format string that works; yet OP code has a comment that says: `//- Product ID: 9 digits exactly.` I don't think this provision is necessary, unless I missed something else in the question or comments. – ad absurdum Jun 30 '17 at 20:02
  • Also, I think that this may not work as expected if the `product.id` entry has leading spaces; these would be skipped over by the whitespace directive from the end of the previous call to `fscanf()`, which will match whitespace until a non-whitespace character is reached. – ad absurdum Jun 30 '17 at 20:06
  • @David Bowling: "9 digits exactly" actually is the point; a leading WS in the format would skip leading white spaces (while probably belonging to the id) and would then mess up the rest of the line read in. – Stephan Lechner Jun 30 '17 at 20:06
  • @David Bowling: you are right with your second comment; Hmm. need to think about it. – Stephan Lechner Jun 30 '17 at 20:06
  • Didn't someone say they hated the `scanf()` functions? – ad absurdum Jun 30 '17 at 20:07
  • @David Bowling: really annoying. Added a "fix" to an answer that I do not delete just because to show how clumsy the `scanf`-approach actually is... – Stephan Lechner Jun 30 '17 at 20:15
  • What about: `fscanf(initial_inventory_file, "%9c %10c %i%*c",...`. Still only works when `\n` is the only character after the last input character, but does not require the second call to `fscanf()`. – ad absurdum Jun 30 '17 at 20:27
  • David Bowling: yes, but could lead to a problem if the last line of the file is not terminated by a new line, right? – Stephan Lechner Jun 30 '17 at 20:28
  • I _think_ that would be OK, just a matching failure on the ignored character, causing `fscanf()` to return, still with a return value of 3. UV, whatever you decide here.... – ad absurdum Jun 30 '17 at 20:30
  • @David Bowling: I tried `"%i%*c"`, and it worked (also for the last line of the file without a new line) – Stephan Lechner Jun 30 '17 at 20:35

Let's assume the input is

<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>

where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed decimal integer; * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.

Let's say you have a structure,

struct record {
    char first[8];  /* 7 characters + end-of-string '\0' */
    char second[6]; /* 5 characters + end-of-string '\0' */
    int  number;
};

then you can read the next record from stream in into the structure pointed to by rec using, for example:

#include <stdlib.h>
#include <stdio.h>

/* Read a record from stream 'in' into *'rec'.
   Returns: 0 if success
           -1 if invalid parameters
           -2 if read error
           -3 if non-conforming format
           -4 if bug in function
           +1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
    int rc;

    /* Invalid parameters? */
    if (!in || !rec)
        return -1;

    /* Try scanning the record. */
    rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &(rec->number));

    /* All three fields converted correctly? */
    if (rc == 3)
        return 0; /* Success! */

    /* Only partially converted? */
    if (rc > 0)
        return -3;

    /* Read error? */
    if (ferror(in))
        return -2;

    /* End of input encountered? */
    if (feof(in))
        return +1;

    /* Must be a bug somewhere above. */
    return -4;
}

The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions add automatically.
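To see what the width does with a word that is too long, here is a small sketch (hypothetical `width_demo`) that scans a 10-character word with `%7s`. The buffer stays safe, but the remainder of the word is left for the next directive, which is why an over-long field shows up as misaligned fields rather than as an overflow:

```c
#include <stdio.h>

/* Hypothetical demo of field widths: "abcdefghij" is 10 characters, longer
 * than the 7 allowed. %7s stores "abcdefg" plus '\0'; the unread "hij" is
 * then picked up by %5s. Returns the number of fields converted. */
int width_demo(char first[8], char second[6])
{
    return sscanf("abcdefghij xyz", "%7s %5s", first, second);
}
```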

If you do not specify a length limit and just use %s, the input can overrun the buffer. This is a common cause of buffer overflow bugs.

The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if an input failure (end of input or a read error) occurs before the first conversion. Above, we need three conversions to fully scan a record; if we scan just 1 or 2, we have a partial record. Otherwise, we check whether a stream error occurred by calling ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we use feof() to check whether the scanning function encountered end-of-stream before anything was converted.

If none of the above cases were met, then the scanning function returned zero or a negative value without either ferror() or feof() returning true. Because the scanning pattern starts with (whitespace and) a conversion specifier, it should never return zero. The only negative return value from the scanf() family of functions is EOF, which, absent a read error, means end of input and should cause feof() to return true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.

A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:

Set ptr = NULL  # Dynamically allocated array
Set num = 0     # Number of entries in array
Set max = 0     # Number of entries allocated for in array

Loop:

    If (num >= max):
        Calculate new max; num + 1 or larger
        Reallocate ptr
        If reallocation failed:
            Report out of memory
            Abort program
        End if
    End if

    rc = read_record(stream, ptr + num)
    If rc == 1:
        Break out of loop
    Else if rc != 0:
        Report error (based on rc)
        Abort program
    End if
    Set num = num + 1
End Loop
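A C sketch of that pseudocode, assuming the struct record and read_record() contract shown above (repeated here so the block compiles on its own). Two assumptions differ from the pseudocode: instead of aborting, `read_all` (a hypothetical name) frees the array and reports the error through `*error`, using a made-up code -5 for out of memory; and the array grows geometrically (16, 32, 64, ... entries) rather than by one entry at a time:

```c
#include <stdio.h>
#include <stdlib.h>

struct record {
    char first[8];
    char second[6];
    int  number;
};

/* Same contract as read_record() above: 0 ok, +1 end of stream,
 * negative on error. */
static int read_record(FILE *in, struct record *rec)
{
    int rc;
    if (!in || !rec)
        return -1;
    rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &rec->number);
    if (rc == 3)    return 0;
    if (rc > 0)     return -3;
    if (ferror(in)) return -2;
    if (feof(in))   return +1;
    return -4;
}

/* Reads every record from 'in' into a growing array. On success returns
 * the array (caller frees it) and stores the count in *count; on failure
 * returns NULL and stores the read_record() code, or -5 for out of memory,
 * in *error. */
struct record *read_all(FILE *in, size_t *count, int *error)
{
    struct record *ptr = NULL;
    size_t num = 0, max = 0;

    *count = 0;
    *error = 0;
    for (;;) {
        if (num >= max) {
            size_t new_max = (max < 16) ? 16 : 2 * max;
            struct record *tmp = realloc(ptr, new_max * sizeof *tmp);
            if (!tmp) {              /* old block is still valid: free it */
                free(ptr);
                *error = -5;
                return NULL;
            }
            ptr = tmp;
            max = new_max;
        }

        int rc = read_record(in, ptr + num);
        if (rc == 1)
            break;                   /* end of stream */
        if (rc != 0) {
            free(ptr);
            *error = rc;
            return NULL;
        }
        num++;                       /* record accepted */
    }
    *count = num;
    return ptr;
}
```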
Nominal Animal

Have you tried the format specifiers?

char seven[8] = {0};
char five[6] = {0};
int myInt = 0;

// loop here
fscanf(fp, "%s %s %d", seven, five, &myInt);
// save to structure / do whatever you want

If you're sure that the formatting is fixed and the strings are always fixed-length, you could also iterate over the input character by character (using something like fgetc()) and process it manually. The example above can overflow the buffers (and possibly crash with a segmentation fault) if a string in the file exceeds 5 or 7 characters.

EDIT Manual Scanning Loop:

char seven[8] = {0};
char five[6] = {0};
int myInt = 0;

// loop this part
for (int i = 0; i < 7; i++) {
    seven[i] = fgetc(fp);
}
if (fgetc(fp) != ' ') { /* malformed line: handle error */ } // consume space; not inside assert(), which NDEBUG removes
for (int i = 0; i < 5; i++) {
    five[i] = fgetc(fp);
}
if (fgetc(fp) != ' ') { /* malformed line: handle error */ } // consume space; not inside assert(), which NDEBUG removes
fscanf(fp, "%d", &myInt);
Felix Guo
  • I tried the format you wrote. It causes problems when there are whitespaces as "character fillers" in the line – McLovin Jun 30 '17 at 18:53
  • If the format is always 7 characters then 5, I would suggest a manual character scanning loop. – Felix Guo Jun 30 '17 at 18:55
  • I have to use fscanf() my friend, and I cannot use fgetc() – McLovin Jun 30 '17 at 19:02
  • You can combine this: https://stackoverflow.com/questions/12306591/read-no-more-than-size-of-string-with-scanf to limit how many characters read, and this trick: https://stackoverflow.com/questions/1950057/can-fscanf-read-whitespace to exclude whitespace from delimiters. – Felix Guo Jun 30 '17 at 19:13
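Combining the two tricks from the last comment, a field width plus a scanset that accepts spaces, gives an fscanf()-only reader for the padded "[7 chars] [5 chars] [int]" layout. This is a sketch under assumptions: `read_padded` and `rtrim` are hypothetical names, exactly one space separates the fields, and the first field never starts with a space (the leading space in the format, needed to skip the previous newline, would swallow it):

```c
#include <stdio.h>
#include <string.h>

/* Removes trailing padding spaces in place. */
static void rtrim(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')
        s[--len] = '\0';
}

/* Hypothetical reader for space-padded "[7 chars] [5 chars] [int]" lines.
 * Returns 1 on success, 0 on a malformed line, EOF at end of input. */
int read_padded(FILE *fp, char seven[8], char five[6], int *value)
{
    /* " "       skip the newline left by the previous line
     * %7[^\n]   up to 7 characters, spaces included, '\0' added
     * %*1[ ]    exactly one separating space, discarded
     * %5[^\n]   up to 5 characters, spaces included
     * %d        skips the separator and reads the integer */
    int rc = fscanf(fp, " %7[^\n]%*1[ ]%5[^\n]%d", seven, five, value);
    if (rc == EOF)
        return EOF;
    if (rc != 3)
        return 0;
    rtrim(seven);
    rtrim(five);
    return 1;
}
```

With the question's sample line `hello   box   94324`, this stores `hello` and `box` (padding removed) and 94324.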