1

Here is part of my code. The aim of fgets and sscanf is to scan three values separated by exactly one space. If the line passes, the program prints the instruction again; otherwise it prints error and exits.

I want to use a 7-byte char array to limit the length of the line, so that only input of the form 'g 3 3' is accepted. But something seems to be wrong in my code.

#include <stdio.h> 

int main (void) {
    char line[7];
    char command;
    int x, y, nargs;

    while(1){
        /* problem: g  4 4 or g 4  4 can also pass */
        fgets(line, 7, stdin);
        nargs = sscanf(line, "\n%c %d %d", &command, &x, &y);

        if(nargs != 3){
          printf("error\n");
          return 0;
        }

        printf("%c %d %d\n", command, x, y);
    }
}

Unexpected:

g  4 4
g 4 4
error

expected:

g 4 4
g 4 4
// I can continue typing

Can anyone tell me why it will still repeat the instruction?

gsamaras
Jennifer Q
  • When using `sscanf`, a space or newline in the format string matches zero or more whitespace characters in the input. The string `g 4 4` fits into the 7 byte buffer, and matches the format, so it passes. – user3386109 Mar 26 '16 at 02:47
  • `g  4 4` => `"g  4 4\n"` => fgets => `"g  4 4\0"` => next fgets `"\n"` => sscanf does not return 3 (it returns EOF). The fix is to increase the line buffer. – BLUEPIXY Mar 26 '16 at 02:49
  • Yes, `g 4 4` should pass, but `g  4 4` (two spaces between g and 4) should fail. And when I change the limit to 6 characters, even `g 4 4` fails. I wonder why. – Jennifer Q Mar 26 '16 at 02:51
  • 1) want to input the whole line, regardless of any extra spaces. so increase the length of the input buffer a lot, suggest 50 bytes. 2) in the call to fgets() use `sizeof( line)` so only need to edit one place. 3) use a `#define` statement to set the `line[]` length 4) rather than `sscanf()`, iterate through the `line[]` array to check the format, because `sscanf()` will not fail due to extra spaces. – user3629249 Mar 26 '16 at 18:28

4 Answers

2

According to the C11 standard, 7.21.6.2p5:

A directive composed of white-space character(s) is executed by reading input up to the first non-white-space character (which remains unread), or until no more characters can be read.

This describes the \n directive and the two space characters as being identical in functionality: They'll match as much consecutive white-space (spaces, tabs, newlines, etc) as they can from the input.

If you want to match a single space (and only a single space), I suggest using %*1[ ] instead of the white-space directives. You could use %*1[\n] to similarly discard a newline. For example, since the newline character appears at the end of a line:

nargs = sscanf(line, "%c%*1[ ]%d%*1[ ]%d%*1[\n]", &command, &x, &y);

This won't completely solve your problem, unfortunately, as the %d format specifier is also defined to discard white-space characters:

Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier

With some clever hacks, you might be able to continue using sscanf (or better yet, scanf without the intermediate buffer). After comparing the alternatives in terms of maintainability, though, we might as well just use getchar. So if you're looking for a solution to your problem, as opposed to an answer to the question you posed, I'd recommend gsamaras's answer.

autistic
  • A `sscanf()` that completely solves the problem is [doable](http://stackoverflow.com/a/36231552/2410359) and need not be denigrated as a clever hack. Notice OP's comment ["we are restricted to use sscanf only"](http://stackoverflow.com/questions/36231014/unexpected-repitition-using-fgets-and-sscanf/36231552?noredirect=1#comment60093557_36231185) – chux - Reinstate Monica Mar 26 '16 at 14:41
1

What you have there won't work, since sscanf() doesn't care whether the user types one space or two.

You could approach this in a simple way, by taking advantage of short circuiting and by using getchar(), like this:

#include <stdio.h>
#include <ctype.h>

#define SIZE 100

int main(void) {
    int c, i = 0;
    char line[SIZE] = {0};
    while ((c = getchar()) != EOF) {
        // is the first char an actual character?
        if(i == 0 && !isalpha(c)) {
                printf("error\n");
                return -1;
        // do I have two whitespaces in 2nd and 4th position?
        } else if((i == 1 || i == 3) && c != ' ') {
                printf("error\n");
                return -1;
        // do I have digits in 3rd and 5th position?
        } else if((i == 2 || i == 4) && !isdigit(c)) {
                printf("error\n");
                return -1;
        // I expect that the user hits enter after inputing his command
        } else if(i == 5 && c != '\n') {
                printf("error\n");
                return -1;
        // everything went fine, I am done with the input, print it
        } else if(i == 5) {
                printf("%.5s\n", line); // only the 5 command chars; line[5] may hold an old '\n'
        }
        line[i++] = c;
        if(i == 6)
                i = 0;
    }
    return 0;
}

Output:

gsamaras@gsamaras:~$ gcc -Wall px.c
gsamaras@gsamaras:~$ ./a.out 
g 4 4
g 4 4
g  4 4
error
gsamaras
  • It's great! But I still want to know is there anyway to fix it in sscanf? Cuz we are restricted to use sscanf only – Jennifer Q Mar 26 '16 at 03:20
  • @JenniferQ good. I do not know any, but I upvoted your question, which may bring more people in the question, good luck. – gsamaras Mar 26 '16 at 03:22
  • @JenniferQ Unfortunately, you can't prevent `%d` from discarding prefixed white-space. – autistic Mar 26 '16 at 03:45
  • I have to agree with @Seb, who by the way threw a great answer. – gsamaras Mar 26 '16 at 04:02
  • After careful analysis this turned out to be a nice answer. It took a little while for me to decipher the seeming canyon between the requirements of the OPs code and the requirements of this code. Don't get me wrong, that's not your fault, but I think you could gently nudge your answer towards the best one here by touching on them. As one example: "The array being only 7 characters implies that each decimal-digit field is exactly one digit however `%d` violates this by allowing more than one digit (and potentially a negative sign). I assume those fields are only supposed to be one digit each." – autistic Mar 26 '16 at 05:19
  • @Seb thanks, I did update my answer with a link to this: http://stackoverflow.com/questions/26716255/why-does-this-program-print-forked-4-times/26716300#26716300, but I think it is stable now, after all, yours is better. – gsamaras Mar 26 '16 at 15:04
  • @gsamaras My answer is a direct answer to the question posed ("Why is this happening?") and nothing more. You've gone the extra step and posed a solution to the problem (i.e. an answer to "How do I solve it?"), which is probably what the OP meant to ask, and certainly what the OP is likely to ask next. I couldn't be bothered... – autistic Mar 27 '16 at 01:04
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/107440/discussion-between-gsamaras-and-seb). – gsamaras Mar 27 '16 at 01:06
1

Can anyone tell me why it will still repeat the instruction?

The tricky part is that "%d" consumes leading white-space, so code needs to detect leading white-space first.

" " consumes 0 or more white-space and never fails.

So "\n%c %d %d" does not well detect the number of intervening spaces.


If the ints can be more than 1 character, use the following; otherwise see the simplification below.

Use "%n" to record how far sscanf() has progressed through the buffer.

This gets the job done using sscanf(), which apparently is required.

// No need for a tiny buffer
char line[80];
if (fgets(line, sizeof line, stdin) == NULL) Handle_EOF();

int n[6];
n[5] = 0;
#define SPACE1 "%n%*1[ ] %n"
#define EOL1   "%n%*1[\n] %n"

// Return value not checked as following `if()` is sufficient to detect scan completion.
// See below comments for details
sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" EOL1, 
  &command, &n[0], &n[1],
  &x,       &n[2], &n[3],
  &y,       &n[4], &n[5]);

// If scan completed to the end with no extra
if (n[5] && line[n[5]] == '\0') {
  // Only 1 character between?
  if ((n[1] - n[0]) == 1 && (n[3] - n[2]) == 1 && (n[5] - n[4]) == 1) {
    Success(command, x, y);
  }
}

Maybe add a test to ensure command is not white-space, but I think that will happen anyway in command processing.


A simplification is possible if the ints must be exactly 1 digit, by combining @Seb's answer with the above. This works because the length of each field is fixed in an acceptable line.

// Scan 1 and only 1 space
#define SPACE1 "%*1[ ]"

int n = 0;
// Return value not checked as following `if()` is sufficient to detect scan completion.
sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" "%n", &command, &x, &y, &n);

// Adjust this to accept a final \n or not as desired.
if (n == 5 && (line[n] == '\n' || line[n] == '\0')) {
  Success(command, x, y);
}

@Seb and I dove into the need for checking the return value of sscanf(). The cnt == 3 test is redundant, since n == 5 will only be true when the entire line was scanned and sscanf() returned 3. Even so, a number of code checkers may raise a flag noting that the result of sscanf() is not checked. In general, using the saved variables without first qualifying the result of sscanf() is not robust code; this approach qualifies it with a simple and sufficient check of n == 5. Since many code problems stem from not doing any qualification, omitting the return-value check can raise a false positive among code checkers. It is easy enough to add the redundant check:

// sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" "%n", &command, &x, &y, &n);
// if (n == 5 && (line[n] == '\n' || line[n] == '\0')) {
int cnt = sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" "%n", &command, &x, &y, &n);
if (cnt == 3 && n == 5 && (line[n] == '\n' || line[n] == '\0')) {
  Success(command, x, y);
}
chux - Reinstate Monica
  • Nice work; it saves me having to work out the nitty-gritty details. – Jonathan Leffler Mar 26 '16 at 04:45
  • @Seb This answer does **not** have the problem you describe. Notice the `n[5] = 0; sscanf(.....%n, ... &n[5]); if (n[5] ...`. Sorry if this was not plain enough. `n[5]` will only have a non-zero value if the parsing reached the last `"%n"`. That can only happen when the return value of `sscanf()` is 3,in this case. Checking for return value by itself is insufficient, as you suggest, without also checking `n[5]` as that does not detect text after the last `"%d"`. – chux - Reinstate Monica Mar 26 '16 at 14:22
  • @Seb Changing `SPACE1` to `"%n %n"` as you suggest is **not** an improvement over `"%n%*1[ ] %n"`. OP wants to detect "separated by exactly one space". `"%n %n"` would be useful to detect 1 _white-space_, as the `" "` in your suggested format accepts any white-space characters, whereas `"%*1[ ] "` will only accept a space followed by white-space. The following `(n[1] - n[0]) == 1` test then ensures only 1 character (the space) was scanned. – chux - Reinstate Monica Mar 26 '16 at 14:30
  • @Seb, Your test case is not this code and is a problem as it does not check any of the results of `sscanf()` before using `printf("x: %d...`. Just like your code `nargs = sscanf(line, "%c%*1[ ]%d%*1[ ]%d%*1[\n]", &command, &x, &y);` followed by `printf()` would also be a problem. This code does not use `command, x, ` until testing a result of `scanf()` which `if (n[5]` checks and _then_ uses `Success(command, x, y);`. I agree with I think it would be a good idea to a) keep an open mind and b) learn how to use sscanf (properly) and applies to that _all_. – chux - Reinstate Monica Mar 27 '16 at 01:46
  • @Seb problem with `%*1[ ]` is the same with `"%d"`. IAC, this answers does not use any of `command, x, y` until all are known to be valid, even if is done in compliant fashion some are not familiar with. – chux - Reinstate Monica Mar 27 '16 at 01:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/107445/discussion-between-chux-and-seb). – chux - Reinstate Monica Mar 27 '16 at 02:23
  • @seb It is not "rhetorics". This answer _does not use an indeterminate value_. `Success(command, x, y);` is not called until `n[5]` is tested to be non-zero. That can only happen when `sscanf()` returned 3. So whether code tests the return value or not is irrelevant. This paradigm, using a final `"%n"` in a format, may be unfamiliar to you, but it has been successfully employed for decades. No luck needed. It works with no UB. – chux - Reinstate Monica Mar 27 '16 at 04:25
  • @chux My bad. Indeed, the `if (n[5]` was not plain enough. You're right; this code doesn't have *that* problem... It has *another* problem, which is that it raises false positives during the QA phase (as it has in this instance). It's simple to check the return value explicitly. For example: `int nargs = sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" EOL1, &command, &n[0], &n[1], &x, &n[2], &n[3], &y, &n[4], &n[5]); if (nargs == 3 && ...` for your first, or `int nargs = sscanf(line, "%c" SPACE1 "%d" SPACE1 "%d" "%n", &command, &x, &y, &n); if (nargs == 3 && ...` for your second. – autistic Mar 27 '16 at 04:36
  • I suppose the biggest consequence of this is that [some compilers might (at times improperly) reject it out-right](http://stackoverflow.com/questions/10043841/c-error-ignoring-return-value-of-scanf), which is intended to raise those very QA flags. Sometimes [this rejection is invalid](http://stackoverflow.com/questions/28357707/ignore-return-value-of-a-gobble-only-scanf-gcc), but it's easy enough to avoid in this case, so why not just take the steps to avoid it? Oh, while we're on the topic of compiler errors, you have one here: `buf` undeclared... Did you mean to write `line`? – autistic Mar 27 '16 at 04:59
  • @seb Your [comment](http://stackoverflow.com/questions/36231014/unexpected-repitition-using-fgets-and-sscanf/36231552?noredirect=1#comment60118133_36231552) raises a valid point - even though it references `scanf()`. This issues of `sscanf()` and `scanf()` have obvious similarities and subtle differences. (Had this been a `scanf()` post, this answer would have saved the return value, at least for detecting EOF/Input error). IAC, answer appended and thank-you for your professionalism. – chux - Reinstate Monica Mar 27 '16 at 05:38
0

Do you have a problem with your program? gdb is your best friend =)

gcc -g yourProgram.c
gdb ./a.out
break fgets
run
finish
g 4  4

and then step through the statements. Whenever you encounter scanf or printf, just type finish. You will see that the program completes this iteration successfully, but on the next one it does not wait for input and just prints the error message. Why? Type:

man fgets

fgets reads at most one less than size characters, so in your case fgets may read only 6 characters, but you gave it 7. Yes, the newline is a character just like the space. So what happens to the 7th? It stays buffered, which means that instead of waiting for the keyboard, your program sees that there are characters in the buffer and uses them (one character in this example).

Edit: Here is what you can do to make your program work: ignore empty lines. If strcmp(line, "\n") == 0, jump to the next iteration; if you are not allowed to use strcmp, a workaround is to compare line[0] == '\n'.

Baroudi Safwen
  • What if `stdin` is defined as an unbuffered stream? Will it still be buffered, then? What will happen in that case? – autistic Mar 26 '16 at 03:29
  • stdin is line buffered in linux and if you want to get around that you will have to play with the terminal not the stdio, as for files, reading will be character by character, you cannot read more than one character. – Baroudi Safwen Mar 26 '16 at 09:47
  • *Hmmm, I must have missed the linux tag in this question...* – autistic Mar 26 '16 at 12:10
  • The C standard says ["What constitutes an interactive device is implementation-defined."](http://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p7) While it may be valid to say for your Linux system that `stdin` is line-buffered (it probably isn't; more on that later) that doesn't hold for all. Furthermore, if you pipe a file in as `stdin` you'll find it's probably *not* line buffered, but *fully buffered* instead. For more information on those terms, and how you can (sometimes) change that (without touching the terminal) I suggest reading [this](http://port70.net/~nsz/c/c11/n1570.html#7.21.3p3). – autistic Mar 26 '16 at 12:18
  • yes it is, Advanced Programming in the Unix Environment, third edition page 146. I am aware of the buffer setting functions, but they don't work on stdin, you can try it if you want to. for pipes and files, they are not interactive devices, and so they are fully buffered. http://stackoverflow.com/questions/10247591/setvbuf-not-able-to-make-stdin-unbuffered – Baroudi Safwen Mar 26 '16 at 12:33
  • Don't you think it's strange to cite from a book that's non-authoritative, or a question that's not only non-authoritative but even less credible, in order to refute the very authority that decides what is and isn't C? I linked to the C standard... You're not arguing with me. You're arguing with them, and they're the people who tell your compiler developers what the language looks like and how it should function. Go and argue with them... – autistic Mar 26 '16 at 12:37
  • One more thing, if your book claims anything other than [what this opengroup manual says](http://pubs.opengroup.org/onlinepubs/9699919799/functions/stdin.html) then the title of your book is quite misleading; I would throw it in the fire and get a new one because it's probably lying to you about both the C and the POSIX standards. – autistic Mar 26 '16 at 12:40
  • the C standard does not specify buffering characteristics for standard input/output, unix and windows both default to line buffered. – Baroudi Safwen Mar 26 '16 at 12:41
  • i cannot continue arguing with someone who does not know Richard Stevens really? check this http://stackoverflow.com/questions/3723795/is-stdout-line-buffered-unbuffered-or-indeterminate-by-default – Baroudi Safwen Mar 26 '16 at 12:43
  • No, you can't mention Unix or Windows here because there are no Unix or Windows tags here, and C doesn't require that either of those be the hosts. Additionally, the Open Group specifies what Unix constitutes (by specifying what POSIX constitutes), not Richard Stevens; he merely commentates on it, and if he says `stdin` is line-buffered then as I mentioned earlier [his assertions are invalid and irrelevant](http://pubs.opengroup.org/onlinepubs/9699919799/functions/stdin.html), and you should burn his books and get some new ones. – autistic Mar 26 '16 at 23:38