
Problem: I need to be able to identify when two whitespace characters occur consecutively.

I have read the following questions:

how to read a string from a \n delimited file

how to read scanf with spaces

And I am aware of scanf problems: http://c-faq.com/stdio/scanfprobs.html

Input will be in the following format:

1 5 3 2  4 6 2  1 9  0

Two white spaces indicate that the next set of data needs to be handled and compared to itself. The length of the line is unknown and the number of integers in each group is unknown. Two whitespaces is the most that will separate one data set from the next.

While I can use fgets and various built-in functions to solve this problem, I am at the point where solving it with scanf will likely be easier. However, if that's not the case, using fgets, strtok and atoi will do most of the job, but I still need to identify two whitespaces in a row.

The loop below will take integers until a non-integer is entered.

while ( scanf ( "%d", &x ) == 1 )

What I need it to do is read whitespaces as well, and if there are two consecutive whitespaces I'd like the program to do something different with the next set of data.

And once I do get a white space I don't know how to say:

if ((input == "whitespace") && (previousInput == "whitespace"))
  ya da ya da
else if (input == "whitespace")
  ya da ya da
else
  ya da ya da

I appreciate your time and thank you for your help.

Lesson learned: While a solution for scanf is posted below by Jonathan Leffler, the solution was a bit more straightforward with getc (it requires less intimate knowledge of the inner workings of scanf, regular expressions and char). In retrospect, better knowledge of regular expressions, scanf and char would have made the problem easier, as would knowing which functions are available and which one would have been the best to use from the start.

MykC
  • That's a pretty ghastly input format. If you're in charge of it, redesign it. If, as I suspect, you have been given a homework assignment, bad luck - they're a sadistic bunch, your teachers. – Jonathan Leffler Sep 21 '10 at 22:55
  • Note that 'white space' is different from 'two spaces'; 'white space' conventionally means a variety of possible characters, including tab and blank (or space), and sometimes form feed, vertical tab or newline; and occasionally backspace too. – Jonathan Leffler Sep 21 '10 at 23:01
  • @Jonathan Leffler: at least he's not trying to parse Whitespace ( http://compsoc.dur.ac.uk/whitespace/ ) – ninjalj Sep 22 '10 at 18:05
  • @ninjalj: Interesting! You're probably aware of [Stroustrup's](http://www2.research.att.com/~bs/whitespace.html) offering in this area! At least this question is just C, not C++ too. – Jonathan Leffler Sep 22 '10 at 20:51

5 Answers


getc and ungetc are your friends

#include <stdio.h>

int main(void) {
  int ch, spaces, x;
  while (1) {
    spaces = 0;
    while (((ch = getc(stdin)) != EOF) && (ch == ' ')) spaces++;
    if (ch == EOF) break;
    ungetc(ch, stdin);
    if (scanf("%d", &x) != 1) break;
    printf("%d was preceded by %d spaces\n", x, spaces);
  }
  return 0;
}

Demo at http://ideone.com/xipm1

Edit: Rahhhhhhhhh ... I uploaded that as C++. Here's the exact same thing, but now as strict C99 ( http://ideone.com/mGeVk ).

pmg
  • scanf, sscanf, fscanf, fgets, gets, getc... lol so many options. I'll have to read up on getc and ungetc. Thank you for the reply. – MykC Sep 21 '10 at 23:03
  • +1 because `getc()` and `ungetc()` are a better way to do it than trying to use just `scanf()` - but it evades the question a bit. – Jonathan Leffler Sep 21 '10 at 23:03
  • @MykC: **No, NOT gets! DON'T EVER USE gets, NEVER** – pmg Sep 21 '10 at 23:08
  • Yeah, gets is bad. I haven't looked into it but getc() is different than gets and isn't bad? – MykC Sep 21 '10 at 23:10
  • `getc` is good. Just remember it returns an `int` (not a `char`) and you can't go wrong :) – pmg Sep 21 '10 at 23:15
  • @MykC: the problem with gets() is that it's _always_ a security vulnerability using it with untrusted input. – ninjalj Sep 22 '10 at 18:07

while ( scanf ( "%c", &x ) == 1 )

Using %c you can read whitespace characters. Read all of the data and store it in an array. Then declare a char *cptr and point it at the beginning of the array. Walk through the array, and wherever you want to read a decimal number, call sscanf on cptr; the pointer must be positioned at the number you want to read.

if (((*(cptr + 1)) == ' ') && ((*cptr) == ' '))
  ya da ya da
else if ((*cptr) == ' ') {
  ya da ya da
  sscanf(++cptr, "%d", &x);
} else
  ya da ya da
Svisstack
  • Looks good. I avoid using pointers and arrays if I can. Note: I will use pointers and arrays when it makes the sense though. – MykC Sep 21 '10 at 23:12
  • I mentioned in someone else's comments that it appears that if there was one or more whitespace they would all get stored in a single char, so that stopped your above method from working. – MykC Sep 22 '10 at 17:07

Here is a solution that uses only the scanf() function. I used sscanf() in this sample for about the same functionality.

#include <stdio.h>


int p_1_cnt = 0, p_2_cnt = 0;

void process_1(int x)
{
    p_1_cnt++;
}


void process_2(int x)
{
    p_2_cnt++;
}


char * input_line = "1 5 3 2  4 6 2  1 9  0";

int main(void)
{
    char * ip = input_line;

    int x = 0, ws_0 = 0, ws_1 = 0, preceding_spaces = 1, fields = -2;

    while (sscanf (ip, "%d%n %n", &x, &ws_0, &ws_1) > 0)
    {
        ip += ws_0;

        if ((preceding_spaces) == 1)
            process_1(x);
        else
            process_2(x);

        preceding_spaces = ws_1 - ws_0;
    }

    printf("\np_1_cnt = %d, p_2_cnt = %d\n", p_1_cnt, p_2_cnt);
    getchar();  /* wait for a key press; standard replacement for the MSVC-only _fgetchar() */

    return 0;
}
Indinfer

What is your definition of 'white space'?

Frankly, I don't think I'd want to try using scanf() to identify double white spaces; nearly every other method would be far easier.

However, if you insist on doing the not desperately sensible, then you might want to use code derived from the following:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int d;
    char sp[3] = "";
    int n;

    while ((n = scanf("%d%2[ \t]", &d, sp)) > 0)
    {
        printf("n = %d; d = %d; sp = <<%s>>", n, d, sp);
        if (n == 2 && strlen(sp) == 2)
            printf(" end of group");
        putchar('\n');
    }
    return 0;
}

The square brackets enclose a character class and the 2 before it insists on at most 2 characters from the class. You might have to worry about it reading the newline and trying to get more data to satisfy the character class - which could be resolved by removing the newline from the character class. But then it hinges on your definition of white space, and whether groups are automatically ended by a newline or not. It wouldn't hurt to reset sp[0] = '\0'; at the end of the loop.

You might, perhaps, be better off reversing the fields, to detect two spaces before a number. But that would fail in the ordinary case, so then you'd fall back on a simple "%d" format to read the number (and if that fails, you know you got neither spaces nor a number - error). Note that %d chews up leading white space (as defined by the standard) - all of them.

The more I look at this, the less I like 'scanf() only'. Remind me not to take a class at your university, please.

Jonathan Leffler
  • I believe I only need to concern myself with a whitespace being a single blank character slot or ' '. I'm not attached to scanf; I'm only attached to doing it the easiest way, assuming I had to do it again and not just get the job done. Just wanted to see if there was a regex or trick with scanf that I may have overlooked that would solve the problem relatively easily since the input is formatted. – MykC Sep 21 '10 at 23:06
  • I have been looking at your answer and it seems scanf in your example will always return 2. I am currently looking into what range of values scanf can return and why. – MykC Sep 22 '10 at 00:10
  • @MykC: `scanf()` returns the number of successful conversions. `%2[ \t\n]` requires at least one, and at most 2, white space. Therefore, in your code, it may well always 'work' and you have to look at what is in `char sp[3];` to see what your delimiter is. Actually, on Unix, if someone typed ^D (EOF) at the terminal, your input might see a 'line' without any newline at the end, and hence no white space, and hence you'd get the return of 1. But, like I said, I do not think I would use `scanf()`; it is just too damn hard to get it to work the way you want. That's why I upvoted @pmg's answer. – Jonathan Leffler Sep 22 '10 at 00:18
  • Yeah, I ended up going with something similar to the above. scanf was also taking one or more whitespaces and storing them in a char, so if ' ' (like 10 whitespaces) occurred, it all got put into a single char. I am working in Visual Studio 2010, and this is what I was seeing when I watched the variable as I stepped through. Also, the statement if (' ' == ' ') would return 1. So even if I could get scanf to return a value that I wanted, it didn't matter. Like I said above, I dropped scanf because it obviously wasn't working and making it work wasn't my goal. – MykC Sep 22 '10 at 17:06
  • Ok, I tried implementing your method again and was able to get a result that would work. So, I'm not sure what I was doing before, but this implementation works. – MykC Sep 22 '10 at 17:30

If you really want scanf type functionality, you can use fgets and sscanf, and use the %n specifier to get scanf to give your program the offsets for the beginning and end of each whitespace span at the same time it does the rest of its work.

Otherwise, ditch the whole scanf family. It's quite possibly the most useless part of the standard library, in my opinion.

R.. GitHub STOP HELPING ICE
  • It's useful, but generally bad. If you want to add the input equivalent to a debugging print statement to a program, then it's great. If you want to add simple input for a test or demonstration program (where good input practices aren't what you are demoing), then it's pretty good. If you want to do input for production code it's really bad. – nategoose Sep 21 '10 at 23:15
  • Actually there is one use for `scanf`: a portable version of `getline` (or `getdelim`), including clean handling of embedded NUL characters, can be achieved with something like `scanf("%99[^\n]%n", buf, &cnt);` (where 99 is replaced with your buffer size). – R.. GitHub STOP HELPING ICE Sep 22 '10 at 00:36
  • `scanf("%99[^\n]%n", buf, &cnt);` has the problem that it saves nothing into `buf` and `cnt` if the input begins with `'\n'` and leaves that `'\n'` in stdin. This is not like `getline()`. – chux - Reinstate Monica Feb 03 '15 at 19:38
  • @chux: To use it to make something like `getline` you need to detect that case, and keep growing the buffer and repeating the call if it fails to hit a newline. `fscanf` is just an ingredient in making it work, not the whole thing. – R.. GitHub STOP HELPING ICE Feb 03 '15 at 21:16