EDIT: I ended up looking up the answers to the question this one was marked as a duplicate of, and put together the following snippet of code:

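import sys
from collections import deque

# Words already read from stdin but not yet handed out.
_input = deque()

def read_word():
    """Return the next whitespace-separated word from stdin."""
    while not _input:
        line = sys.stdin.readline()
        if not line:  # readline() returns '' only at end of input
            raise EOFError("no more input")
        _input.extend(line.split())
    return _input.popleft()

def read_int():
    """Return the next word from stdin, converted to int."""
    return int(read_word())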

ORIGINAL QUESTION:

I want to write some programs reading input from stdin in Python. But the input processing is killing me - I don't see a nice way to scan data that is not well-aligned in lines. For example, if I have to scan an integer N, and then N words to process, in C++ I would do it like this:

int N;
char str[100];
scanf("%d",&N);
for(int i=0;i<N;i++){
    scanf("%s",str);
    //do stuff with str
}

This program would scan all of the following input in exactly the same way:

//input 1:
3 word1 word2 word3
//input 2:
3
word1 word2 word3
//input 3:
3
word1
word2
word3
//input 4:
3 word1 word2
word3
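
The four inputs are equivalent because `%d` and `%s` skip leading whitespace, and a newline counts as whitespace. A small Python illustration of the same idea follows; the `inputs` list is just the four examples above written out as strings, not something read from stdin.

```python
inputs = [
    "3 word1 word2 word3\n",
    "3\nword1 word2 word3\n",
    "3\nword1\nword2\nword3\n",
    "3 word1 word2\nword3\n",
]

# str.split() with no argument splits on any run of whitespace,
# so spaces and line breaks are treated alike.
token_lists = [text.split() for text in inputs]

for tokens in token_lists:
    print(tokens)
```

Each of the four strings yields the same list of tokens, which is the behaviour the scanf loop relies on.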

I like the flexibility that C-style scanf gives me, and I'd like something similar in Python. I cannot use input(), because it would try to evaluate the text as Python syntax (possibly producing a dictionary or something like that), and raw_input() operates on newline-terminated lines. sys.stdin.read() is no good either, because it waits until the whole input has been given (and I may want to display partial results in real time). The only way I see to implement such functionality is to call sys.stdin.readline() in a loop, parsing each line independently until all the required words have been read. But this is not a very elegant solution, and it has flaws as well: for example, if the program above needed to read one more item later on, and that word or number appeared on the same line as the last word, the Python program would not parse it properly.

Problematic input would be:

//input 5:
3 word1 word2
word3 next_input

The word "next_input" would be "swallowed" by readline(), even though the program might need it later. Again, this can be worked around by keeping a temporary buffer of input_yet_to_be_parsed_but_already_inputted, but that would quickly become very buggy code. Is there a truly 'pythonic' way to do this? Or am I missing something obvious?
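
One possible way to keep such a buffer out of sight is to let a generator hold it. The sketch below is only an illustration: `tokens` and `read_words` are hypothetical names, and an in-memory `io.StringIO` stands in for stdin so that input 5 can be replayed. It reads one line at a time, so partial results could still be shown as input arrives.

```python
import io
import sys

def tokens(stream=None):
    """Yield whitespace-separated words from stream, one line at a time."""
    if stream is None:
        stream = sys.stdin
    # iter(callable, sentinel) keeps calling readline() until it returns ''
    # (end of input), so at most one line is read ahead.
    for line in iter(stream.readline, ''):
        for word in line.split():
            yield word

def read_words(source, count):
    """Hypothetical helper: take the next `count` words from a shared iterator."""
    return [next(source) for _ in range(count)]

# Input 5 from the question, simulated with an in-memory stream.
source = tokens(io.StringIO("3 word1 word2\nword3 next_input\n"))
n = int(next(source))
words = read_words(source, n)
leftover = next(source)  # the word that a plain readline() loop would lose
print(n, words, leftover)
```

Because the generator remembers where it stopped, any function that receives `source` continues from the next unread word, and nothing has to be returned to the caller to keep the state in sync.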

akrasuski1

1 Answer

Why not use the re module for regular expressions?

Your example would be mapped to a regex like this:

>>> import re
>>> pattern = re.compile(r'^(\d+)')
>>> pattern.match('3 word1').group(1)
'3'

If you need the rest of the data for later use, how about catching it as the second group?

>>> pattern = re.compile(r'^(\d+)(.*)')
>>> match = pattern.match('3 lol')
>>> "matched: " + match.group(1) + ", rest: " + match.group(2)
'matched: 3, rest:  lol'
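
To make the "match, then keep the rest" idea concrete, here is a minimal sketch. The `scan_int` function and the `_INT` pattern are hypothetical names introduced for illustration, and the function works on a string that has already been read, not on stdin itself.

```python
import re

# Optional leading whitespace, a signed integer, then everything else.
# re.DOTALL lets '.' match newlines, so the rest may span several lines.
_INT = re.compile(r'\s*([+-]?\d+)(.*)', re.DOTALL)

def scan_int(text):
    """Return (value, rest): the leading integer and the unparsed remainder."""
    match = _INT.match(text)
    if match is None:
        raise ValueError("no integer at start of %r" % text)
    return int(match.group(1)), match.group(2)

n, rest = scan_int("3 word1 word2")
print(n, repr(rest))
```

The caller is responsible for passing `rest` to whatever parses the next item.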
Reut Sharabani
  • Ok, but how do you get the input from stdin first? I'd like you to address the final issue specifically (input 5) – akrasuski1 Dec 30 '14 at 12:12
  • @akrasuski1, see edits. Regular expressions may become more complex as the input gets more complex, but I remember `scanf` has that property as well. You can access the "rest" of the data by matching it last. – Reut Sharabani Dec 30 '14 at 12:21
  • `pattern.match('3 word1').group(1)` returns the string `'3'` which is hardly the same as `scanf("%d",&N);` that reads from stdin and stores an integer value into the integer variable `N`. – martineau Dec 30 '14 at 12:27
  • While technically it could be implemented, it would easily lead to unreadable code - what if I have functions that read input themselves? Do I have to pass the temporary buffer to them as a parameter? And do I have to return the updated temporary buffer to the caller? Thanks for the reply, but I don't think this is an acceptable solution. @martineau - this is not a big problem, as I can always convert by typing int(x). The problem is in dividing the input into words if you don't know how they are distributed across lines – akrasuski1 Dec 30 '14 at 12:29