
I have multiple files, each containing a single line of roughly 10M numbers. I want to check each file and print 0 for each file that contains a repeated number and 1 for each file that doesn't.

I am using a list to count frequencies. Because each line holds so many numbers, I want to update the frequency as each number is read and stop as soon as I find a repeated number. While this is simple in C, I have no idea how to do it in Python.

How do I input a line in a word-by-word manner without storing (or taking as input) the whole line?

EDIT: I also need a way to do this from live input rather than a file.

supersonic_ht
  • @Buddy: Tried Googling for inputting word by word as well as byte by byte. Got nothing. – supersonic_ht Jun 11 '16 at 05:26
  • By word, you mean the word datatype, yes? Not like a written word? – OneCricketeer Jun 11 '16 at 05:36
  • @supersonic_ht http://stackoverflow.com/questions/1035340/reading-binary-file-in-python-and-looping-over-each-byte – Buddy Jun 11 '16 at 05:38
  • I mean a number from a space-separated line of input, the type that you can feed a loop of cin statements accepting integers in C++. Close to a written word but not exactly in this case as you can't do the same for a string. – supersonic_ht Jun 11 '16 at 05:42
  • Thank you for the link, I'll try it. I somehow missed that. – supersonic_ht Jun 11 '16 at 05:43
  • Thank you, Buddy, that was what I needed. Smac89's answer below uses it to solve my problem completely. – supersonic_ht Jun 11 '16 at 06:48
  • Possible duplicate of [Iterate through words of a file in Python](http://stackoverflow.com/questions/7745260/iterate-through-words-of-a-file-in-python) – smac89 Nov 30 '16 at 01:27

2 Answers


Read the line, split it, and copy the resulting list into a set. If the set is smaller than the list, the file contains repeated elements.

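with open('filename', 'r') as f:
    for line in f:
        numbers = [int(tok) for tok in line.split()]
        # A set drops duplicates, so a smaller set means a number repeats
        print(0 if len(set(numbers)) < len(numbers) else 1)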

To read the file word by word, try this

import itertools

def readWords(file_object):
    word = ""
    # Read one character at a time until read(1) returns '' at EOF.
    # Python 3's built-in map is lazy; Python 2 needs itertools.imap here.
    for ch in itertools.takewhile(bool, map(file_object.read, itertools.repeat(1))):
        if ch.isspace():
            if word: # In case of multiple spaces
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word # Handles last word before EOF

Then you can do:

with open('filename', 'r') as f:
    seen = set()
    for num in (int(w) for w in readWords(f)):
        if num in seen:
            break  # repeated number: stop reading early
        seen.add(num)

This method should also work for streams, because it reads only one character at a time and yields one whitespace-delimited token at a time from the input stream.
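Reading one character per call can be slow for lines with ~10M numbers. As a rough sketch of an alternative (not the method above), the reader below pulls fixed-size chunks and carries any partial token over to the next chunk. The names `read_tokens`, `all_unique` and the `chunk_size` default are made up for illustration, and the 0/1 return values follow the convention in the question.

```python
import io


def read_tokens(stream, chunk_size=65536):
    """Yield whitespace-separated tokens from a text stream, chunk by chunk."""
    tail = ""  # partial token carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty string means EOF
            break
        chunk = tail + chunk
        parts = chunk.split()
        # If the chunk does not end in whitespace, its last token may be
        # incomplete, so hold it back until more input arrives.
        if parts and not chunk[-1].isspace():
            tail = parts.pop()
        else:
            tail = ""
        for token in parts:
            yield token
    if tail:
        yield tail  # last token before EOF


def all_unique(stream):
    """Return 1 if no number repeats in the stream, 0 otherwise."""
    seen = set()
    for token in read_tokens(stream):
        num = int(token)
        if num in seen:
            return 0  # stop at the first repeat
        seen.add(num)
    return 1


print(all_unique(io.StringIO("3 1 4 1 5 9")))  # prints 0 (1 repeats)
print(all_unique(io.StringIO("3 1 4 5 9")))    # prints 1
```

The same function should accept an open file or `sys.stdin`. On an interactive stream, `read(chunk_size)` may block until that many characters are available, so a smaller `chunk_size` (down to 1) is likely the better fit for live input.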


After giving this answer, I've updated this method quite a bit. Have a look: https://gist.github.com/smac89/bddb27d975c59a5f053256c893630cdc
smac89
  • I don't want to read the line at all. That's the point of the question. – supersonic_ht Jun 11 '16 at 05:22
  • This is perfect. Took me a while to wrap my head around the itertools functions. Brilliant! Live input stream also worked when I called the function on sys.stdin. – supersonic_ht Jun 11 '16 at 06:50
  • Your code doesn't yield the last word before the EOF. Adding a `if word: yield word` at the end fixes that. https://gist.github.com/anonymous/87287531548a49567c2f49192adb5ced – supersonic_ht Jun 11 '16 at 08:54

I don't think this is possible the way you are asking. Python has no built-in way to read a file word by word. Something like this can be done instead, although it reads the whole file into memory first:

with open('words.txt') as f:  # closes the file when the block ends
    for word in f.read().split():
        print(word)
Priyansh Goel
  • I'd be surprised if Python doesn't have a method for doing something which seems so simple. – supersonic_ht Jun 11 '16 at 05:26
  • @supersonic_ht It's simple but a bit low-level. Python is good at abstracting away some low level stuff to make the programmer's life easier, and this is part of what makes it great – smac89 Jun 11 '16 at 05:35