3

I have a one-line file that I want to read word by word, i.e., with spaces separating the words. Is there a way to do this without loading the whole file into memory and using split? The file is too large.

Roy

2 Answers

2

You can read the file one character at a time and yield a word each time you hit a space. Below is a simple solution for a file whose words are separated by single spaces; you should refine it for more complex cases (tabs, multiple spaces, etc.).

def read_words(filename):
    with open(filename) as f:
        out = ''
        while True:
            c = f.read(1)
            if not c:
                # End of file: emit the final word, which has no trailing space.
                if out:
                    yield out
                break
            elif c == ' ':
                yield out
                out = ''
            else:
                out += c

Example:

for i in read_words("test"):
    print(i)

It uses a generator, so it never has to allocate a big chunk of memory for the whole file.
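For the more complex cases (tabs, newlines, runs of spaces), one possible refinement is to read the file in fixed-size chunks and split each chunk, carrying any unfinished word over to the next chunk. This is only a sketch: the name read_words_chunked and the default chunk_size are made up for illustration, and it assumes no single word is so long that holding it in memory is a problem.

```python
def read_words_chunked(filename, chunk_size=64 * 1024):
    """Yield whitespace-separated words, reading the file in fixed-size chunks."""
    tail = ''  # partial word carried over from the previous chunk
    with open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunk = tail + chunk
            words = chunk.split()
            # If the chunk does not end in whitespace, its last word may be
            # incomplete, so hold it back until the next chunk arrives.
            if words and not chunk[-1].isspace():
                tail = words.pop()
            else:
                tail = ''
            for word in words:
                yield word
    # Whatever is left after the last chunk is the final word.
    if tail:
        yield tail
```

It is used the same way as read_words above, and only about one chunk of text is held in memory at a time.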

dlavila
0

Try this little function:

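def readword(file):
    """Return the next word from an open file, or '' at end of file."""
    word = ''
    while True:
        c = file.read(1)
        # Stop at a space, a newline, or end of file (read returns '').
        if c in ('', ' ', '\n'):
            return word
        word += c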

Then to use it, you can do something like:

f = open('file.ext', 'r')
print(readword(f))

This will read the first word in the file, so if your file is like this:

12 22 word x yy
another word
...

then the output should be 12.

Next time you call this function, it will read the next word, and so on...

Tooniis