I usually read files like this in Python:

f = open('filename.txt', 'r')
for x in f:
    doStuff(x)
f.close()

However, this splits the file by newlines. I now have a file that stores all of its data on a single line (45,000 strings separated by commas). A file of this size is trivial to read in with something like

f = open('filename.txt', 'r')
doStuff(f.read())
f.close()

For a much larger file that is all on one line, is it possible to get the same kind of iteration as in the first snippet, but splitting on commas (or any other character) instead of newlines?

vasek1
  • Possible duplicate of: http://stackoverflow.com/questions/6284468/change-newline-character-readline-seeks. A solution via subclassing the `file` object is given there. – ely Apr 17 '12 at 01:41
  • It's generally a good idea to `close()` file objects once you're done with them. – Joel Cornett Apr 17 '12 at 01:52
  • @JoelCornett good point, edited my question – vasek1 Apr 17 '12 at 02:04

2 Answers

The following function is a fairly straightforward way to do what you want:

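def file_split(f, delim=',', bufsize=1024):
    """Yield delim-separated items from file object f, reading bufsize characters at a time."""
    prev = ''  # incomplete item carried over from the previous chunk
    while True:
        s = f.read(bufsize)
        if not s:
            break
        parts = s.split(delim)
        if len(parts) > 1:
            # The first piece completes the carried-over item.
            yield prev + parts[0]
            # The last piece may be cut off by the chunk boundary, so hold it back.
            prev = parts[-1]
            for x in parts[1:-1]:
                yield x
        else:
            # No delimiter in this chunk: the current item is still growing.
            prev += s
    if prev:
        yield prev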

You would use it like this:

with open('filename.txt') as f:
    for item in file_split(f):
        doStuff(item)

For large files, this should be faster than the solution that EMS linked, and it will use far less memory than reading the entire file at once.
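One caveat with chunked splitting in general: a delimiter longer than one character can straddle two chunks, and splitting each chunk on its own would miss it. The following is only a sketch of one way around that, not part of the answer above. It re-splits a running buffer instead of each chunk, and the name `iter_delimited` is made up for illustration. `io.StringIO` stands in for a real file here.

```python
import io

def iter_delimited(f, delim=',', bufsize=1024):
    """Yield delim-separated items from f, reading bufsize characters at a time."""
    buf = ''  # unconsumed text: a partial item, possibly ending in a partial delimiter
    while True:
        chunk = f.read(bufsize)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(delim)
        # Every piece except the last is complete; keep the last one buffered.
        buf = parts.pop()
        for part in parts:
            yield part
    if buf:
        yield buf

# A two-character delimiter that is cut in half by the chunk boundary.
items = list(iter_delimited(io.StringIO('a::b::c'), delim='::', bufsize=2))
print(items)
```

The trade-off is that the buffer is re-split on every read, so a single very long item spanning many chunks costs more than it would with the per-chunk approach.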

Andrew Clark

Open the file using open(), then use the file.read(x) method to read up to the next x characters from the file (or x bytes if the file is opened in binary mode). You could keep requesting blocks of 4096 characters until you hit end-of-file.

You will have to implement the splitting yourself. You can take inspiration from the csv module, but I don't believe you can use it directly because it wasn't designed to deal with extremely long lines.
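As a rough illustration of what "inspiration from the csv module" might look like, here is a minimal sketch that reads 4096-character blocks until end-of-file and ignores delimiters inside double quotes. It is a simplification under stated assumptions, not csv-compatible parsing: the delimiter and quote are single characters, quote characters are dropped from the output, and doubled quotes are not treated as escapes. The name `iter_fields` is hypothetical.

```python
import io
from functools import partial

def iter_fields(f, delim=',', quote='"', bufsize=4096):
    """Yield fields from f, treating delimiters inside quotes as literal text."""
    field = []         # characters of the field currently being built
    in_quotes = False  # True while between an opening and a closing quote
    # iter(callable, sentinel) keeps calling f.read(bufsize) until it returns ''.
    for chunk in iter(partial(f.read, bufsize), ''):
        for ch in chunk:
            if ch == quote:
                in_quotes = not in_quotes  # toggle state; the quote itself is dropped
            elif ch == delim and not in_quotes:
                yield ''.join(field)
                field = []
            else:
                field.append(ch)
    if field:
        yield ''.join(field)

# The small bufsize forces the quoted field to span several reads.
fields = list(iter_fields(io.StringIO('a,"b,c",d'), bufsize=3))
print(fields)
```

Walking the input one character at a time is likely to be noticeably slower than `str.split` on whole chunks, so this only makes sense when quoting actually matters.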

Li-aung Yip
  • You can do this with a file object on either Python 2 or 3. No reason to use `io`. Also, just to be clear, a file object is what you get when you call `open`. Don't use the actual `file` built-in. – agf Apr 17 '12 at 01:17