
To iterate over a file line by line, one can do:

for line in f:

(where f is the open file object).

I want to iterate over the file in blocks delimited by commas rather than by newlines. I could read all the lines and then split the resulting string on commas, but what's the Pythonic way to do this?

Terry Jan Reedy
Illusionist

3 Answers


Iterate over the split as you go; then you don't need to store all the lines:

for line in f:
    for item in line.split(","):
        ...  # process each comma-separated item
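The two nested loops can also be collapsed into a single lazy generator expression (a sketch; `io.StringIO` stands in here for the open file `f`):

```python
import io

# Stand-in for an open file object; any iterable of lines works.
f = io.StringIO("a,b\nc,d\n")

# Lazily yield each comma-separated field, one line at a time.
fields = (field for line in f for field in line.rstrip("\n").split(","))

print(list(fields))  # → ['a', 'b', 'c', 'd']
```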
Padraic Cunningham

Use the string split() method to split the content on commas.

e.g.

input_file = "/home/vivek/Desktop/Work/stack_over/href_input.html"
# Read the whole file and split its content on commas.
with open(input_file, "r") as fp:
    content_list = fp.read().split(",")

Or iterate the file line by line and split every line on commas:

>>> with open(input_file, "r") as fp:
...     for line in fp:
...         for item in line.split(","):
...             item
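One caveat with the line-by-line approach: the last field of every line keeps its trailing newline, so a `strip()` is usually needed (a small sketch on made-up data):

```python
import io

fp = io.StringIO("x,y\nz\n")
for line in fp:
    for field in line.split(","):
        # The field after the last comma still ends with "\n".
        print(field.strip())
```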
Vivek Sable
    This kills the whole point of iterating over the file object. Think about a file which is twice or thrice as big as your system's memory. – thefourtheye Feb 24 '15 at 15:35
  • @thefourtheye Isn't that like the 0.1% case, though? How many times do you think this is going to fail by running out of RAM really? – Two-Bit Alchemist Feb 24 '15 at 15:37
  • @thefourtheye: yes, but if we read the file line by line it will not be comma separated. – Vivek Sable Feb 24 '15 at 15:39
  • @Two-BitAlchemist Okay, why do we prefer iterating over the file object rather than `fp.read().split("\n")`? – thefourtheye Feb 24 '15 at 15:39
  • @thefourtheye Thought experiment: should we also avoid using `split` here since it returns a list and there might be a file with only one line (no `\n`) that will run the system out of memory? – Two-Bit Alchemist Feb 24 '15 at 15:44
  • @thefourtheye - any idea *how* python implements readline/readlines? I'm trying to track down how it happens to see if a user could monkeypatch it to split on something other than `\n` but I'm having trouble finding the actual Python source code that does it. – dwanderson Mar 25 '15 at 01:52
  • @Two-BitAlchemist - that's not a particularly absurd idea - if you read in a (large) file that didn't have any newlines, perhaps because it was processed by a different architecture (eg whichever one only uses `\r` and your architecture includes `\n`). If the OP is asking if there's a way to split on something other than newlines, it's not particularly helpful/relevant to say "just read in the whole file" since that defeats the purpose of the question. – dwanderson Mar 25 '15 at 01:55
  • @thefourtheye - some follow up: perhaps something like http://stackoverflow.com/a/102202/2272638 would be helpful? Potentially slower, but if you're afraid of memory constraints, that might be acceptable/necessary – dwanderson Mar 25 '15 at 02:02

If you really need to scan a giant (e.g. 1 TB) single-line file and process items by delimiter, you can read the file in blocks, split each block, and handle the border effects yourself. Here is a generator that may help with it:

def split_file(file, delim, block_size=1024*1024):
    last_item = ''
    while True:
        block = file.read(block_size)
        if not block:
            break
        items = block.split(delim)
        # Everything before the last split piece is a complete item;
        # the last piece may be cut off by the block boundary, so
        # carry it over and prepend it to the next block's first piece.
        items[0] = last_item + items[0]
        for item in items[:-1]:
            yield item
        last_item = items[-1]
    if last_item:
        # Don't lose whatever remains after the final block.
        yield last_item

You can simply use it like this:

with open("names.in.txt") as f:
    for name in split_file(f, ","):
        print(name)  # process one item here
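To see the boundary handling in action, you can drive the generator with an in-memory file and a deliberately tiny block size (a standalone sketch: the generator is repeated here, with the leftover item yielded at end-of-file, so the snippet runs on its own):

```python
import io

def split_file(file, delim, block_size=1024 * 1024):
    # Read fixed-size blocks and yield delimiter-separated items,
    # joining items that straddle a block boundary.
    last_item = ''
    while True:
        block = file.read(block_size)
        if not block:
            break
        items = block.split(delim)
        items[0] = last_item + items[0]
        for item in items[:-1]:
            yield item
        last_item = items[-1]
    if last_item:
        yield last_item

# A block size of 3 forces items to straddle block boundaries.
data = io.StringIO("alice,bob,carol")
print(list(split_file(data, ",", block_size=3)))  # → ['alice', 'bob', 'carol']
```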
a5kin