8

I feel like almost every time I read a file in Python, what I want is:

with open("filename") as file_handle:
    for line in file_handle:
        #do something

Is this truly the preferred idiom? It mildly irritates me that this double-indents all file-reading logic. Is there a way to collapse this logic into one line or one layer?

dbn
  • 1
    Yes, that's the preferred way. There isn't any way to collapse it without losing the `with` statement, as far as I'm aware. – rlms Aug 20 '13 at 19:32
  • You can always explicitly close the file: `fobj = open(...) for line in fobj: #do something fobj.close()`. You might use this in highly indented code. – Bakuriu Aug 20 '13 at 19:34
  • 1
    Use `2` as the tab-depth instead of the [PEP8](http://www.python.org/dev/peps/pep-0008/) `4`. – MattH Aug 20 '13 at 19:35
  • @MattH If you do that consistently, then this still indents the loop body more than a plain `for` loop without `with` statement. If you do it inconsistently... well, it's inconsistent, people will hate you, and it looks awful. Even more awful than consistent 2 space indent. –  Aug 20 '13 at 19:36
  • @delnan: Of course consistently. And yes, 2 spaces wastes far less screen estate than say the body of this code in the question as a method on a class: 16 columns gone for every line in `#do something`. – MattH Aug 20 '13 at 19:39
  • 2
    @MattH By that logic, we could do away with indentation completely. Hopefully you consider that crazy talk, because that logic isn't very convincing. Each indentation level taking a significant amount of space has upsides too, it discourages heavy nesting and/or very long lines (which tends to be hard to understand regardless of how much whitespace is in front of it), and it highlights each level of indentation, which is kinda important because those levels correspond to vital things like control flow constructs. –  Aug 20 '13 at 19:45
  • 3
    The key here is that yes, this is the preferred idiom. And the preferred way to deal with the case where indentation gets too deep is to factor something out into a function. So, even if you _do_ come up with a better solution here that nobody's ever thought of, it still won't be the idiomatic one. – abarnert Aug 20 '13 at 19:48
  • @delnan: _By that logic, we could do away with indentation completely._ No, that's completely disingenuous and not the logical conclusion of my argument. – MattH Aug 20 '13 at 19:53

5 Answers

3

For simple cases, yes, the two-level `with` and `for` is idiomatic.

For cases where the indentation becomes a problem, here as anywhere else in Python, the idiomatic solution is to find something to factor out into a function.


You can write wrappers to help with this. For example, here's a simple way to solve some of the problems you use `with` for (e.g., even in the best case, the file sticks around after you finish the loop, until the end of the scope, which could be days later, or never, if the scope is a main event loop or a generator or something…):

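def with_iter(iterable):
    # Requires Python 3.3+ for `yield from`.
    # The `with` block exits once the generator is exhausted or finalized.
    with iterable:
        yield from iterable

for line in with_iter(open("filename")):
    pass  # do something

for line in with_iter(open("other_filename")):
    pass  # do something else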

Of course it doesn't solve everything. (See this ActiveState recipe for more details.)

If you know that it does what you want, great. If you don't understand the differences… stick to what's idiomatic; it's idiomatic for a reason.
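To make those differences concrete, here is a minimal sketch of how the wrapper behaves, assuming CPython 3.3+ and a throwaway temp file (the file contents and all variable names are made up for illustration). It shows the file being closed once the loop runs to completion, and staying open while a partly consumed generator is still referenced, which is the concern raised in the comments.

```python
import os
import tempfile


def with_iter(iterable):
    """Same wrapper as above: iterate inside the object's own `with` block."""
    with iterable:
        yield from iterable


# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\nthree\n")

# Case 1: the loop runs to completion, so the `with` block inside the
# generator exits and the file is closed.
f1 = open(path)
lines = [line.rstrip("\n") for line in with_iter(f1)]
closed_after_full_loop = f1.closed

# Case 2: the generator is only partly consumed and a reference is kept.
f2 = open(path)
gen = with_iter(f2)
first = next(gen)
closed_while_suspended = f2.closed  # generator is paused inside `with`

gen.close()  # raises GeneratorExit in the generator, which runs __exit__
closed_after_gen_close = f2.closed

os.remove(path)
```

In other words, the file's lifetime is tied to the generator's: exhausting or closing the generator closes the file, but abandoning it half-way leaves cleanup to whenever the generator is finalized.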


So, how do you refactor the code? The simplest way is often to turn the loop body into a function, so you can just use `map` or a comprehension:

def do_with_line(line):
    return line

with open("filename") as f:
    process = [do_with_line(line) for line in f]

But if the problem is that the code above or underneath the for is too deep, you'll have to refactor at a different level.

abarnert
  • 2
    Except that this wrapper has the same problems as just using `open()` directly. Resource management becomes non-deterministic, only now it depends on the generator being finalized rather than the file being finalized (notably, an exception in the loop body does *not* trigger the file's `__exit__`). –  Aug 20 '13 at 19:40
  • @delnan: No it doesn't. For example, with just `open` directly, after `for line in file` is done, the file is still open; with `with_iter`, it isn't. See the ActiveState recipe, and the long discussion on python-ideas that led to it, because there's way too much to discuss here. – abarnert Aug 20 '13 at 19:44
  • Granted, it has *some* of the same problems, solves others, and introduces a few new ones. The `send_async` use case is interesting, but not applicable here as the file is consumed eagerly. But as I said, this doesn't ensure prompt finalization if the loop is ended prematurely (exception, `break`, `return`). And if it's an exception and the code catching it is careless (IIRC by storing the traceback in a local variable), it creates a reference cycle which keeps the generator alive at least until the next cycle collection, or *forever* in CPython without PEP 442 (which is only added in v3.4). –  Aug 20 '13 at 19:53
  • @delnan: Even if we had all of the same people here, and even if the record were not already available online, SO comments are a horrible format for a long discussion on this, so I'm not going to rehash the entire thing here. – abarnert Aug 20 '13 at 19:59
  • @delnan: I've rewritten the answer to make it more clear that it's just an example of the kinds of things you can do to help in some cases. The main point of the answer is, still, the first two sentences. – abarnert Aug 20 '13 at 20:04
  • Sure, my concerns don't invalidate this answer. But now you got me all fired up and I want to know if I'm wrong. I couldn't find any discussion of the issue I have claimed exists in the ActiveState recipe, and I couldn't find the python-ideas discussion you mentioned in my inbox (subscribed since 2012-09-30). Do you have an archive link or thread title for the latter at hand? –  Aug 20 '13 at 20:09
  • @delnan: I'll search later when I get a chance. There are actually two different threads, one something about smarter context managers (a topic that comes up every three months and means something different each time), one about with clauses in comprehensions, if that helps. Briefly, if you keep the wrapper alive _and_ fail to consume it as an iterator, it will keep the inner object alive, so most of the usual ways to keep something alive forever will work (or, rather, anti-work?) here too. It's mainly about shifting the lifecycle into something with a separate scope that you can pass around. – abarnert Aug 20 '13 at 20:18
  • I think you should point out that the ability to `yield from iterable` wasn't added until Python 3.3 as mentioned in its [_What’s New In Python 3.3_](https://docs.python.org/3.3/whatsnew/3.3.html#pep-380-syntax-for-delegating-to-a-subgenerator) section. – martineau Oct 23 '14 at 06:34
2

Yes, this is absolutely idiomatic Python.

You shouldn't be bothered too much by multiple levels of indentation. Certainly this is not the only way for it to happen, e.g.

if condition:
    for x in sequence:
        #do something with x

If the level of indentation becomes too great, it's time to refactor into multiple functions. One of the things I love most about Python is that it reduces the friction of breaking things up.

with open("filename") as file_handle:
    result = do_something(file_handle)
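
As a rough sketch of what the factored-out function might look like: the body of `do_something` below is purely hypothetical (it counts non-blank lines), and `io.StringIO` stands in for a real file so the example runs anywhere. The point is only that the loop moves into the function, so the call site keeps a single level of indentation.

```python
import io


def do_something(file_handle):
    """Hypothetical helper: count the non-blank lines of an open text file."""
    count = 0
    for line in file_handle:
        if line.strip():
            count += 1
    return count


# io.StringIO is a stand-in for open("filename") in this sketch.
with io.StringIO("alpha\n\nbeta\n") as file_handle:
    result = do_something(file_handle)
```

Because the function takes an already-open handle, it does not care whether the caller passes a real file, a `StringIO`, or any other iterable of lines.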
Mark Ransom
1

In short, no, not if you want to keep exactly the same semantics.

Benjamin Peterson
  • 1
    It's probably good to spell out what those semantics are and why they're the ones we want. –  Aug 20 '13 at 19:38
0

If a single indent level would irritate you less, you can always do:

with open("filename") as file_handle:
    fle = file_handle.read()

But be careful with big files: `read()` slurps the whole file into your machine's memory. You can keep a single indent level and still iterate line by line afterwards if you do:

with open("filename") as file_handle:
    fle = file_handle.readlines()

The lines of your file will be placed in a list, one element per line, and you can then iterate through it like this:

for ln in fle:
    pass  # do something with ln here; it contains one line from your file

Still, be careful with big files, as this is all done in memory!
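
A small sketch of the `readlines()` approach, using `io.StringIO` as a stand-in for `open("filename")` (an assumption made so the example is self-contained; the sample text is invented). It shows that the list outlives the closed handle and that each element keeps its trailing newline.

```python
import io

# Stand-in for open("filename"); the contents are made up for illustration.
source = io.StringIO("first\nsecond\n")

with source as file_handle:
    lines = file_handle.readlines()  # whole file is now held in memory

handle_closed = source.closed  # the `with` block has already closed it

# Each element still ends with "\n", so strip it if you don't want it.
stripped = [ln.rstrip("\n") for ln in lines]
```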

Lukasz
0

Just to be explicit:

@ myself, of course that's the idiom! The `with`/`for line in` idiom provides several benefits:

  • It automatically closes files on errors.
  • It reads the file incrementally (line by line, with buffering) rather than all at once, limiting memory use.
  • It is broadly used; other coders will understand it immediately.
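
The first benefit can be checked with a short sketch. It assumes a throwaway temp file and a simulated failure (both invented for illustration), and keeps a second reference to the handle only so its state can be inspected after the `with` block.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("ok\nbad\n")

handle = None
try:
    with open(path) as file_handle:
        handle = file_handle  # extra reference, just for inspection below
        for line in file_handle:
            if line.strip() == "bad":
                raise ValueError("simulated failure while processing a line")
except ValueError:
    pass

# The exception propagated out of the `with` block, which closed the file.
closed_after_error = handle.closed

os.remove(path)
```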
dbn