Preferred way to read the entire file as a list

Question

I just saw a comment from DSM on my answer to the question "create a series of tuples using a for loop", and it made me wonder whether there is any reason to use `fileObj.readlines()` over passing `fileObj` to `list()`. As far as I can see, both give the same result. The only difference is readability, but since both are equally readable, which one should be preferred?

Consider these two scenarios:

```python
# Passing the file object to list() builds a list of its lines
with open("yourfile") as fin:
    tup = list(fin)

# readlines() is the straightforward way to get a list of the file's lines
with open("yourfile") as fin:
    tup = fin.readlines()
```
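A quick check that both forms produce the same `list`, trailing newlines included, assuming a throwaway file created with `tempfile` (its contents are made up for illustration). If a tuple is what you actually want, `tuple(fin)` consumes the file the same way:

```python
import os
import tempfile

# Write a small hypothetical sample file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path) as fin:
    via_list = list(fin)             # iterates the file, one line per element

with open(path) as fin:
    via_readlines = fin.readlines()  # reads all remaining lines into a list

with open(path) as fin:
    via_tuple = tuple(fin)           # same lines, but stored in a tuple

os.remove(path)

print(via_list == via_readlines)  # True
print(via_list)                   # ['first\n', 'second\n', 'third\n']
print(type(via_tuple).__name__)   # tuple
```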

I tried timing them with `timeit`, but that didn't tell me much, since both perform comparably.
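A minimal sketch of such a timing run, assuming Python 3 and a generated temporary file of 20,000 short lines (the file size and repeat counts are arbitrary choices). The numbers will vary by machine and Python version, so no figures are claimed here:

```python
import os
import tempfile
import timeit

# Hypothetical test file: 20,000 short lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(20_000))
    path = tmp.name

def with_list():
    with open(path) as fin:
        return list(fin)

def with_readlines():
    with open(path) as fin:
        return fin.readlines()

# Best of 5 runs, each calling the function 10 times.
t_list = min(timeit.repeat(with_list, number=10, repeat=5))
t_readlines = min(timeit.repeat(with_readlines, number=10, repeat=5))
os.remove(path)

print(f"list(fin):       {t_list:.4f} s")
print(f"fin.readlines(): {t_readlines:.4f} s")
```

Taking the minimum of several repeats is the usual `timeit` convention, since higher values mostly reflect interference from other processes rather than the code under test.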

Neither of your code paths creates a `tuple`. Each alternative creates a `list`. — Robᵩ, Mar 27 '13 at 19:24
BTW, they can both be shortened to `lis = list(open('yourfile'))` and `lis = open('yourfile').readlines()` because the file object goes out of scope and is freed immediately. — tdelaney, Mar 27 '13 at 19:24
@tdelaney - Can you cite documentation for that? I'm always worried that the file object will sit around waiting for garbage collection. — Robᵩ, Mar 27 '13 at 19:25
Not cited either, but more in-depth: http://stackoverflow.com/questions/2404430/does-filehandle-get-closed-automatically-in-python-after-it-goes-out-of-scope#answer-2404671 — mayhewr, Mar 27 '13 at 19:32
@Robᵩ [python 2.7.3 Data Model](http://docs.python.org/2/reference/datamodel.html?highlight=garbage) says: "CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references." - kinda vague, but since there are no circular references to the obj, it is deleted immediately. — tdelaney, Mar 27 '13 at 19:37
@Robᵩ I was a big fan of `with open('xyz') as f:` until people finally convinced me it was pointless. — tdelaney, Mar 27 '13 at 19:39
http://docs.python.org/2/library/stdtypes.html#file.next implies that the iterator version goes faster than `.readlines()`. — Robᵩ, Mar 27 '13 at 19:39
@tdelaney - My bigger problem with your suggestion is that it performs two separate (but admittedly related) actions on the same line: `open` and `read`. — Robᵩ, Mar 27 '13 at 19:41
I think both of them are equivalent if the optional `sizehint` argument is not passed to `readlines()`. — Ashwini Chaudhary, Mar 27 '13 at 19:48
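In Python 3 the parameter is called `hint` (`readlines(hint=-1)`): no more lines are read once the total size of the lines read so far exceeds it, so only the no-hint call matches `list(fin)`. A small sketch, assuming Python 3 and a hypothetical temporary file:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\nd\n")
    path = tmp.name

with open(path) as fin:
    everything = fin.readlines()  # no hint: same result as list(fin)

with open(path) as fin:
    partial = fin.readlines(3)    # stops once more than 3 characters have been read
    rest = list(fin)              # iteration resumes after the lines already returned

os.remove(path)

print(everything)  # ['a\n', 'b\n', 'c\n', 'd\n']
print(partial)     # ['a\n', 'b\n']
print(rest)        # ['c\n', 'd\n']
```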
@tdelaney how can you ignore the rest of the quote in the very same box that explicitly says "Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files)." It literally says **always close files**. — mayhewr, Mar 27 '13 at 19:52
@tdelaney Jython and PyPy don't use reference counting. You're relying on CPython implementation details. — Fred Foo, Mar 27 '13 at 19:55
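A portable way to keep the call site short without relying on CPython's reference counting is to leave the `with` block in place, since it closes the file when the block exits on any implementation. The sketch below assumes a throwaway temporary file; `read_lines` is a hypothetical helper name, not a standard-library function:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x\ny\n")
    path = tmp.name

with open(path) as fin:
    lines = list(fin)
    open_inside = not fin.closed  # the file is still open inside the block
closed_after = fin.closed         # closed on exit, no reliance on garbage collection

def read_lines(path):
    """Hypothetical helper: a one-call read that still closes the file deterministically."""
    with open(path) as f:
        return list(f)

same = read_lines(path) == lines
os.remove(path)

print(open_inside, closed_after, same)  # True True True
print(lines)                            # ['x\n', 'y\n']
```

This also keeps the `open` and the read as separate steps inside the helper, while the caller still gets a single expression.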
