
I am trying to determine the best way to get rid of newlines when reading newline-delimited files in Python.

What I've come up with is the following code, including some throwaway code to test it.

def getfile(filename, results):
    # 'with' guarantees the file is closed; iterating the file object
    # avoids loading everything into memory via readlines()
    with open(filename) as f:
        for line in f:
            results.append(line.strip('\n'))
    return results

blahblah = []

getfile('/tmp/foo', blahblah)

for x in blahblah:
    print(x)
solarce

7 Answers

lines = open(filename).read().splitlines()
Curt Hagenlocher
    This answer does what I was going for, I'm sure I'll need to add some error checking and such, but for this specific need, it's great. Thank you all for providing answers! – solarce Feb 13 '09 at 06:48
  • I like this but how do you close the file if you don't save off the file handle? Or is it automatically closed? – I. J. Kennedy May 12 '12 at 14:22
  • 6
    With CPython, the reference count for the file object will go to zero once it's no longer in use and the file will automatically be closed. For purely GC'd implementations like Jython and IronPython, the file may not be closed until the GC runs -- so this terse variation may not be optimal. – Curt Hagenlocher May 13 '12 at 04:13
  • 3
    On Mac OS X 10.7.5 with 8GB RAM, I can read file of up to 2047MB (my definition: 1 MB = 1024 x 1024 bytes). 2048MB will throw MemoryError exception. – Hai Vu Apr 25 '13 at 15:47
  • 1
    @WKPlus Excellent question -- the answer is "it depends" http://stackoverflow.com/a/15099341/994153 (CPython will close it since the reference count drops to zero, but other Python implementations might not close it, so best to make it explicit) – Colin D Bennett Aug 20 '15 at 18:31
  • so it will be used as like this? `with open(file) as opened_file: lists = opened_file.read().splitlines()` – Yuda Prawira Apr 17 '17 at 07:29
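Putting the comment thread's suggestion together, here is a small self-contained sketch (the temp-file setup is only scaffolding for the example) showing that the `with` form closes the file deterministically on every Python implementation:

```python
import os
import tempfile

# scaffolding: create a small file so the example is self-contained
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("one\ntwo\n")

# the with-based variant from the comments: splitlines() drops the
# newlines, and the file is closed on exit from the block
with open(path) as opened_file:
    lines = opened_file.read().splitlines()

print(lines)               # ['one', 'two']
print(opened_file.closed)  # True
os.remove(path)
```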

Here's a generator expression that does what you requested. In this case, `rstrip` is sufficient and slightly faster than `strip`.

lines = (line.rstrip('\n') for line in open(filename))

However, you'll most likely want to use this to get rid of trailing whitespace as well.

lines = (line.rstrip() for line in open(filename))
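To make the `strip` / `rstrip` distinction concrete, a quick illustrative sketch (the sample line is made up for the example):

```python
line = "  data value \t\n"

# strip('\n') removes newline characters from both ends only;
# the leading spaces and the trailing tab survive
print(repr(line.strip('\n')))   # '  data value \t'

# rstrip('\n') removes trailing newlines only (same result here)
print(repr(line.rstrip('\n')))  # '  data value \t'

# rstrip() with no argument removes all trailing whitespace,
# including the tab and the newline
print(repr(line.rstrip()))      # '  data value'
```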
TimoLinna

What do you think about this approach?

with open(filename) as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for line in datalines:
        ...do something awesome...

The generator expression avoids loading the whole file into memory, and the `with` statement ensures the file is closed.

Paweł Prażak
  • This is essentially the same as @TimoLinna's [answer](https://stackoverflow.com/a/545188/355230) posted years beforehand... – martineau Nov 11 '18 at 18:23
for line in open('/tmp/foo'):
    print(line.strip('\n'))
David Z

Just use generator expressions:

blahblah = (l.rstrip() for l in open(filename))
for x in blahblah:
    print(x)

Also, I'd advise against reading the whole file into memory -- looping over generators is much more efficient on big datasets.
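To illustrate the laziness this relies on, a generator expression reads lines only as they are consumed; the sketch below uses `io.StringIO` as a stand-in for a real file:

```python
import io

data = io.StringIO("alpha\nbeta\ngamma\n")
blahblah = (l.rstrip() for l in data)

# nothing has been read yet; each next() pulls exactly one line
first = next(blahblah)
second = next(blahblah)
print(first, second)  # alpha beta
```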


I use this

def cleaned(aFile):
    for line in aFile:
        yield line.strip()

Then I can do things like this.

lines = list(cleaned(open("file", "r")))

Or, I can extend cleaned with extra functions to, for example, drop blank lines or skip comment lines or whatever.
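For example, such an extension might look like this (the `skip_blank` and `comment_char` parameters are hypothetical, shown purely as an illustration of the idea):

```python
import io

def cleaned(aFile, skip_blank=False, comment_char=None):
    # hypothetical extension of the cleaned() generator above:
    # strip each line, then optionally drop blanks and comment lines
    for line in aFile:
        line = line.strip()
        if skip_blank and not line:
            continue
        if comment_char and line.startswith(comment_char):
            continue
        yield line

# StringIO stands in for a real file here
sample = io.StringIO("first\n\n# a comment\nsecond\n")
result = list(cleaned(sample, skip_blank=True, comment_char="#"))
print(result)  # ['first', 'second']
```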

S.Lott

I'd do it like this:

with open('test.txt') as f:
    l = [line.rstrip('\n') for line in f if line.strip()]
print(l)
S.Lott
  • While Curt Hagenlocher's answer is technically better, this answer is a good starting point if you need to add other processing to each line. – TomOnTime Dec 31 '10 at 15:56
  • Not sure if it was intended to filter blank lines, but this is more concise than `... if l.strip() is not ''`, which is what I need in my case. – Zach Young Nov 06 '12 at 21:41