Selecting and printing specific rows of text file

Question

I have a very large (~8 GB) text file with very long lines. I would like to pull out selected ranges of lines from this file and write them to another text file. My question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.

So far this is the only approach I have gotten to work:

lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))

However, this gives me a list, and I would like the output file to have the same format as the input file (one line per row).

Bob · Accepted Answer · 2010-08-24T17:22:34.997

4

You can call join on the ranges.

lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
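
To see why str() produces list-looking output while join restores the original layout, here is a small sketch. The sample lines are made up for illustration; they are only shaped like what readlines() returns.

```python
# Made-up sample data: readlines() returns strings that keep their newline.
lines = ["first\n", "second\n", "third\n"]

# str() gives the repr of the list: brackets, quotes and escaped newlines.
as_list_text = str(lines[0:2])

# join() glues the strings back together exactly as they were in the file.
as_file_text = ''.join(lines[0:2])

print(as_list_text)
print(as_file_text, end='')
```

The first print shows a single bracketed line, while the second shows one line per row, which is the format the question asks for.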

edited Aug 24 '10 at 17:22

answered Aug 24 '10 at 17:10

It should be out1.write(''.join(lines[5:67])) and same for out2 because readlines doesn't remove end of lines – Xavier Combelle Aug 24 '10 at 17:16

This should not be the accepted answer - you don't just read 8 GB into memory unless you have a very good reason to do it. – Steinar Lima Jan 15 '15 at 02:36

aeroNotAuto · Answer 2 · 2010-08-24T17:22:40.250

Might I suggest not storing the entire file in memory (since it is large), as per one of your links?

f = open('file')
n = open('newfile', 'w')
for i, text in enumerate(f):
    # Keep zero-based indices 5-67 and 89-111
    if 4 < i < 68 or 88 < i < 112:
        n.write(text)
    elif i >= 112:
        break  # past the last wanted line, no need to scan the rest
f.close()
n.close()

I'd also recommend using 'with' instead of opening and closing the files by hand, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here.
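
For readers whose Python version does support it, the same idea with 'with' might look like the sketch below. The function name copy_two_ranges and its parameter names are made up for illustration, and the bounds follow the slices in the question (zero-based, end excluded) rather than the loop above.

```python
def copy_two_ranges(in_path, out1_path, out2_path):
    """Stream in_path once and copy two line ranges into two files."""
    # All three files are closed automatically, even if an error occurs.
    with open(in_path) as readin, \
         open(out1_path, 'w') as out1, \
         open(out2_path, 'w') as out2:
        for i, text in enumerate(readin):
            if 5 <= i < 67:        # same lines as lines[5:67]
                out1.write(text)
            elif 89 <= i < 111:    # same lines as lines[89:111]
                out2.write(text)
            elif i >= 111:
                break              # nothing left to copy
```

Only one line is held in memory at a time, and the loop stops as soon as the last wanted line has been written.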

score 1 · Answer 3 · answered Jan 15 '15 at 02:30

The first thing to think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() does exactly that, so that specific method should be avoided.

Luckily, Python's standard library includes the itertools module. It has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator, or a file-like object) and returns an iterator over the specified range:

itertools.islice(iterable, start, stop[, step])

Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line)
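
A quick way to get a feel for these rules is to try islice on a small in-memory sequence. The letters below are made-up sample data:

```python
from itertools import islice

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Only a stop value: the first three items.
first_three = list(islice(letters, 3))

# Start and stop: zero-based indices 2, 3 and 4.
middle = list(islice(letters, 2, 5))

# Start, no stop, step 2: every other item from index 1 onward.
every_other = list(islice(letters, 1, None, 2))

print(first_three, middle, every_other)
```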

Using this information, together with the str.join method, you can e.g. extract the lines at zero-based indices 10-19 (the 11th through 20th lines) with this simple code:

from itertools import islice

# Open in read mode; iteration then fetches one line at a time
with open('huge_data_file.txt', 'r') as data_file:
    txt = ''.join(islice(data_file, 10, 20))

Note that when looping over the file object, each line keeps its trailing newline character, so you join on the empty string rather than on \n.
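
The question asks for more than one range, so here is a sketch of how islice could cover several ranges in a single pass. The helper name write_ranges is hypothetical, and it assumes the ranges are sorted, non-overlapping, zero-based and end-exclusive. Because islice consumes the underlying iterator, each range is requested relative to the position already reached.

```python
from itertools import islice

def write_ranges(src, jobs):
    """Copy line ranges from src into output files in one pass.

    jobs is a list of (out_file, start, stop) tuples, sorted by start
    and non-overlapping; start is zero-based and stop is excluded.
    """
    lines = iter(src)
    position = 0  # index of the next line that `lines` will produce
    for out_file, start, stop in jobs:
        # Skip ahead to `start`, then pass lines through up to `stop`.
        out_file.writelines(islice(lines, start - position, stop - position))
        position = stop
```

Called with open file objects, for example write_ranges(readin, [(out1, 5, 67), (out2, 89, 111)]), this reads no further than the last requested line.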

score 0 · Answer 4 · answered Aug 24 '10 at 17:09

(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:

lines = readin.readlines()

for each in lines[5:67]:
    out1.write(each)

for each in lines[89:111]:
    out2.write(each)

Manoj Govindan

score 0 · Answer 5 · answered Aug 24 '10 at 17:17

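path = "c:\\someplace\\"
output_name = "output.txt"  # placeholder name, pick your own

Open two text files, one for reading and one for writing:

f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')

Go through each line of the input file and write the ones you want:

for i, line in enumerate(f_in):
    # Example condition: zero-based indices 5-66, like lines[5:67]
    i_want_to_write_this_line = 5 <= i < 67
    if i_want_to_write_this_line:
        f_out.write(line)

Close the files when done:

f_in.close()
f_out.close()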
