Skip first couple of lines while reading lines in Python file

Question

I want to skip the first 17 lines while reading a text file.

Let's say the file looks like:

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
good stuff

I just want the good stuff. What I'm doing is a lot more complicated, but this is the part I'm having trouble with.

http://stackoverflow.com/questions/620367/python-how-to-jump-to-a-particular-line-in-a-huge-text-file or http://stackoverflow.com/questions/4796764/read-file-from-line-2-or-skip-header-row etc..? — Ryan Kempt, Mar 06 '12 at 05:57

score 168 · Accepted Answer · edited May 15 '18 at 05:49

Use a slice, like below:

with open('yourfile.txt') as f:
    lines_after_17 = f.readlines()[17:]

If the file is too big to load in memory:

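with open('yourfile.txt') as f:
    for _ in range(17):
        next(f)  # discard one line; raises StopIteration if the file is shorter
    for line in f:
        print(line, end='')  # replace with your own processing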
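
One caveat with the loop above: `next(f)` raises `StopIteration` if the file has fewer than 17 lines. A minimal sketch of a more forgiving variant, assuming you would rather get an empty result than an exception (`skip_lines` is a hypothetical helper name, and `io.StringIO` stands in for a real file):

```python
import io

def skip_lines(f, n):
    """Advance the file iterator by up to n lines; stop quietly at EOF."""
    for _ in range(n):
        # The default value (None) keeps next() from raising StopIteration
        if next(f, None) is None:
            break
    return f

# io.StringIO stands in for a real file object here
short = io.StringIO("0\n0\ngood stuff\n")
print(list(skip_lines(short, 2)))            # ['good stuff\n']

too_short = io.StringIO("0\n0\n")
print(list(skip_lines(too_short, 17)))       # []
```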

edited May 15 '18 at 05:49 by cs95

answered Mar 06 '12 at 05:57 by wim

1

I use the second solution to read ten lines at the end of a file with 8 million (8e6) lines and it takes ~22 seconds. Is this still the preferred (=fastest) way for such long files (~250 MB)? – riddleculous Nov 27 '17 at 17:41
1

I would use `tail` for that. – wim Nov 27 '17 at 17:45
@wim: I guess, tail doesn't work on Windows. Furthermore I don't always want to read the last 10 lines. I want to be able to read some lines in the middle. (e.g. if I read 10 lines after ~4e6 lines in the same file it takes still half of that time, ~11 seconds) – riddleculous Nov 27 '17 at 17:51
4

The thing is, you need to read the entire content before line number ~4e6 in order to know where the line separator bytes are located, otherwise you don't know how many lines you've passed. There's no way to magically jump to a line number. ~250 MB should be OK to read entire file to memory though, that's not particularly big data. – wim Nov 27 '17 at 18:02
@riddleculous see https://stackoverflow.com/q/3346430/2491761 for getting last lines – tony_tiger Oct 12 '18 at 02:05
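
To illustrate the point in the comments about having to scan for line separators: if the same file is read many times, one option is to pay for that scan once and remember the byte offset where each line starts. This is only a sketch under that assumption, not something proposed in the thread; `build_line_index` and `read_lines_at` are hypothetical helper names, and the temporary file is just demo data.

```python
import os
import tempfile

def build_line_index(path):
    """Return the byte offset at which each line starts (one full scan)."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_lines_at(path, offsets, start, count):
    """Read up to `count` lines beginning at 0-based line number `start`."""
    count = min(count, len(offsets) - start)  # do not read past the last line
    with open(path, "rb") as f:
        f.seek(offsets[start])                # jump straight to the line
        return [f.readline().decode() for _ in range(count)]

# Demo data: a temporary file with 1000 short lines
with tempfile.NamedTemporaryFile("w", newline="", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1000)))

index = build_line_index(tmp.name)
print(read_lines_at(tmp.name, index, 500, 2))  # ['line 500\n', 'line 501\n']
os.remove(tmp.name)
```

The first scan is still O(n), as the comment says; the index only helps on later lookups, and it goes stale if the file changes.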

score 46 · Answer 2 · edited Apr 14 '18 at 20:47

Use itertools.islice, starting at index 17. It will automatically skip the first 17 lines.

import itertools

with open('file.txt') as f:
    for line in itertools.islice(f, 17, None):  # start=17, stop=None
        print(line, end='')  # process each remaining line

edited Apr 14 '18 at 20:47 by Jean-François Fabre

answered Mar 06 '12 at 06:02 by Ismail Badawi

Is this feasible for large text files that may not fit in the memory? That is, does `itertools.islice` load the entire file into the memory? I couldn't find this in the documentation. – Aditya Harikrish Nov 19 '22 at 20:38
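
On the memory question: `itertools.islice` works on any iterator and pulls items one at a time, so it does not load the whole file. A small sketch that makes this visible, using a hypothetical generator (`counted_lines`) in place of a file so that each pull can be recorded:

```python
from itertools import islice

def counted_lines(n, pulled):
    """Hypothetical stand-in for a file: yields n lines, recording each pull."""
    for i in range(n):
        pulled.append(i)
        yield f"line {i}\n"

pulled = []
it = islice(counted_lines(1_000_000, pulled), 17, None)

first = next(it)
# Only the 17 skipped lines plus the one returned have been read so far
print(first.strip(), len(pulled))  # line 17 18
```

The skipped lines are still read (there is no way around that), but they are discarded immediately rather than collected.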

ninjagecko · Answer 3 · 2012-05-06T23:14:33.150

3

for line in dropwhile(isBadLine, lines):
    print(line, end='')  # process as you see fit

Full demo:

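from itertools import dropwhile

def isBadLine(line):
    # lines read from a file keep their trailing newline, so strip before comparing
    return line.strip() == '0'

with open(...) as f:
    for line in dropwhile(isBadLine, f):
        print(line, end='')  # process as you see fit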

Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).
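
As a sketch of that extensibility, here is the same idea with a different predicate. The rule used (header lines are blank or start with `#`) is an assumption for illustration, as is the name `is_header`; `io.StringIO` stands in for a real file.

```python
import io
from itertools import dropwhile

def is_header(line):
    # Hypothetical rule: header lines are blank or start with '#'
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")

sample = io.StringIO(
    "# generated by tool\n# v2\n\ngood stuff\n# not a header\nmore\n"
)
body = list(dropwhile(is_header, sample))
print(body)  # ['good stuff\n', '# not a header\n', 'more\n']
```

Note that `dropwhile` only drops the leading run of matching lines; once one line fails the test, later matching lines pass through untouched.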

edited May 06 '12 at 23:14

answered May 06 '12 at 23:08 by ninjagecko

score 3 · Answer 4 · answered Dec 27 '18 at 09:37

Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random string with a file size of 1MB+.

Using itertools:

import itertools
from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for line in itertools.islice(fo, 90000, None):
        line.strip()""", number=100)

>>> 1.604976346003241

Using two for loops:

from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for i in range(90000):
        next(fo)
    for j in fo:
        j.strip()""", number=100)

>>> 2.427317383000627

In this test the itertools method was faster (about 1.6 s versus 2.4 s for 100 runs), which suggests it is the more efficient choice for large files.

score 1 · Answer 5 · answered Apr 14 '18 at 20:45

If you don't want to read the whole file into memory at once, you can use a few tricks:

With next(iterator) you can advance to the next line:

with open("filename.txt") as f:
    next(f)  # skip line 1
    next(f)  # skip line 2
    next(f)  # skip line 3
    for line in f:
        print(line)

Of course, this is slightly ugly, so itertools has a better way of doing this:

from itertools import islice

with open("filename.txt") as f:
    # skip the first 17 lines, then read until the end (stop=None)
    for line in islice(f, 17, None):
        print(line)

score 0 · Answer 6 · edited Apr 30 '16 at 19:55

This solution helped me to skip the number of lines specified by the linetostart variable. You get the index (int) and the line (string) if you want to keep track of those too. In your case, you substitute linetostart with 18, or assign 18 to linetostart variable.

f = open("file.txt", 'r')
for i, line in enumerate(f, linetostart):
    #Your code

edited Apr 30 '16 at 19:55 by gsamaras

answered Jan 19 '16 at 19:25 by Wilder

2

This won’t actually skip lines, it will just offset the enumerate counter. – wim Nov 27 '20 at 18:09
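
A short sketch of the difference the comment describes, using `io.StringIO` with made-up rows in place of a real file. The second half shows one way to actually skip lines while keeping 1-based line numbers, by combining `enumerate` with `itertools.islice`:

```python
import io
from itertools import islice

text = "".join(f"row {i}\n" for i in range(1, 21))  # 20 numbered lines

# enumerate(f, 18) still starts at the first line; only the counter changes
first_index, first_line = next(enumerate(io.StringIO(text), 18))
print(first_index, first_line.strip())  # 18 row 1

# Skipping for real: islice drops 17 lines, enumerate labels the rest from 18
skipped = list(enumerate(islice(io.StringIO(text), 17, None), 18))
print(skipped[0])  # (18, 'row 18\n')
```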

score 0 · Answer 7 · answered Aug 27 '16 at 21:43

If the file is a table, pandas can skip the leading rows while parsing:

import pandas as pd
pd.read_table("path/to/file", sep="\t", index_col=0, skiprows=17)

answered Aug 27 '16 at 21:43 by O.rka

score -1 · Answer 8 · answered Mar 06 '12 at 05:59

You can use a list comprehension to make it a one-liner (fl is the open file object):

[fl.readline() for i in range(17)]

More about list comprehension in PEP 202 and in the Python documentation.

answered Mar 06 '12 at 05:59 by Niklas R

2

doesn't make much sense to store those lines in a list which will just get garbage collected. – wim Mar 06 '12 at 06:04
1

@wim: The memory overhead is trivial (and probably unavoidable no matter which way you do it, since you will need to do O(n) processing of those lines unless you skip to an arbitrary point in the file); I just don't think it's very readable. – ninjagecko May 06 '12 at 23:13
2

I agree with @wim, if you are throwing away the result, use a loop. The whole point of a list comprehension is that you *meant* to store the list; you can just as easily fit a for loop on one line. – David Jun 19 '14 at 00:41
or use a generator in a 0-memory deque. – Jean-François Fabre Apr 14 '18 at 20:50
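
The "0-memory deque" idea from the last comment can be sketched as follows. It feeds the lines to be skipped into a `collections.deque` with `maxlen=0`, which discards every item it receives; `consume` is a helper name chosen here for illustration, and `io.StringIO` stands in for a real file.

```python
import io
from collections import deque
from itertools import islice

def consume(iterator, n):
    """Advance `iterator` by n items without keeping any of them."""
    deque(islice(iterator, n), maxlen=0)

f = io.StringIO("".join("0\n" for _ in range(17)) + "good stuff\n")
consume(f, 17)
rest = list(f)
print(rest)  # ['good stuff\n']
```

Like `next(f, None)`, this stops quietly if the file has fewer than `n` lines.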

score -1 · Answer 9 · answered Mar 06 '12 at 06:42

Here is a method to get lines between two line numbers in a file:

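import sys

def file_line(name, start=1, end=sys.maxsize):
    """Yield lines start..end (1-based, inclusive) from the file `name`."""
    lc = 0
    with open(name) as f:  # open the file passed in as `name`
        for line in f:
            lc += 1
            if start <= lc <= end:
                yield line


s = '/usr/share/dict/words'
l1 = list(file_line(s, 235880))
l2 = list(file_line(s, 1, 10))
print(l1)
print(l2)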

Output:

['Zyrian\n', 'Zyryan\n', 'zythem\n', 'Zythia\n', 'zythum\n', 'Zyzomys\n', 'Zyzzogeton\n']
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']

Just call it with one parameter to get from line n -> EOF
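
For comparison, the same 1-based, inclusive line range can be sketched with `itertools.islice`. Unlike the loop above, which keeps iterating to the end of the file even after `end` has passed, `islice` stops reading once the range is exhausted. `lines_between` is a hypothetical name, and `io.StringIO` stands in for an open file.

```python
import io
from itertools import islice

def lines_between(f, start=1, end=None):
    """Yield lines start..end (1-based, inclusive) from an open file.

    end=None means "until EOF".
    """
    # islice uses 0-based start and exclusive stop, hence start - 1 and end
    return islice(f, start - 1, end)

sample = io.StringIO("a\nb\nc\nd\ne\n")
print(list(lines_between(sample, 2, 4)))  # ['b\n', 'c\n', 'd\n']
```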
