
I want to skip the first 17 lines while reading a text file.

Let's say the file looks like:

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
good stuff

I just want the good stuff. What I'm doing is a lot more complicated, but this is the part I'm having trouble with.

Jason Sundram
O.rka
  • http://stackoverflow.com/questions/620367/python-how-to-jump-to-a-particular-line-in-a-huge-text-file or http://stackoverflow.com/questions/4796764/read-file-from-line-2-or-skip-header-row etc..? – Ryan Kempt Mar 06 '12 at 05:57

9 Answers


Use a slice, like below:

with open('yourfile.txt') as f:
    lines_after_17 = f.readlines()[17:]

If the file is too big to load in memory:

with open('yourfile.txt') as f:
    for _ in range(17):
        next(f)  # read and discard one line
    for line in f:
        print(line, end='')  # do stuff with each remaining line
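One caveat about the second snippet: if the file has fewer than 17 lines, `next(f)` raises `StopIteration`. A minimal sketch that tolerates short files, assuming an empty result is what you want in that case, passes a default to `next()`. `lines_after` is a hypothetical helper name, and it stays a generator so the remaining lines are still streamed rather than loaded at once:

```python
import os
import tempfile


def lines_after(path, n=17):
    """Yield the lines that follow the first n lines of the file at path.

    Hypothetical helper: next(f, None) returns None instead of raising
    StopIteration when the file has fewer than n lines.
    """
    with open(path) as f:
        for _ in range(n):
            if next(f, None) is None:
                return  # file ended before n lines were skipped
        yield from f  # stream the rest one line at a time


# Usage: build the file from the question (17 zeros, then "good stuff")
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0\n" * 17 + "good stuff\n")
print(list(lines_after(tmp.name)))  # ['good stuff\n']
os.remove(tmp.name)
```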
cs95
wim
  • I use the second solution to read ten lines at the end of a file with 8 million (8e6) lines, and it takes ~22 seconds. Is this still the preferred (= fastest) way for such long files (~250 MB)? – riddleculous Nov 27 '17 at 17:41
  • I would use `tail` for that. – wim Nov 27 '17 at 17:45
  • @wim: I guess tail doesn't work on Windows. Furthermore, I don't always want to read the last 10 lines; I want to be able to read some lines in the middle (e.g. if I read 10 lines after ~4e6 lines in the same file, it still takes about half that time, ~11 seconds). – riddleculous Nov 27 '17 at 17:51
  • The thing is, you need to read the entire content before line number ~4e6 in order to know where the line separator bytes are located; otherwise you don't know how many lines you've passed. There's no way to magically jump to a line number. ~250 MB is small enough to read the entire file into memory, though; that's not particularly big data. – wim Nov 27 '17 at 18:02
  • @riddleculous see https://stackoverflow.com/q/3346430/2491761 for getting last lines – tony_tiger Oct 12 '18 at 02:05
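Building on wim's point: you can't jump straight to line N, but if you need to read from many different line numbers in the same large file, you can pay for one full pass up front, record the byte offset where each line starts, and then `seek()` directly for every later lookup. A minimal sketch, assuming the file is UTF-8 text and is not modified between indexing and reading; `build_line_index` and `read_lines_at` are hypothetical names. The index pass still reads the whole file once, so this only pays off for repeated lookups:

```python
import os
import tempfile
from array import array


def build_line_index(path):
    """One full pass over the file: record the byte offset where each line starts."""
    offsets = array("q")  # compact: 8 bytes per offset instead of a Python int per line
    pos = 0
    with open(path, "rb") as f:  # binary mode, so offsets are exact byte positions
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_lines_at(path, offsets, start, count):
    """Jump to 0-based line `start` with seek() and read `count` lines (assumes UTF-8)."""
    with open(path, "rb") as f:
        f.seek(offsets[start])
        return [f.readline().decode() for _ in range(count)]


# Usage with a generated 100-line file standing in for the real large file
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"".join(b"line %d\n" % i for i in range(100)))
index = build_line_index(tmp.name)
print(read_lines_at(tmp.name, index, 40, 2))  # ['line 40\n', 'line 41\n']
os.remove(tmp.name)
```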

Use itertools.islice, starting at index 17. It will automatically skip the first 17 lines.

import itertools
with open('file.txt') as f:
    for line in itertools.islice(f, 17, None):  # start=17, stop=None
        print(line, end='')  # process lines
Jean-François Fabre
Ismail Badawi
  • Is this feasible for large text files that may not fit in the memory? That is, does `itertools.islice` load the entire file into the memory? I couldn't find this in the documentation. – Aditya Harikrish Nov 19 '22 at 20:38
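On the memory question in the comment above: `islice` is lazy. It pulls items from the underlying iterator one at a time and discards the skipped ones as it goes, and iterating over a file object likewise reads it line by line through a buffer, so the whole file is never loaded at once. A small sketch makes the laziness visible, using a hypothetical counting generator in place of a file:

```python
from itertools import islice


def counting_lines(n, consumed):
    """Stand-in for a file: yields n lines and records each one it produces."""
    for i in range(n):
        consumed.append(i)
        yield f"line {i}\n"


consumed = []
lines = islice(counting_lines(1_000_000, consumed), 17, None)
print(len(consumed))  # 0: creating the islice reads nothing yet
first = next(lines)
print(repr(first), len(consumed))  # 'line 17\n' 18: 17 skipped + 1 returned
```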
for line in dropwhile(isBadLine, lines):
    print(line, end='')  # process as you see fit

Full demo:

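from itertools import dropwhile

def isBadLine(line):
    # Lines read from a file keep their trailing '\n', so strip before comparing
    return line.strip() == '0'

with open(...) as f:  # replace ... with the path to your file
    for line in dropwhile(isBadLine, f):
        print(line, end='')  # process as you see fit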

Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).
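One property worth knowing: `dropwhile` only drops the leading run of lines that match the predicate. Once a line fails the test, every later line is kept, even one that is `'0'` again. That differs from filtering with a comprehension, which removes matching lines everywhere. A sketch with an in-memory list standing in for the file (the sample data is made up):

```python
from itertools import dropwhile


def is_bad_line(line):
    # Compare without the trailing newline that file iteration keeps
    return line.strip() == "0"


lines = ["0\n", "0\n", "good stuff\n", "0\n", "more stuff\n"]

print(list(dropwhile(is_bad_line, lines)))
# ['good stuff\n', '0\n', 'more stuff\n']  (the later '0' is kept)

print([line for line in lines if not is_bad_line(line)])
# ['good stuff\n', 'more stuff\n']  (every '0' is removed)
```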

ninjagecko

Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random strings, with a file size of 1MB+.

Using itertools:

import itertools
from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for line in itertools.islice(fo, 90000, None):
        line.strip()""", setup="import itertools", number=100)

>>> 1.604976346003241

Using two for loops:

from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for i in range(90000):
        next(fo)
    for j in fo:
        j.strip()""", number=100)

>>> 2.427317383000627

In this test, the itertools method was faster (about 1.6 s vs. 2.4 s for 100 runs), probably because islice skips the lines in C rather than in a Python-level loop.
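The numbers above depend on the machine and on a `file.txt` that isn't shown. A self-contained sketch of the same comparison writes its own temporary file (100,000 generated lines, an assumption chosen to mirror the description above) and first checks that both approaches return the same lines before timing them. No particular timings are claimed here; run it to get your own:

```python
import itertools
import os
import tempfile
from timeit import timeit

SKIP = 90_000


def skip_with_islice(path):
    with open(path) as fo:
        return [line.strip() for line in itertools.islice(fo, SKIP, None)]


def skip_with_next(path):
    with open(path) as fo:
        for _ in range(SKIP):
            next(fo)
        return [line.strip() for line in fo]


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(100_000))
path = tmp.name

# Both approaches must agree before their timings mean anything
assert skip_with_islice(path) == skip_with_next(path)

for func in (skip_with_islice, skip_with_next):
    seconds = timeit(lambda: func(path), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")

os.remove(path)
```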

willywonka

If you don't want to read the whole file into memory at once, you can use a few tricks:

With next(iterator) you can advance to the next line:

with open("filename.txt") as f:
    next(f)  # skip line 1
    next(f)  # skip line 2
    next(f)  # skip line 3
    for line in f:
        print(line, end="")

Of course, this is slightly ugly, so itertools has a better way of doing this:

from itertools import islice

with open("filename.txt") as f:
    # skip the first 17 lines (indices 0-16), then read until the end (stop=None)
    for line in islice(f, 17, None):
        print(line, end="")
Azsgy

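This solution helped me skip the number of lines specified by the linetostart variable. You also get the index (int) along with the line (string) if you want to keep track of both. Note that enumerate() by itself does not skip anything: its second argument only sets the starting value of the counter, so the skipping is done by islice(). In your case, assign 17 to linetostart; i is then the 0-based index of each line in the file.

from itertools import islice

with open("file.txt", 'r') as f:
    for i, line in enumerate(islice(f, linetostart, None), linetostart):
        pass  # Your code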
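A quick check of what `enumerate`'s second argument does, with a list standing in for the file: it only sets the value the counter starts from, so every item is still produced unless something like `islice` does the skipping.

```python
from itertools import islice

lines = ["a\n", "b\n", "c\n", "d\n"]

# start=2 relabels the items but skips nothing
print(list(enumerate(lines, 2)))
# [(2, 'a\n'), (3, 'b\n'), (4, 'c\n'), (5, 'd\n')]

# islice does the skipping; enumerate's start keeps i equal to the real 0-based index
print(list(enumerate(islice(lines, 2, None), 2)))
# [(2, 'c\n'), (3, 'd\n')]
```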
gsamaras
Wilder

If it's a table, pandas can skip the lines for you:

import pandas as pd

pd.read_table("path/to/file", sep="\t", index_col=0, skiprows=17)
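If pandas isn't available, a similar result can be sketched with the standard library by combining `islice` with `csv.reader`, assuming the file is tab-separated as in the call above. Unlike `read_table`, this does no header or index handling; it just yields rows of strings. The sample data is made up, and `io.StringIO` stands in for the opened file:

```python
import csv
import io
from itertools import islice

# Stand-in for open("path/to/file", newline=""): 17 junk lines, then a table
data = "0\n" * 17 + "id\tvalue\n1\t3.5\n2\t4.0\n"

with io.StringIO(data) as f:
    rows = list(csv.reader(islice(f, 17, None), delimiter="\t"))

header, body = rows[0], rows[1:]
print(header)  # ['id', 'value']
print(body)    # [['1', '3.5'], ['2', '4.0']]
```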

O.rka

You can use a list comprehension to make it a one-liner (fl is the open file object; in Python 2, use xrange instead of range):

[fl.readline() for i in range(17)]

More about list comprehensions in PEP 202 and in the Python documentation.

Niklas R
  • Doesn't make much sense to store those lines in a list which will just get garbage collected. – wim Mar 06 '12 at 06:04
  • @wim: The memory overhead is trivial (and probably unavoidable no matter which way you do it, since you will need to do O(n) processing of those lines unless you skip to an arbitrary point in the file); I just don't think it's very readable. – ninjagecko May 06 '12 at 23:13
  • I agree with @wim: if you are throwing away the result, use a loop. The whole point of a list comprehension is that you *meant* to store the list; you can just as easily fit a for loop on one line. – David Jun 19 '14 at 00:41
  • or use a generator in a 0-memory deque. – Jean-François Fabre Apr 14 '18 at 20:50
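The "0-memory deque" in the last comment is the trick behind the `consume` recipe in the itertools documentation: feeding an iterator into a `collections.deque` with `maxlen=0` exhausts it at C speed while storing nothing. A minimal sketch applying it to skipping lines, combined with `islice` so only n items are consumed (`skip` is a hypothetical helper name; an in-memory file stands in for a real one):

```python
import io
from collections import deque
from itertools import islice


def skip(iterator, n):
    """Advance iterator by n items without keeping them (maxlen=0 stores nothing)."""
    deque(islice(iterator, n), maxlen=0)


f = io.StringIO("0\n" * 17 + "good stuff\n")
skip(f, 17)
print(list(f))  # ['good stuff\n']
```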

Here is a method to get lines between two line numbers in a file:

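import sys

def file_line(name, start=1, end=sys.maxsize):
    """Yield lines start..end (1-based, inclusive) from the file `name`."""
    with open(name) as f:
        for lc, line in enumerate(f, start=1):
            if lc > end:
                break  # past the requested range; stop reading
            if lc >= start:
                yield line


s = '/usr/share/dict/words'
l1 = list(file_line(s, 235880))
l2 = list(file_line(s, 1, 10))
print(l1)
print(l2)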

Output:

['Zyrian\n', 'Zyryan\n', 'zythem\n', 'Zythia\n', 'zythum\n', 'Zyzomys\n', 'Zyzzogeton\n']
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']

Call it with only the start parameter to get everything from line n to EOF.
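For comparison, the same 1-based, inclusive range can be sketched with `islice`, which also stops reading once `end` is reached. `file_line_islice` is a hypothetical name, and `end=None` means read to EOF:

```python
import os
import tempfile
from itertools import islice


def file_line_islice(name, start=1, end=None):
    """Yield lines start..end (1-based, inclusive) of the file; end=None reads to EOF."""
    with open(name) as f:
        # islice uses 0-based, half-open bounds: line `start` is index start - 1,
        # and stopping at index `end` still includes line number `end`
        yield from islice(f, start - 1, end)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"{n}\n" for n in range(1, 21))  # lines "1" through "20"
print(list(file_line_islice(tmp.name, 3, 5)))  # ['3\n', '4\n', '5\n']
os.remove(tmp.name)
```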

the wolf