Using Python, how to read a file starting at the seventh line?

Question

I have a text file dnw.txt structured as:

date
downland

user 

date data1 date2
201102 foo bar 200 50
201101 foo bar 300 35

So the first six lines of the file are not needed.

I know I can open the file with

f = open('dwn.txt', 'rb')

How do I "split" this file starting at line 7 to EOF?

how would you read a file line by line, in general? does your tutorial explain that? — SilentGhost, Feb 01 '11 at 15:24
possible duplicate of [Read file from line 2 or skip header row](http://stackoverflow.com/questions/4796764/read-file-from-line-2-or-skip-header-row) — SilentGhost, Feb 01 '11 at 15:25
My tutorial. dont have one.... The method I use most often is for line in ???.split("\r\n"): Is this your question? — Merlin, Feb 01 '11 at 15:28

score 41 · Accepted Answer · edited Apr 11 '23 at 11:36

with open('dwn.txt') as f:
    for i in range(6):
        next(f)
    for line in f:
        process(line)

(In Python 2, use xrange instead of range, and f.next() instead of next(f).)
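
If the file might have fewer than six lines, next(f) raises StopIteration. Here is a minimal sketch of one way to guard against that. The helper name lines_after_header and the io.StringIO sample are illustrative assumptions, not part of the answer above.

```python
import io


def lines_after_header(f, header_lines=6):
    """Yield the lines of f that follow the first header_lines lines."""
    for _ in range(header_lines):
        # The None default stops next() from raising StopIteration
        # when the file is shorter than the header.
        if next(f, None) is None:
            return
    yield from f


# io.StringIO stands in for an open text file (hypothetical sample data).
sample = io.StringIO(
    "date\ndownland\n\nuser\n\ndate data1 date2\n"
    "201102 foo bar 200 50\n201101 foo bar 300 35\n"
)
data_lines = list(lines_after_header(sample))
print(data_lines)
```

With a real file, the call would look like `with open('dwn.txt') as f: for line in lines_after_header(f): ...`.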

edited Apr 11 '23 at 11:36

mkrieger1

answered Feb 01 '11 at 18:12

John Machin

@user428862: `process(line)` is pseudocode for "insert your own code here to do whatever you want with `line`". What kind of code is "ur" code? – John Machin Feb 01 '11 at 23:06

Josh Lee · Answer 2 · 2011-02-01T21:21:22.197

11

Itertools answer!

from itertools import islice

with open('foo') as f:
    for line in islice(f, 6, None):
        print(line)
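
For anyone who wants to try islice without a file on disk, here is a small sketch. io.StringIO stands in for the opened file, and the sample contents are made up for illustration.

```python
import io
from itertools import islice

# io.StringIO stands in for the opened file; any iterator of lines works.
f = io.StringIO("h1\nh2\nh3\nh4\nh5\nh6\nrow 7\nrow 8\n")

# islice(f, 6, None) discards items 0-5 and lazily yields the rest,
# so the whole file is never held in memory.
rows = [line.rstrip("\n") for line in islice(f, 6, None)]
print(rows)  # ['row 7', 'row 8']
```

Unlike next(f) in a loop, islice simply yields nothing if the input has fewer than six lines.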

edited Feb 01 '11 at 21:21

answered Feb 01 '11 at 15:32

Josh Lee

score 6 · Answer 3 · answered Aug 29 '18 at 17:51

Python 3:

with open("file.txt","r") as f:
    for i in range(6):
        f.readline()
    for line in f:
        # process lines 7-end
        pass

answered Aug 29 '18 at 17:51

KiteCoder

Basically I think you are pushing the 'cursor' forward 6 times: one for each of "list(range(6)) or [0, 1, 2, 3, 4, 5]". Hence line 7 is next. Then start processing. Clever if I understand correctly. – jouell Sep 07 '19 at 00:12

systempuntoout · Answer 4 · 2011-02-01T18:35:16.803

5

with open('test.txt', 'r') as fo:
    for i in range(6):
        next(fo)
    for line in fo:
        print(line.strip())

edited Feb 01 '11 at 18:35

answered Feb 01 '11 at 15:37

systempuntoout

eyquem · Answer 5 · 2011-02-02T17:04:04.653

In fact, to answer precisely the question as it was written:

How do I "split" this file starting at line 7 to EOF?

you can do the following.

in case the file is not big:

with open('dwn.txt', 'rb+') as f:
    for i in range(6):
        print(f.readline())
    content = f.read()
    f.seek(0, 0)
    f.write(content)
    f.truncate()

in case the file is very big:

with open('dwn.txt', 'rb+') as ahead, open('dwn.txt', 'rb+') as back:
    for i in range(6):
        print(ahead.readline())

    x = 100000
    chunk = ahead.read(x)
    while chunk:
        print(repr(chunk))
        back.write(chunk)
        chunk = ahead.read(x)
    back.truncate()

The truncate() call is essential to place the EOF where you asked for it. Without truncate(), the tail of the file, corresponding to the offset of six lines, would remain.

The file must be opened in binary mode to avoid newline-translation problems.

In text mode, when Python reads '\r\n' it converts it to '\n' (that's universal newlines support, the default in Python 3), so the chunk strings contain only '\n' even if the file contained '\r\n'.

If the file comes from a classic Macintosh system, it contains only CR = '\r' newlines before processing, but they are changed to '\n' or '\r\n' (depending on the platform) when the file is rewritten on a non-Macintosh machine.

If the file comes from Linux, it contains only LF = '\n' newlines, which are changed to '\r\n' when written on Windows (I don't know what happens to a Linux file processed on a Macintosh). The reason is that text mode on Windows writes '\r\n' for every '\n' it is asked to write. Consequently, more characters would be written than were read, so the offset between the file pointers ahead and back would shrink and the rewrite would become garbled.

HTML sources also contain a mix of newline styles.

That's why it's always preferable to open files in binary mode when they are processed this way.
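
A different technique from the in-place rewrite above, for readers who would rather not overwrite the file while still reading it: copy everything after the header to a temporary file in binary mode, then swap it in with os.replace(). This is only a sketch, and the function name drop_first_lines is a hypothetical helper.

```python
import os
import shutil
import tempfile


def drop_first_lines(path, n=6):
    """Rewrite path without its first n lines, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        for _ in range(n):
            src.readline()  # returns b"" at EOF, so short files are safe
        # Keep the temporary file in the same directory so that
        # os.replace() never has to cross filesystems.
        with tempfile.NamedTemporaryFile(
            "wb", dir=directory, delete=False
        ) as dst:
            shutil.copyfileobj(src, dst)  # copies in chunks
    # Both files are closed here; swap the new content into place.
    os.replace(dst.name, path)
```

Binary mode means line endings are copied byte for byte. Note that the replaced file takes on the temporary file's permissions, which may differ from the original's.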

score 2 · Answer 6 · edited Jun 20 '20 at 09:12

Alternative version

You can use read() directly if you know the position pos of the line break (e.g. a \n) that separates the header from the part of interest, i.e. the point at which you want to split the input text:

with open('input.txt', 'r') as txt_in:
    txt_in.seek(pos)
    second_half = txt_in.read()

If you are interested in both halves, you could also consider the following method:

with open('input.txt', 'r') as txt_in:
    all_contents = txt_in.read()
first_half = all_contents[:pos]
second_half = all_contents[pos:]
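
If pos is not known in advance, one way to obtain it is to read the six header lines and then ask for f.tell(). The sketch below assumes binary mode, where offsets are plain byte counts; the sample file and its contents are made up for illustration.

```python
import os
import tempfile

# Build a small sample file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "wb") as f:
    f.write(b"h1\nh2\nh3\nh4\nh5\nh6\nrow 7\nrow 8\n")

# Find the byte offset at which line 7 starts.
with open(path, "rb") as f:
    for _ in range(6):
        f.readline()
    pos = f.tell()

# Later (or in another run), jump straight to that offset.
with open(path, "rb") as f:
    f.seek(pos)
    second_half = f.read()

print(pos, second_half)
```

In text mode, the Python documentation only guarantees seek() for offsets previously returned by tell() (or zero), which is another reason to compute and use pos in binary mode.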

score 0 · Answer 7 · answered Feb 01 '11 at 15:31

You can read the entire file into an array/list and then just start at the index appropriate to the line you wish to start reading at.

with open('dwn.txt', 'rb') as f:
    fileAsList = f.readlines()
fileAsList[0]   # first line
fileAsList[1]   # second line
fileAsList[6:]  # seventh line through the end

answered Feb 01 '11 at 15:31

Convolution

Cuga · Answer 8 · 2011-02-02T15:13:04.687

0

#!/usr/bin/python

with open('dnw.txt', 'r') as f:
    lines_7_through_end = f.readlines()[6:]

print("Lines 7+:")
i = 7
for line in lines_7_through_end:
    print("    Line %s: %s" % (i, line))
    i += 1

Prints:

Lines 7+:

  Line 7: 201102 foo bar 200 50

  Line 8: 201101 foo bar 300 35

Edit:

To rebuild dnw.txt without the first six lines, do this after the above code:

with open('dnw.txt', 'w') as f:
    for line in lines_7_through_end:
        f.write(line)

edited Feb 02 '11 at 15:13

answered Feb 01 '11 at 15:39

Cuga

using with: open('dnw.txt', 'r') as f: lines = f.readlines()[6:] for line in lines: print " %s" % (line) – Merlin Feb 01 '11 at 15:56
that's how everything best about SO is destroyed. – SilentGhost Feb 01 '11 at 16:02
@SG its extra info that will clutter up the database. – Merlin Feb 01 '11 at 16:10
From Python 2.6, something maybe more elegant than using a dedicated index: `for (i, line) in enumerate(lines_7_through_end, 7):...` This avoids taking care of incrementing `i`. – Emmanuel Feb 01 '11 at 16:17
There's no need to print *Line 7, Line 8* in my opinion – systempuntoout Feb 01 '11 at 16:38
Line numbers are being printed only for readability of the example. Obviously they're optional. However, the point of this example was to show how the user can store the lines to a list and use them later. – Cuga Feb 01 '11 at 17:23
How would rebuild dwn.txt without the first six lines. – Merlin Feb 02 '11 at 05:09
@user428862: Updated w/ answer – Cuga Feb 02 '11 at 15:13

score 0 · Answer 9 · answered Jan 07 '20 at 13:16

I created a script that cuts an Apache access.log file several times a day. It's not the original topic of the question, but I think it can be useful if you need to store the file cursor position after reading the first six lines.

So I needed to set the cursor to the position of the last line parsed during the previous run. To this end, I used the file.seek() and file.tell() methods, which allow the cursor position in the file to be restored and saved.

My code:

import os

ENCODING = "utf8"
CURRENT_FILE_DIR = os.path.dirname(os.path.abspath(__file__))

# This file is used to store the last cursor position
cursor_position = os.path.join(CURRENT_FILE_DIR, "access_cursor_position.log")

# Log file with new lines
log_file_to_cut = os.path.join(CURRENT_FILE_DIR, "access.log")
cut_file = os.path.join(CURRENT_FILE_DIR, "cut_access", "cut.log")

# Position to start reading from (stays 0 if no position was saved yet)
from_position = 0
try:
    with open(cursor_position, "r", encoding=ENCODING) as f:
        from_position = int(f.read())
except Exception as e:
    pass

# We read log_file_to_cut to put new lines in cut_file
with open(log_file_to_cut, "r", encoding=ENCODING) as f:
    with open(cut_file, "w", encoding=ENCODING) as fw:
        # We set cursor to the last position used (during last run of script)
        f.seek(from_position)
        for line in f:
            fw.write("%s" % (line))

    # We save the last position of cursor for next usage
    with open(cursor_position, "w", encoding=ENCODING) as fw:
        fw.write(str(f.tell()))
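
The core of the idea above, reduced to a small sketch: remember the offset returned by tell() and pass it to seek() on the next run. The function name read_new_lines and the sample log contents are hypothetical, and the sketch uses binary mode so that offsets are plain byte counts.

```python
import os
import tempfile


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for everything stored after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        lines = f.readlines()
        return lines, f.tell()


# Simulate a log file that grows between two runs.
log_path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(log_path, "wb") as f:
    f.write(b"first\nsecond\n")
lines, offset = read_new_lines(log_path)  # first run: reads both lines

with open(log_path, "ab") as f:
    f.write(b"third\n")
new_lines, offset = read_new_lines(log_path, offset)  # second run
print(new_lines, offset)
```

In a real script the offset would be written to a small file between runs, as in the code above.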

Spacedman · Answer 10 · 2011-02-01T16:03:25.410

-1

Just do f.readline() six times. Ignore the returned value.

edited Feb 01 '11 at 16:03

answered Feb 01 '11 at 15:26

Spacedman

did you tried doing it yourself? how on a freaking earth this answer could have two upvotes? are there some evil perl hackers upvoting or something? – SilentGhost Feb 01 '11 at 15:56
I meant f.readline(). .next() is nicer though. You guys win. I lose. – Spacedman Feb 01 '11 at 16:03
Although if you .next() and then try .readline() get a ValueError for mixing iteration and read methods. – Spacedman Feb 01 '11 at 16:06
You've downvoted 'readlines()' solutions for valid reasons explained, but why downvote a readline() [times 6] solution? Surely this doesn't read the whole file. Note also my issue with .next() and then .readline(). – Spacedman Feb 01 '11 at 18:22
@Spacedman: because readline() is old hat and because of the very issue that you mention – John Machin Feb 01 '11 at 20:26

eyquem · Answer 11 · 2011-02-01T17:48:18.757

-1

Solutions with readlines() are not satisfactory in my opinion because readlines() reads the entire file. The user then has to go over the lines again (in the file or in the resulting list) to process what they want, although this could have been done without first reading the interesting lines. Moreover, if the file is big, its whole content is held in memory, whereas a for line in file loop would be lighter.

Repeating readline() can be done like this:

nb = 6
exec( nb * 'f.readline()\n')

It's a short piece of code, and nb is programmatically adjustable.
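
For comparison, the same adjustable skipping can be done without exec. This is a sketch only: skip_lines is a hypothetical name, the one-liner inside mirrors the consume recipe from the itertools documentation, and io.StringIO stands in for the open file.

```python
import io
from itertools import islice


def skip_lines(f, nb):
    """Advance the line iterator f past its next nb lines."""
    # islice(f, nb, nb) consumes nb items and yields none of them;
    # the None default keeps next() from raising StopIteration.
    next(islice(f, nb, nb), None)


f = io.StringIO("1\n2\n3\n4\n5\n6\n7\n8\n")
skip_lines(f, 6)
remaining = list(f)
print(remaining)  # ['7\n', '8\n']
```

As with the exec version, nothing beyond the skipped lines is read until the caller asks for it.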

edited Feb 01 '11 at 17:48

answered Feb 01 '11 at 17:42

eyquem

are you serious? `exec`. in all fairness! – SilentGhost Feb 01 '11 at 17:49
+1 for not reading the whole file into memory, -100 for using `exec` – John Machin Feb 01 '11 at 17:56
What is there against exec()? It's still in Python 3; if it were as bad as xreadlines() was, it would have been deprecated in the same way. I never use exec(), but it seemed to me that in this case it could shorten the code instead of writing 6 lines with readline() – eyquem Feb 01 '11 at 18:51
« Solutions with readlines() are not satisfactory in my opinion because readlines() reads the entire file. » Well, it can be discussed. It depends of the file and the objective. If a file is big and that only a few lines are interesting, it isn't a good idea to read the entire file before treat it in a re-reading. But if not big and all the lines put in a list simplify the code or whatever else, it could be acceptable. It depends. I am no more in agreement with myself. – eyquem Feb 01 '11 at 19:04
