216

We have a large raw data file that we would like to trim to a specified size.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?

Russell

20 Answers

334

Python 3:

with open(path_to_file) as input_file:
    head = [next(input_file) for _ in range(lines_number)]
print(head)

Python 2:

with open(path_to_file) as input_file:
    head = [next(input_file) for _ in xrange(lines_number)]
print head

Here's another way (both Python 2 & 3):

from itertools import islice

with open(path_to_file) as input_file:
    head = list(islice(input_file, lines_number))
print(head)
John La Rooy
  • Thanks, that is very helpful indeed. What is the difference between the two (in terms of performance, required libraries, compatibility, etc.)? – Russell Nov 20 '09 at 00:34
  • I expect the performance to be similar, maybe the first to be slightly faster. But the first one won't work if the file doesn't have at least N lines. You are best to measure the performance against some typical data you will be using it with. – John La Rooy Nov 20 '09 at 00:47
  • The with statement works on Python 2.6, and requires an extra import statement on 2.5. For 2.4 or earlier, you'd need to rewrite the code with a try...except block. Stylistically, I prefer the first option, although as mentioned the second is more robust for short files. – Alasdair Nov 20 '09 at 01:21
  • islice is probably faster, as it is implemented in C. – Alice Purcell Nov 20 '09 at 06:45
  • @chrispy, I just tried it out, and the second one was faster for the file I was using as soon as N grows above 20 or so – John La Rooy Nov 20 '09 at 07:26
  • Bear in mind that if the file has fewer than N lines, this will raise a StopIteration exception that you must handle – Ilian Iliev Jan 25 '12 at 12:44
  • Why is it the preferred way? Could you explain this in your answer? – qed Jun 01 '14 at 22:19
  • What to do if there are fewer than N lines in the file? – sumanth232 Jan 31 '15 at 14:09
  • @krishna222, the version using `islice` will read the whole file if it has fewer than `N` lines – John La Rooy Jan 31 '15 at 20:34
  • Nice solution. It may be worth pointing out that to get the output in a "proper" format rather than a list, you can use `for line in head: print line,` instead of `print head` – hellter Aug 10 '16 at 10:01
  • Also, if you are using Python 3 you would use range(n) rather than xrange() – Carl Oct 28 '16 at 02:40
  • It seems like this method combines all columns of the original file, say a `.csv`, into a single element of the resulting list. Is this true for everyone, or just me? How do I maintain the original format, or read the data into a DataFrame? – Jia Gao Aug 05 '18 at 00:57
  • @JasonGoal, this question/answer isn't concerned with the meaning of the data. For a CSV file, the process is similar, but you would pass the file through a [`csv.reader`](https://docs.python.org/3/library/csv.html) – John La Rooy Aug 05 '18 at 22:48
  • @JohnLaRooy, thanks for the explanation; I just figured out how to do this for a `.csv` file. `itertools.islice` is the trick, and if people want to explore the header of their `.csv`, `DictReader` comes in handy. – Jia Gao Aug 06 '18 at 00:10
  • This answer has a VERY BIG drawback: you cannot have files with fewer than N lines, or else it will explode. It should NOT be marked as correct, and all those SO developers who just copy-pasted are introducing a bug into their code! – juan Isaza Jul 20 '20 at 22:49
  • This is the right way, as it consumes the file as a stream only as far as necessary (it doesn't read all the lines). – Maciej Skorski Sep 28 '22 at 16:17
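
A minimal sketch (editor's addition, building a hypothetical two-line short.txt) illustrating the short-file behaviour the comments above discuss:

from itertools import islice

# Build a 2-line demo file, then ask both approaches for 5 lines.
with open("short.txt", "w") as f:
    f.write("line 1\nline 2\n")

# islice stops quietly at end-of-file:
with open("short.txt") as f:
    print(list(islice(f, 5)))  # ['line 1\n', 'line 2\n']

# the list comprehension raises StopIteration instead:
with open("short.txt") as f:
    try:
        head = [next(f) for _ in range(5)]
    except StopIteration:
        print("fewer than 5 lines in the file")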
30
N = 10
with open("file.txt", "a") as file:  # the a opens it in append mode
    for i in range(N):
        line = next(file).strip()
        print(line)
ghostdog74
28

If you want a quick way to read the first lines and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

Note: the whole file is read so is not the best from the performance point of view but it is easy to use, fast to write and easy to remember so if you want just perform some one-time calculation is very convenient

print firstNlines

One advantage over the other answers is how easily you can select a range of lines, e.g. skipping the first 10 lines ([10:30]), dropping the last 10 ([:-10]), or taking only every second line ([::2]).
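
For large files, most of these slices have lazy equivalents via itertools.islice; a sketch (editor's addition, reusing the same hypothetical path). Negative slices such as [:-10] have no lazy equivalent, since the total line count isn't known until the file has been read:

from itertools import islice

with open("pathofmyfileandfileandname") as myfile:
    lines_10_to_30 = list(islice(myfile, 10, 30))  # skip the first 10 lines, take the next 20

with open("pathofmyfileandfileandname") as myfile:
    every_second_line = list(islice(myfile, 0, None, 2))  # like [::2], lazily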

G M
  • The top answer is probably way more efficient, but this one works like a charm for small files. – T.Chmelevskij Nov 07 '15 at 12:53
  • Note that this actually reads the whole file into a list first (myfile.readlines()) and then slices the first 5 lines off it. – AbdealiLoKo Oct 25 '16 at 09:07
  • This should be avoided. – anilbey Nov 27 '18 at 22:58
  • I see no reason to use this, it's not any simpler than the vastly more efficient solutions. – AMC Apr 08 '20 at 00:17
  • @AMC thanks for the feedback. I use it in the console for exploring data when I need a quick look at the first lines; it just saves me time writing code. – G M Apr 08 '20 at 08:27
  • @GM Do you find it faster to type than the solution using `itertools.islice()`? Of course, that requires an extra module. – AMC Aug 19 '20 at 18:38
  • @AMC absolutely: with islice you have to import a module and remember a not-very-intuitive command. This solution is faster to write and easier to remember, the perfect choice for one-time operations. – G M Aug 20 '20 at 06:41
13

What I do is read the first N lines using pandas. The performance may not be the best, but, for example, with N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)
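
If you also need to start partway into the file, read_csv can combine skiprows with nrows; a sketch (editor's addition, assuming the file's first row is a header you want to keep):

import pandas as pd

# Data rows 1000-1999 of the same hypothetical CSV: skiprows takes a
# list-like of row indices to skip (index 0, the header, is kept here).
yourfile = pd.read_csv('path/to/your/file.csv',
                       skiprows=range(1, 1001), nrows=1000)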
RRuiz
  • Better would be to use the `nrows` option, which can be set to 1000 so the entire file isn't loaded: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html In general, pandas has this and other memory-saving techniques for big files. – philshem Apr 11 '17 at 15:03
  • Yes, you are right. I just corrected it. Sorry for the mistake. – RRuiz Apr 11 '17 at 15:06
  • You may also want to add `sep` to define a column delimiter (which shouldn't occur in a non-csv file) – philshem Apr 11 '17 at 15:09
  • @Cro-Magnon I cannot find the `pandas.read()` function in the documentation, do you know of any information on the subject? – AMC Apr 08 '20 at 00:19
8

File objects expose no specific method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
artdanil
  • This is something I had actually intended. Though, I thought of adding each line to a list. Thank you. – artdanil Nov 20 '09 at 02:11
6

The two most intuitive ways of doing this would be:

  1. Iterate on the file line-by-line, and break after N lines.

  2. Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is: as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.

FatihAkici
  • _The bottom line is: as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options._ Isn't `enumerate()` lazy? – AMC Aug 19 '20 at 18:40
4

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27), this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)  # rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)  # go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find + 1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file - total_bytes_scanned)
            self.seek(-(byte_block + total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            # read only the new block; reading a full 1024 bytes here
            # would double-count newlines when the block is smaller
            lines_found += self.read(byte_block).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
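
Note that the file built-in this class subclasses was removed in Python 3. A rough Python 3 equivalent of head() as a plain function (editor's sketch; the tail() logic would need similar porting):

from itertools import islice

def head(path, lines_2find=1):
    # plain-function stand-in for File.head() on Python 3
    with open(path) as f:
        return list(islice(f, lines_2find))

print(head('path/to/file', 3))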
fdb
3

For the first 5 lines, simply do:

N = 5
with open("data_file", "r") as file:
    for i in range(N):
        print file.next()
Surya
3

The most convenient way, in my opinion:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

This solution is based on a list comprehension. The open() function returns an object supporting the iteration interface; enumerate() wraps it and returns (index, item) tuples; we then check that we're inside the accepted range (if i < LINE_COUNT) and print the result. Note that the comprehension still iterates over the entire file, since there is no early exit.

Enjoy the Python. ;)

  • This just seems like a slightly more complex alternative to `[next(file) for _ in range(LINE_COUNT)]`. – AMC Apr 08 '20 at 00:20
2

If you want something that obviously works (without looking up esoteric stuff in manuals), requires no imports or try/except, and runs on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)
John Machin
2

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. This is so much better in my experience:

import numpy as np

def load_big_file(fname, maxrows):
    '''Only works for a well-formed text file of space-separated doubles.'''
    rows = []  # unknown number of lines, so use a list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            line = [float(s) for s in line.split()]
            rows.append(np.array(line, dtype=np.double))
            j += 1
    return np.vstack(rows)  # convert list of vectors to a 2-D array
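
For what it's worth, newer NumPy releases (1.16+) accept a max_rows argument in np.loadtxt (and np.genfromtxt), which covers this case directly; a one-line sketch under the same well-formed-file assumption, with a hypothetical file name:

import numpy as np

# Reads only the first 1000 rows; requires NumPy >= 1.16.
data = np.loadtxt("data.txt", max_rows=1000)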
Alejandro D. Somoza
  • _If you have a really big file, and assuming you want the output to be a numpy array_ That's quite a unique set of restrictions, I can't really see any advantages to this over the alternatives. – AMC Apr 08 '20 at 00:22
1

This worked for me

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)
Sukanta
  • Why not use a context manager? In any case, I don't see how this improves on the many existing answers. – AMC Apr 08 '20 at 00:24
1

I would like to handle files with fewer than n lines by falling back to reading the whole file:

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

Credit goes to John La Rooy and Ilian Iliev. Use this function for the best performance, with exception handling included.

Revision 1: Thanks to FrankM for the feedback; to handle file existence and read permission we can further add:

import errno
import os

def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)  
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)     
   
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

You can either go with the second version, or go with the first one and handle the file exceptions later. The check is quick and mostly free from a performance standpoint.

Linh K Ha
  • Well, this isn't foolproof. If there is an exception, you try to read the file again, which could throw another exception. This works if the file exists and you have permission to read it; if not, it results in an exception. The accepted answer provides (solution 3) a variant which does the same using ```islice``` (it reads the whole file when it has fewer lines). But your solution is better than variants 1 and 2. – FrankM Jul 09 '21 at 10:09
  • Thanks @FrankM for the feedback, please see my revised answer – Linh K Ha Jul 10 '21 at 13:06
0

Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as:

    with open("datafile") as myfile:
        head = myfile.readlines(N)
    print head

(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown.)

Caveat: the argument to readlines() is a size hint measured in bytes, not a line count, so this is not guaranteed to return exactly N lines.

Steve Bading
0

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)
sandyp
  • This is virtually identical to the [decade-old top answer](https://stackoverflow.com/a/1767589/11301900). – AMC Apr 08 '20 at 00:23
0

fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f:  # count the lines
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter line numbers: "))

if num_lines_input <= num_lines:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)
else:
    f = open(fname, "r")
    for x in range(num_lines):  # the file has fewer lines than requested
        a = f.readline()
        print(a)
    print("Don't have", num_lines_input, "lines; printed as many as possible")

print("Total lines in the text:", num_lines)

Shakirul
0

Here's another decent solution with a list comprehension:

file = open('file.txt', 'r')

lines = [next(file) for x in range(3)]  # first 3 lines will be in this list

file.close()
Oleksandr Novik
0

An easy way to get the first 10 lines:

with open('fileName.txt', mode='r') as file:
    lines = [line.rstrip('\n') for line in file][:10]  # reads the whole file, keeps the first 10
    print(lines)
Gelzone
-2
#!/usr/bin/python

import subprocess

p = subprocess.Popen(["head", "-n", "3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print output

This method worked for me.
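
Note that shelling out like this depends on an external utility being on the PATH, so it won't work on stock Windows; this is one place where the OS part of the question matters. A Python 3 sketch of the same idea (editor's addition, assuming a POSIX system and Python 3.7+):

import subprocess

# Run `head -n 3 passlist` and capture its output as text.
result = subprocess.run(["head", "-n", "3", "passlist"],
                        capture_output=True, text=True, check=True)
print(result.stdout)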

Mansur Ul Hasan
-2

Simply convert your CSV file object to a list using list(file_data):

import csv

with open('your_csv_file.csv') as file_obj:
    file_data = csv.reader(file_obj)
    file_list = list(file_data)
    for row in file_list[:4]:
        print(row)
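
A variation (editor's sketch) that yields the first rows without materializing the whole file into a list, by pairing csv.reader with itertools.islice:

import csv
from itertools import islice

with open('your_csv_file.csv') as file_obj:
    for row in islice(csv.reader(file_obj), 4):  # stops after 4 rows
        print(row)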