I'm using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?
-
Possible dup: http://stackoverflow.com/questions/620367/python-how-to-jump-to-a-particular-line-in-a-huge-text-file – Adam Matan Jan 17 '10 at 17:29
30 Answers
If the file to read is big, and you don't want to read the whole file in memory at once:
fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass  # 26th line
    elif i == 29:
        pass  # 30th line
    elif i > 29:
        break
fp.close()
Note that `i == n-1` for the nth line.
In Python 2.6 or later:
with open("file") as fp:
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break

-
+1 Better solution than mine if the entire file isn't loaded into memory as in `linecache`. Are you sure that `enumerate(fp)` doesn't do that? – Adam Matan Jan 17 '10 at 17:37
-
`enumerate(x)` uses `x.next`, so it doesn't need the entire file in memory. – Alok Singhal Jan 17 '10 at 17:46
-
My small beef with this is that A) You want to use with instead of the open / close pair and thus keep the body short, B) But the body is not that short. Sounds like a trade-off between speed/space and being Pythonic. I am not sure what the best solution would be. – Hamish Grubijan Jan 17 '10 at 17:53
-
Great. I like SO for that precise reason too. I'll add a link to your answer into mine. – Adam Matan Jan 17 '10 at 17:53
-
Since this is from '10 and IDK when `io` was added, can someone comment as to whether one should simply use the above for reading in a stream vs `io.open`? – kuanb Nov 20 '15 at 15:51
-
@kuanb `io.open` is an alias for `open`. `open()` will work without any issues. – Alok Singhal Nov 21 '15 at 03:39
-
@Dan D. Electricity is overrated, mankind got along fine for over 200 thousand years without it. ;-) 'with' is making it more secure, more readable, and one line shorter. – Romain Vincent Aug 20 '17 at 17:01
The quick answer:
f=open('filename')
lines=f.readlines()
print lines[25]
print lines[29]
or:
lines = [25, 29]
i = 0
f = open('filename')
for line in f:
    if i in lines:
        print line
    i += 1
There is a more elegant solution for extracting many lines: linecache (courtesy of "python: how to jump to a particular line in a huge text file?", a previous stackoverflow.com question).
Quoting the python documentation linked above:
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'
Change the `4` to your desired line number, and you're on. Note that the count is one-based here, so `4` brings the fourth line, exactly as in the example above.
If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok's advice and use enumerate().
To conclude:
- Use `fileobject.readlines()` or `for line in fileobject` as a quick solution for small files.
- Use `linecache` for a more elegant solution, which will be quite fast for reading many files, possibly repeatedly.
- Take @Alok's advice and use `enumerate()` for files which could be very large and won't fit into memory. Note that using this method might be slow because the file is read sequentially.

-
Nice. I just looked at the source of the `linecache` module, and it looks like it reads the whole file in memory. So, if random access is more important than size optimization, `linecache` is the best method. – Alok Singhal Jan 17 '10 at 17:36
-
Fun fact: if you use a set instead of the list in the second example, you get O(1) running time. Lookup in a list is O(n); internally, sets are represented as hashes, and that's why you get the O(1) running time. Not a big deal in this example, but if you're using a large list of numbers and care about efficiency, then sets are the way to go. – rady Jun 24 '15 at 17:26
-
You can also use `linecache.getlines('/etc/passwd')[0:4]` to read in the first, second, third and fourth lines. – zyy Dec 04 '19 at 20:15
For the sake of offering another solution:
import linecache
linecache.getline('Sample.txt', Number_of_Line)
I hope this is quick and easy :)
-
This reads the whole file into memory. You might as well call file.read().split('\n') then use array index lookups to get the line of interest... – duhaime Jun 04 '18 at 14:28
-
@anon `''.join(file.readlines()).split('\n')[5:10]` gives you rows 6 to 10, for example. Not recommended, as it reads the whole file into memory. – questionto42 Nov 30 '20 at 19:39
-
Here is an example and it worked for me: `def get_version(): versionLine = linecache.getline('config.php', 4); version = versionLine[19:24]; return version` – colidom Oct 04 '21 at 16:40
A fast and compact approach could be:
def picklines(thefile, whatlines):
    return [x for i, x in enumerate(thefile) if i in whatlines]
this accepts any open file-like object `thefile` (leaving it up to the caller whether it should be opened from a disk file, or via e.g. a socket or other file-like stream) and a set of zero-based line indices `whatlines`, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:
def yieldlines(thefile, whatlines):
    return (x for i, x in enumerate(thefile) if i in whatlines)
which is basically only good for looping upon -- note that the only difference comes from using round rather than square parentheses in the return statement, making a generator expression rather than a list comprehension.
Further note that despite the mention of "lines" and "file" these functions are much, much more general -- they'll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I'd suggest using more appropriately general names;-).
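For instance, a quick usage sketch (the filename and the zero-based indices are just placeholders; passing `whatlines` as a set keeps the `i in whatlines` membership test fast, as a later comment notes):
with open("file.txt") as f:
    lines_26_and_30 = picklines(f, {25, 29})  # zero-based indices for lines 26 and 30
print(lines_26_and_30)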

-
@ephemient, I disagree -- the genexp reads smoothly and perfectly. – Alex Martelli Jan 18 '10 at 06:00
-
Excellent and elegant solution, thanks! Indeed, even large files should be supported, with the generator expression. Can't get more elegant than this, can it? :) – Samuel Lampa Sep 11 '14 at 14:37
-
Nice solution, how does this compare to the one proposed by @AdamMatan? The Adam solution could be faster as it exploits additional information (line numbers monotonically increasing) which could lead to an early stop. I have a 10GB file which I cannot load into memory. – Mannaggia Nov 25 '14 at 11:36
-
@Mannaggia It's not emphasized enough in this answer, but `whatlines` should be a `set`, because `if i in whatlines` will execute faster with a set rather than a (sorted) list. I didn't notice it at first and instead devised my own ugly solution with a sorted list (where I didn't have to scan the list each time, while `if i in whatlines` does just that), but the difference in performance was negligible (with my data) and this solution is much more elegant. – Victor K Apr 28 '15 at 15:58
For the sake of completeness, here is one more option.
Let's start with a definition from the Python docs:
slice: An object usually containing a portion of a sequence. A slice is created using the subscript notation, `[]` with colons between numbers when several are given, such as in `variable_name[1:3:5]`. The bracket (subscript) notation uses slice objects internally (or in older versions, `__getslice__()` and `__setslice__()`).
Though the slice notation is not directly applicable to iterators in general, the itertools
package contains a replacement function:
from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print line

# print each third line until 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print line
The additional advantage of the function is that it does not read the iterator until the end. So you can do more complex things:
with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print line

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print line
And to answer the original question:
# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]

-
By far the best approach when working with large files. My program went from consuming 8GB+ to almost nothing. The trade-off was CPU usage, which went from ~15% to ~40%, but the actual processing of the file was 70% faster. I'll take that trade-off all day long. Thank you! – GollyJer Sep 26 '18 at 15:02
if you want line 7:
line = open("file.txt", "r").readlines()[6]  # readlines() is zero-based, so index 6 is line 7

-
Neat. But how do you `close()` the file when opening it this way? – Milo Wielondek Oct 27 '11 at 13:32
-
Yes, we need to close it after this. When we open a file using `with`, it closes itself. – reetesh11 Jan 25 '17 at 16:34
-
`with open("file.txt", "r") as file:` `line = file.readlines()[7]`. But mind that this reads the whole file into memory. – questionto42 Nov 30 '20 at 18:19
Reading files is incredibly fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the single lines.
What most answers here do is not wrong, but bad style. Opening files should always be done with `with`, as it makes sure that the file is closed again.
So you should do it like this:
with open("path/to/file.txt") as f:
lines = f.readlines()
print(lines[26]) # or whatever you want to do with this line
print(lines[30]) # or whatever you want to do with this line
Huge files
If you happen to have a huge file and memory consumption is a concern, you can process it line by line:
with open("path/to/file.txt") as f:
for i, line in enumerate(f):
pass # process line i

-
IMO it is really bad style to read an entire file of unknown length just to get the first 30 lines. What about memory consumption, and what about endless streams? – return42 Sep 30 '18 at 14:51
-
@return42 It depends very much on the application. For many, it is totally fine to assume that a text file has a way lower size than the available memory. If you happen to have potentially huge files, I've edited my answer. – Martin Thoma Sep 30 '18 at 15:31
-
Thanks for your addition, which is the same as Alok's [answer](https://stackoverflow.com/a/2081880/300130). And sorry, no, I don't think this depends on the application. IMO it is always better not to read more lines than you need. – return42 Oct 04 '18 at 12:31
-
4"Reading files is incredibly fast" I take issue with this. Reading files is, in fact, extremely slow, and data intensive programs will go out of their way to do it as little as possible. 0.1 seconds is nowhere near "fast" in computing terms. If you're only doing it once maybe it's okay (in some cases), but if you do that 1000 times it will take 100 seconds and that is nowhere near acceptable in most cases. – Michael Dorst Aug 12 '20 at 18:21
-
@michael dorst: I completely agree. It depends on your application, but we need to consider that he has to read the file anyway. The question is: what is the speed difference between reading only lines 26 and 30 and reading a file with e.g. 500 lines? I'm assuming it's not much more, because I would have expected that to be mentioned. – Martin Thoma Aug 13 '20 at 07:45
Some of these are lovely, but it can be done much more simply:
start = 0  # some starting index
end = 5000  # some ending index
filename = 'test.txt'  # some file we want to use

with open(filename) as fh:
    data = fh.readlines()[start:end]
    print(data)
That simply uses list slicing. It loads the whole file, but most systems will minimise memory usage appropriately; it's faster than most of the methods given above, and it works on my 10G+ data files. Good luck!

If your large text file `file` is strictly well-structured (meaning every line has the same length `l`), you could use the following for the n-th line:
with open(file) as f:
    f.seek(n*l)  # n is zero-based here; use (n-1)*l for a one-based line number
    line = f.readline()
    last_pos = f.tell()
Disclaimer: this only works for files in which every line has the same length!

You can do a seek() call which positions your read head to a specified byte within the file. This won't help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?), or you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.
Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.
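A minimal sketch of that idea, assuming a hypothetical file of fixed-width records (the filename and record length here are made up):
RECORD_LEN = 80  # bytes per line, including the newline (an assumption)
wanted = 26      # one-based line number

with open("fixed_width.txt", "rb") as f:
    f.seek((wanted - 1) * RECORD_LEN)  # jump straight to the start of line 26
    line = f.readline().decode("ascii")
print(line)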

def getitems(iterable, items):
    items = list(items)  # get a list from any iterable and make our own copy,
                         # since we modify it
    if items:
        items.sort()
        for n, v in enumerate(iterable):
            if n == items[0]:
                yield v
                items.pop(0)
                if not items:
                    break

print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
-
Roger, my favorite guy! This could benefit from a with statement. – Hamish Grubijan Jan 17 '10 at 17:55
with open("test.txt", "r") as fp:
lines = fp.readlines()
print(lines[3])
`test.txt` is the filename; this prints line number four of test.txt.


How about this:
>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines):
...     if i > 30: break
...     if i == 26: dox()
...     if i == 30: doy()

-
True, this is less efficient than the one by Alok, but mine uses a with statement ;) – Hamish Grubijan Jan 17 '10 at 17:33
If you don't mind importing, then fileinput does exactly what you need (that is, you can read the line number of the current line).
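For example, a quick sketch of that idea (the filename is hypothetical; `fileinput.lineno()` returns the one-based number of the line that was just read):
import fileinput

wanted = {26, 30}
for line in fileinput.input("file.txt"):
    if fileinput.lineno() in wanted:
        print(line.rstrip())
    elif fileinput.lineno() > max(wanted):
        break
fileinput.close()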

I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of `f.readlines()`, on a `StringIO` object, whatever:
def read_specific_lines(file, lines_to_read):
    """file is any iterable; lines_to_read is an iterable containing int values"""
    lines = set(lines_to_read)
    last = max(lines)
    for n, line in enumerate(file):
        if n + 1 in lines:
            yield line
        if n + 1 > last:
            return

>>> with open(r'c:\temp\words.txt') as f:
...     [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
...
['A\n', 'a\n', 'aa\n', 'accordant\n']

Here's my little 2 cents, for what it's worth ;)
def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp = open(filename, "r")
    src = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data

# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename):  # using default list, specify your own list of lines otherwise
    print "Line: %s\nData: %s\n" % (line[0], line[1])

A minor improvement to Alok Singhal's answer, using enumerate's start argument so the count is one-based:
fp = open("file")
for i, line in enumerate(fp,1):
if i == 26:
# 26th line
elif i == 30:
# 30th line
elif i > 30:
break
fp.close()


You can do this very simply with syntax someone already mentioned; it's by far the easiest way to do it:
inputFile = open("lineNumbers.txt", "r")
lines = inputFile.readlines()
print (lines[0])
print (lines[2])

Fairly quick and to the point.
To print certain lines in a text file: create a "lines2print" list and then print when the enumeration index is "in" the lines2print list. To get rid of the extra '\n', use line.strip() or line.strip('\n'). I just like "list comprehension" and try to use it when I can. I like the "with" method to read text files in order to prevent leaving a file open for any reason.
lines2print = [26, 30]  # can be a big list and order doesn't matter (note: enumerate() below is zero-based, so use 25 and 29 for the 26th and 30th lines)
with open("filepath", 'r') as fp:
    [print(x.strip()) for ei, x in enumerate(fp) if ei in lines2print]
or, if the list is small, just type it directly into the comprehension:
with open("filepath", 'r') as fp:
    [print(x.strip()) for ei, x in enumerate(fp) if ei in [26, 30]]

File objects have a .readlines() method which will give you a list of the contents of the file, one line per list item. After that, you can just use normal list slicing techniques.
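A minimal sketch of that approach (the filename is just a placeholder):
with open("file.txt") as f:
    lines = f.readlines()  # one list item per line, newline included

print(lines[25])      # the 26th line (list indices are zero-based)
print(lines[25:30])   # or a normal slice: lines 26 through 30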

@OP, you can use enumerate
for n, line in enumerate(open("file")):
    if n+1 in [26, 30]:  # or n in [25, 29]
        print line.rstrip()

file = '/path/to/file_to_be_read.txt'
with open(file) as f:
    print f.readlines()[26]
    print f.readlines()[30]
Using the with statement, this opens the file, prints lines 26 and 30, then closes the file. Simple!

-
This isn't a valid answer. After the first call to `readlines()` the iterator will be exhausted and the second call will either return an empty list or throw an error (can't remember which). – Paul H Oct 11 '17 at 17:34
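For reference, a sketch of what that answer presumably intended, calling `readlines()` only once (list indices are zero-based):
with open('/path/to/file_to_be_read.txt') as f:
    lines = f.readlines()  # read the file once
print(lines[25])  # 26th line
print(lines[29])  # 30th line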
To print line #3:
line_number = 3
with open(filename, "r") as file:
    current_line = 1
    for line in file:
        if current_line == line_number:
            print(line)  # print the matched line itself, not the next one
            break
        current_line += 1
Original author: Frank Hofmann

To print the desired line, or the line above/below the required line:
def dline(file, no, add_sub=0):
    tf = open(file)
    for sno, line in enumerate(tf):
        if sno == no - 1 + add_sub:
            print(line)
    tf.close()
Execute: `dline(r"D:\dummy.txt", 6)`, i.e. `dline("file path", line_number, add_sub)`. The `add_sub` argument is optional and defaults to 0; pass 1 for the line after the searched line, or -1 for the line before it.

If you want to read specific lines, such as the lines starting after some threshold line, then you can use the following code:
file = open("files.txt","r")
lines = file.readlines() ## convert to list of lines
datas = lines[11:] ## raed the specific lines

Do not use `readlines()`! My solution is:
with open(filename) as f:
    specify = [26, 30]
    results = list(
        map(lambda line: line[1],
            filter(lambda line: line[0] in specify,
                   enumerate(f))
        )
    )
Test as follows for a 6.5 GB file:
import time
filename = 'a.txt'

start = time.time()
with open(filename, 'w') as f:
    for i in range(10_000_000):
        f.write(f'{str(i)*100}\n')
end1 = time.time()

with open(filename) as f:
    specify = [26, 30]
    results = list(
        map(lambda line: line[1],
            filter(lambda line: line[0] in specify,
                   enumerate(f))
        )
    )
end2 = time.time()

print(f'write time: {end1-start}')
print(f'read time: {end2-end1}')
# write time: 14.38945460319519
# read time: 8.380386352539062

You can do it with one of the simplest pieces of logic: splitting the string into a list of lines.
f = open('filepath')
r = f.read()
s = r.split("\n")
n = [linenumber1, linenumber2]  # [26, 30] in your case
for x in n:
    print(s[x-1])
f.close()

I think this would work
open_file1 = open("E:\\test.txt", 'r')
read_it1 = open_file1.read()
myline1 = []
for line1 in read_it1.splitlines():
    myline1.append(line1)
print myline1[0]

-
There were already a dozen readline methods when you posted this--adding another just adds clutter – duhaime Jun 04 '18 at 14:31
f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()

f = open(filename, 'r')
lineno = 1
while lineno < totalLines:
    line = f.readline()
    if lineno == 26:
        doLine26Command(line)
    elif lineno == 30:
        doLine30Command(line)
    lineno += 1
f.close()

-
Gives the wrong result, as you can't use readlines and readline like that (they each change the current read position). – Jan 17 '10 at 18:02
-
I'm sorry for having overlooked a HUGE error in my first code. The error has been corrected and the current code should work as expected. Thanks for pointing out my error, Roger Pate. – inspectorG4dget Jan 17 '10 at 22:55
Reading from a specific line:
n = 4  # for reading from the 5th line onward
with open("write.txt", 'r') as t:
    for i, line in enumerate(t):
        if i >= n:  # i == n-1 for the nth line
            print(line)

