
I have a large file containing 10_000_000 lines. I would like to get, for example, line number 5_000_000. I wrote this function in Python:

def read_n_line(file: str, n: int) -> str:
    """Return the n-th (0-based) line of the file."""
    with open(file) as f:
        content = f.readlines()  # reads the entire file into memory
    return content[n]

but it is terribly slow on files this big. Is there a better way to do this?

In bash I can do something like sed -n '5000000p' < my_file, which is fast enough.

vojtam
  • Use the split command first, then run your Python script on the piece you need. – ElapsedSoul Mar 16 '22 at 09:27
  • I usually use awk for huge files like this; maybe you can call it from a Python script: https://unix.stackexchange.com/questions/89640/how-to-run-awk-for-some-number-of-lines – d_frEak Mar 16 '22 at 09:30
  • https://docs.python.org/3/library/linecache.html – vks Mar 16 '22 at 09:31
  • Duplicate of [How to jump to a particular line in a huge text file?](https://stackoverflow.com/questions/620367/how-to-jump-to-a-particular-line-in-a-huge-text-file), [Python fastest access to line in file](https://stackoverflow.com/questions/19189961/python-fastest-access-to-line-in-file), etc. – smci Mar 16 '22 at 09:32
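The linecache module linked in the comments is worth trying when the same file is queried repeatedly. A minimal sketch (the demo file and its contents are made up for illustration); note that linecache.getline() uses 1-based line numbers and still reads the whole file into memory on the first call, so it mainly helps with repeated lookups:

```python
import linecache
import os
import tempfile

# Small stand-in file for the 10M-line original.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.writelines(f"row {i}\n" for i in range(1, 11))

# 1-based lookup; returns '' (not an exception) if the line doesn't exist.
print(linecache.getline(tmp.name, 5))  # -> "row 5\n"

linecache.clearcache()
os.remove(tmp.name)
```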

1 Answer


How does islice() perform on your file?

from itertools import islice

def read_n_line(filename, line_num, encoding='utf-8'):
    """Return the line_num-th (1-based) line without loading the whole file."""
    with open(filename, encoding=encoding) as f:
        # islice skips line_num - 1 lines lazily, so memory use stays
        # constant; raises StopIteration if the file is shorter than that.
        return next(islice(f, line_num - 1, line_num))
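A quick self-contained check of the islice approach (the demo file and its contents are made up for illustration; line numbers here are 1-based):

```python
from itertools import islice
import os
import tempfile

def read_n_line(filename, line_num, encoding='utf-8'):
    # Stream the file and stop at the requested 1-based line;
    # only one line is ever held in memory.
    with open(filename, encoding=encoding) as f:
        return next(islice(f, line_num - 1, line_num))

# Small stand-in file for the 10M-line original.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(1, 101))

print(read_n_line(tmp.name, 50))  # -> "line 50\n"
os.remove(tmp.name)
```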
Tomalak