
I have tried this program, which reads my file in fixed-size chunks of characters, and that is the behaviour I want:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        print(piece)

But when I try to apply the same method using readlines(), it doesn't work for me. Here is the code I am trying:

def read_in_chunks(file_object, chunk_size=5):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 5 lines."""
    while True:
        data = file_object.readlines()[0:chunk_size]
        if not data:
            break
        yield data


with open('Traefik.log') as f:
    for piece in read_in_chunks(f):
        print(piece)

Can somebody help me achieve the same chunking behaviour for N lines at a time?

  • Does this answer your question? [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – MYousefi May 01 '22 at 08:55

1 Answer


By default, .readlines() reads the whole content of the stream into a list. That is why your second version fails: the first call pulls the entire file into memory, the slice keeps only the first five lines, and the next call returns an empty list, so everything after the first chunk is discarded. You can, however, give .readlines() a size hint to produce lines in chunks:

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

So, you could adjust your function to something like:

def read_in_chunks(file_object, chunk_size_hint=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.readlines(chunk_size_hint)
        if not data:
            break
        yield data
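
Used on the Traefik.log file from your question, each piece is then a list of complete lines whose combined size is roughly chunk_size_hint characters (the 4096 here is just an illustrative value):

with open('Traefik.log') as f:
    for piece in read_in_chunks(f, chunk_size_hint=4096):
        # piece is a list of whole lines totalling roughly 4 KiB
        print(len(piece), 'lines in this chunk')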

But that doesn't guarantee a fixed number of lines per chunk. If you look a bit further in the docs you'll find the following advice:

Note that it’s already possible to iterate on file objects using for line in file: ... without calling file.readlines().

That's a hint that something like this

def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    data = []
    for n, line in enumerate(file_object, start=1):
        data.append(line)
        if not n % chunk_size:  # a full chunk of chunk_size lines is ready
            yield data
            data = []
    if data:  # yield any leftover lines as a final, shorter chunk
        yield data

might be better suited.
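
If you'd rather not track the counter yourself, here is a sketch of the same grouping built on itertools.islice, which consumes at most chunk_size lines from the file iterator per call:

from itertools import islice

def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    while True:
        # islice takes up to chunk_size lines; an empty list means EOF
        data = list(islice(file_object, chunk_size))
        if not data:
            break
        yield data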

Timus