16

I have a CSV file that grows until it reaches approximately 48 million lines.

Before adding new lines to it, I need to read the last line.

I tried the code below, but it got too slow and I need a faster alternative:

def return_last_line(filepath):    
    with open(filepath,'r') as file:        
        for x in file:
            pass
        return x        
return_last_line('lala.csv')
Luiz Fernando
  • 3
    You can use `f.seek(-1, 2)`. This is a quote from Python docs "To change the file object’s position, use f.seek(offset, whence). The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.". So -1 means the last byte and 2 means read from the end of file – Pooya Kamranjam Mar 06 '21 at 15:35
  • 2
    `f.seek(offset, whence)` https://docs.python.org/3/tutorial/inputoutput.html – Pooya Kamranjam Mar 06 '21 at 15:40
  • 1
    BTW, isn't it possible to keep the file *open* over the writes? – Antti Haapala -- Слава Україні Mar 06 '21 at 16:38
  • That is basically a subset of my solution here: https://stackoverflow.com/a/26747854/122033 – Berislav Lopac Mar 16 '21 at 22:28
  • Does this answer your question? [Get last n lines of a file, similar to tail](https://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-similar-to-tail) – Haleemur Ali Mar 17 '21 at 19:32
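
A minimal sketch of the `seek(offset, whence)` behaviour quoted in the comments above. It assumes the file is opened in binary mode (`'rb'`), since text-mode files only support end-relative seeks with an offset of zero. The file `demo.csv` is a hypothetical sample created just for this illustration.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "wb") as f:
    f.write(b"a,1\nb,2\nc,3\n")

with open(path, "rb") as f:
    size = f.seek(0, os.SEEK_END)   # whence=2 (os.SEEK_END): returns the file size
    f.seek(-1, os.SEEK_END)         # one byte before the end of the file
    last_byte = f.read(1)           # the final byte (here the trailing newline)
    f.seek(-4, os.SEEK_END)         # four bytes before the end
    tail = f.read()                 # everything from there to the end
```

Note that `seek(-1, 2)` only positions the stream on the last byte; finding the start of the last line still requires scanning backwards for the previous `\n`.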

9 Answers

9

Here is my take, in Python. I wrote a function that lets you choose how many trailing lines to return, because the last lines may be empty.

def get_last_line(path, how_many_last_lines=1):

    # open the file with a context manager so it is always closed
    with open(path, 'r') as file:

        # seek(0, 2) moves to the end of the file and returns that position
        end_of_file = file.seek(0, 2)

        # step back one character at a time from the end of the file
        # (arbitrary offsets in text mode are only reliable for single-byte encodings)
        for num in range(end_of_file + 1):
            file.seek(end_of_file - num)

            # read everything from the current position to the end of the file
            last_line = file.read()

            # count how many '\n' you have in your string:
            # if you have 1, you are in the last line; if you have 2, you have the two last lines
            if last_line.count('\n') == how_many_last_lines:
                return last_line

        # the file has fewer '\n' than requested: return the whole file
        return last_line
get_last_line('lala.csv', 2)

My lala.csv has 48 million lines, as in your example, and it took about 0 seconds to get the last line.

Sergio Marinho
  • 3
    This isn't actually correct. The '\n' count is one too little for Unix text files. A line is *terminated* by \n, therefore a text file ends with '\n' and by default your `get_last_line` would just return the *line terminator* for the last line, not the last line. – Antti Haapala -- Слава Україні Mar 06 '21 at 16:11
  • But it worked... Sorry, I did not understand your complaint. Are you saying it would not work outside of Windows? – Sergio Marinho Mar 06 '21 at 17:13
  • The call doesn't return *2* lines. It returns *one line*. The minimal span that contains *two* `\n`s contains one line, the last line of the file, and the line terminator of the *previous* line. – Antti Haapala -- Слава Україні Mar 06 '21 at 18:27
  • 1
    I see what you mean now. However, in a txt file I created as a test, there was only one \n. That's why I enabled the option to select how many last lines. Here follows my test: with open('teste.txt', 'w') as x: x.write('lala\nfifi\ndede') – Sergio Marinho Mar 06 '21 at 18:46
  • 1
    yes, it needs to work in *both* cases. However, a text file *must* end in a \n on Unix – Antti Haapala -- Слава Україні Mar 06 '21 at 20:25
  • 1
    A mac is unix, right? The file I created on a mac did not finish on a '\n'. I created the file with python. Anyways, this conversation helped me further understand the code. Thanks! – Sergio Marinho Mar 06 '21 at 21:38
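
Both cases raised in these comments (a file that ends with `\n` and one that does not) can be handled by ignoring trailing line terminators before searching backwards. The following is only a sketch under stated assumptions, not the answer author's code: the function name `last_line` and the `chunk_size` parameter are made up here, the file is read in binary mode, and the content is assumed to be UTF-8.

```python
import os


def last_line(path, chunk_size=4096, encoding="utf-8"):
    """Return the last non-empty line of a file, without its line terminator."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)
        tail = b""
        while pos > 0:
            # Read one more chunk, moving towards the start of the file.
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore terminators at the very end, then look for the previous '\n'.
            stripped = tail.rstrip(b"\r\n")
            cut = stripped.rfind(b"\n")
            if cut != -1:
                return stripped[cut + 1:].decode(encoding)
        # No earlier '\n' found: the whole file is a single line (or empty).
        return tail.rstrip(b"\r\n").decode(encoding)
```

Because only the end of the file is read, the running time depends on the length of the last line rather than on the number of lines in the file.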
7

Here is code for finding the last line of a file with mmap. It should work on Unix systems and their derivatives and on Windows alike (I've tested this on Linux only, please tell me if it works on Windows too ;), i.e. pretty much everywhere it matters. Since it uses memory-mapped I/O, it can be expected to perform well.

It expects that you can map the entire file into the address space of the process. That should be OK for a 50M file everywhere, but for a 5G file you'd need a 64-bit processor or some extra slicing.

import mmap


def iterate_lines_backwards(filename):
    with open(filename, "rb") as f:
        # memory-map the file, size 0 means whole file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = len(mm)

            while start > 0:
                start, prev = mm.rfind(b"\n", 0, start), start
                slice = mm[start + 1:prev + 1]
                # if the last character in the file was a '\n',
                # technically the empty string after that is not a line.
                if slice:
                    yield slice.decode()


def get_last_nonempty_line(filename):
    for line in iterate_lines_backwards(filename):
        if stripped := line.rstrip("\r\n"):
            return stripped


print(get_last_nonempty_line("datafile.csv"))

As a bonus there is a generator iterate_lines_backwards that would efficiently iterate over the lines of a file in reverse for any number of lines:

print("Iterating the lines of datafile.csv backwards")
for l in iterate_lines_backwards("datafile.csv"):
    print(l, end="")
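
The "extra slicing" mentioned above for files too large to map whole could look roughly like the sketch below, which maps only the final part of the file. This is an illustration rather than part of the answer: the name `last_line_from_tail` and the `window` parameter are assumptions, and if the last line is longer than the mapped window the result is truncated at the start of the window.

```python
import mmap
import os


def last_line_from_tail(filename, window=1 << 20):
    """Map roughly the final `window` bytes and return the last non-empty line."""
    size = os.path.getsize(filename)
    if size == 0:
        return ""  # mmap cannot map an empty file
    # The mapping offset must be a multiple of mmap.ALLOCATIONGRANULARITY.
    gran = mmap.ALLOCATIONGRANULARITY
    offset = max(size - window, 0) // gran * gran
    with open(filename, "rb") as f:
        with mmap.mmap(f.fileno(), size - offset,
                       access=mmap.ACCESS_READ, offset=offset) as mm:
            end = len(mm)
            # Skip line terminators at the very end of the file.
            while end > 0 and mm[end - 1] in b"\r\n":
                end -= 1
            # rfind returns -1 when no '\n' is found, so start becomes 0.
            start = mm.rfind(b"\n", 0, end) + 1
            return mm[start:end].decode()
```

Rounding the offset down to the allocation granularity means the mapping may be somewhat larger than `window`, which is harmless here.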
4

If you are running your code in a Unix-based environment, you can run the tail shell command from Python to read the last line:

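import subprocess

# capture the output as text so the line is available in Python, not only printed
result = subprocess.run(['tail', '-n', '1', '/path/to/lala.csv'],
                        capture_output=True, text=True, check=True)
last_line = result.stdout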
Shiva
3

This is generally a rather tricky thing to do. A very efficient way of getting a chunk that includes the last lines is the following:

import os


def get_last_lines(path, offset=500):
    """ An efficient way to get the last lines of a file.

    IMPORTANT: 
    1. Choose offset to be greater than 
    max_line_length * number of lines that you want to recover.
    2. This will raise an OSError if the file is shorter than
    the offset.
    """
    with path.open("rb") as f:
        f.seek(-offset, os.SEEK_END)
        while f.read(1) != b"\n":
            f.seek(-2, os.SEEK_CUR)
        return f.readlines()

You need to know the maximum line length though and ensure that the file is at least one offset long!

To use it, do the following:

from pathlib import Path


n_last_lines = 10
last_bit_of_file = get_last_lines(Path("/path/to/my/file"))
real_last_n_lines = last_bit_of_file[-n_last_lines:]

Finally, you need to decode the bytes to strings:

real_last_n_lines_non_binary = [x.decode() for x in real_last_n_lines]

Probably all of this could be wrapped in one more convenient function.
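
One possible way to wrap these steps into a single function is sketched below. The name `tail_lines` and its parameters are hypothetical, it assumes no line is longer than `max_line_length` bytes, and unlike the code above it clamps the seek so that short files do not raise an error.

```python
import os
from pathlib import Path


def tail_lines(path, n=1, max_line_length=500, encoding="utf-8"):
    """Return the last n lines of a file as a list of str (sketch)."""
    path = Path(path)
    # Window large enough for n full lines plus one partial line at the front.
    window = max_line_length * (n + 1)
    with path.open("rb") as f:
        size = f.seek(0, os.SEEK_END)
        # Clamp so that short files do not fail on a negative seek.
        start = max(size - window, 0)
        f.seek(start)
        data = f.read()
    lines = data.splitlines()
    if start > 0:
        # The first piece may begin in the middle of a line, so drop it.
        lines = lines[1:]
    return [line.decode(encoding) for line in lines[-n:]]
```

For example, `tail_lines("/path/to/my/file", n=10)` would return the last ten lines already decoded, as long as the line-length assumption holds.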

kuropan
2

You could additionally store the last line in a separate file, which you update whenever you add new lines to the main file.
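
A rough sketch of that idea, assuming all appends go through one helper. The function names and the `.last` sidecar suffix are made up for illustration.

```python
import os


def _sidecar(csv_path, sidecar_path=None):
    # Hypothetical naming convention: "<csv file>.last" next to the main file.
    return sidecar_path or os.fspath(csv_path) + ".last"


def append_lines(csv_path, lines, sidecar_path=None):
    """Append lines to csv_path and remember the last one in a sidecar file."""
    if not lines:
        return
    with open(csv_path, "a", newline="") as f:
        for line in lines:
            f.write(line.rstrip("\r\n") + "\n")
    # Overwrite the sidecar so it always holds only the most recent line.
    with open(_sidecar(csv_path, sidecar_path), "w", newline="") as f:
        f.write(lines[-1].rstrip("\r\n"))


def read_last_line(csv_path, sidecar_path=None):
    """Return the remembered last line, or None if no sidecar file exists yet."""
    path = _sidecar(csv_path, sidecar_path)
    if not os.path.exists(path):
        return None
    with open(path, "r", newline="") as f:
        return f.read()
```

The two writes are not atomic, so if the program stops between them the sidecar can lag behind the main file. Anything that appends to the CSV without going through the helper has the same effect.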

Manuel
1

This works well for me:
https://pypi.org/project/file-read-backwards/

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:

    # iterate line by line, starting from the last line and moving up
    for l in frb:
        if l:
            print(l)
            break
0

An easy way to do this is with deque, although it still iterates over the entire file:

from collections import deque

def return_last_line(filepath):
    with open(filepath,'r') as f:
        q = deque(f, 1)
    return q[0]
pakpe
0

Since seek() returns the position that it moved to, you can use it to move backward and position the cursor at the beginning of the last line.

with open("test.txt") as f:
    p = f.seek(0,2)-1              # ignore trailing end of line
    while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
        p = f.seek(p-1,0)          # search backward
    lastLine = f.read().strip()    # read from start of last line
print(lastLine)

To get the last non-empty line, you can add a while loop around the search:

with open("test.txt") as f:
    p,lastLine = f.seek(0,2),""    # start from end of file
    while p and not lastLine:      # want last non-empty line
        while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
            p = f.seek(p-1,0)          # search backward
        lastLine = f.read().strip()    # read from start of last line
Alain T.
0

Based on @kuropan's answer, but faster and shorter:

# 60.lastlinefromlargefile.py
# juanfc 2021-03-17

import os


def get_last_lines(fileName, offset=500):
    """ An efficient way to get the last line of a file.

    IMPORTANT:
    1. Choose offset to be greater than
    max_line_length * number of lines that you want to recover.
    2. This will raise an OSError if the file is shorter than
    the offset.
    """
    with open(fileName, "rb") as f:
        f.seek(-offset, os.SEEK_END)
        return f.read().decode('utf-8').rstrip().split('\n')[-1]



print(get_last_lines('60.lastlinefromlargefile.py'))
juanfal