
I want to read the text file given below in reverse order, line by line. I don't want to use readlines() or read().

a.txt

2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:

expected result:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr

My Solution:

with open('a.txt') as lines:
    for line in reversed(lines):
        print(line)
martineau
user15051990

3 Answers


Here's a way to do it without reading the whole file into memory at once. It does require a first pass over the entire file, but only to record the offset at which each line starts. Once those offsets are known, the seek() method can jump directly to any line, in any order.

Here's an example using your input file:

# Preprocess - read whole file and note where lines start.
# (Needs to be done in binary mode.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # First line is always at offset 0.
    for _ in file:
        offsets.append(file.tell())  # Append where *next* line would start.

# Now reread lines in file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        line = file.read(size).decode().rstrip()
        print(line)

Output:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr

Update

Here's a version that does the same thing but uses Python's mmap module to memory-map the file which should provide better performance by taking advantage of your OS/hardware's virtual-memory capabilities.

This is because, as PyMOTW-3 puts it:

Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers – the memory is accessed directly by both the kernel and the user application.

Code:

import mmap

with open('text_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:

        # First preprocess the file and note where lines start.
        # (Needs to be done in binary mode.)
        offsets = [0]  # First line is always at offset 0.
        for _ in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Append where *next* line would start.

        # Now process the lines in file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with next.
            # Read bytes, convert them to a string, and remove whitespace at end.
            line = mm_file.read(size).decode().rstrip()
            print(line)
martineau

No, there isn't a better way to do this. By definition, a file is a sequential organization of some basic data type; for a text file, that type is the character. You are trying to impose a different organization on it: strings separated by newlines.

Thus, you have to do the work of reading the file, re-casting it into your desired format, and then traversing that organization in reverse order. For instance, if you needed this multiple times, you could read the file as lines, store the lines as database records, and then iterate through the records as you see fit.

The file interface reads in only one direction. You can seek() to another location, but the standard I/O operations otherwise work only with increasing file positions.

For your solution to work, you'd need to read in the entire file -- you can't reverse a file object's implicit iterator.
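Concretely, that is why the attempt in the question fails: reversed() requires a sequence (something with a length and index-based access), which a file object is not. A quick sketch, using StringIO as a stand-in for a real file:

```python
from io import StringIO

# File objects (and file-likes such as StringIO) are one-shot iterators,
# not sequences: they have no length and no random access by index.
f = StringIO('line 1\nline 2\n')
try:
    reversed(f)
except TypeError as error:
    print(error)  # argument to reversed() must be a sequence
```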

Prune
  • Technically you could seek to the end of file -1, read 1 char, seek -2, read another. etc. I wonder how slow would that be though. – Gnudiff Feb 28 '19 at 18:43
  • 1
    @Gnudiff: Yes, you could, but that reads in reverse *byte* order. OP wants to reverse the *line* order. Granted, you could simply back up through the file, looking for each newline. Yes, it's slow. – Prune Feb 28 '19 at 18:52
  • There are, however, better ways to read the file, see the latest update to [my answer](https://stackoverflow.com/a/54932624/355230). – martineau Mar 01 '19 at 16:22
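The byte-by-byte backward scan discussed in the comments above can be written as a small generator. This is a sketch, not code from any answer here (the name lines_backwards is invented for illustration); it issues one seek()/read() pair per byte, so it is slow, but it uses constant memory regardless of file size:

```python
import os
import tempfile

def lines_backwards(path):
    """Yield the lines of the file at *path* from last to first.

    One seek()/read() pair per byte -- slow, but constant memory.
    """
    with open(path, 'rb') as f:
        f.seek(0, 2)              # jump to the end of the file
        end = f.tell()            # exclusive end of the current line
        pos = end
        while pos > 0:
            pos -= 1
            f.seek(pos)
            # A newline here terminates the *previous* line, unless it is
            # the file's own trailing newline (pos + 1 == end).
            if f.read(1) == b'\n' and pos + 1 < end:
                f.seek(pos + 1)
                yield f.read(end - pos - 1).decode()
                end = pos + 1
        if end:                   # whatever is left is the first line
            f.seek(0)
            yield f.read(end).decode()

# Demo on a throwaway file:
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\n')

print(''.join(lines_backwards(tmp.name)), end='')  # three, two, one
os.remove(tmp.name)
```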

While @martineau's solution gets the job done without loading the entire file into memory, it nevertheless wastefully reads the entire file twice.

An arguably more efficient, one-pass approach is to read the file from the end in reasonably large chunks into a buffer. Look for the next newline character from the end of the buffer (skipping the trailing newline at the last character); if none is found, seek backwards, read another chunk, and prepend it to the buffer until a newline turns up. Use a larger chunk size for more efficient reads, as long as it stays within your memory limit:

class ReversedTextReader:
    def __init__(self, file, chunk_size=50):
        self.file = file
        file.seek(0, 2)
        self.position = file.tell()
        self.chunk_size = chunk_size
        self.buffer = ''

    def __iter__(self):
        return self

    def __next__(self):
        if not self.position and not self.buffer:
            raise StopIteration
        chunk = self.buffer
        # The buffer's last character is the newline terminating the line
        # we are about to return, so exclude it from the search.
        search_end = len(chunk) - 1
        while True:
            line_start = chunk.rfind('\n', 0, search_end)
            if line_start != -1:
                break
            chunk_size = min(self.chunk_size, self.position)
            self.position -= chunk_size
            self.file.seek(self.position)
            chunk = self.file.read(chunk_size)
            if not chunk:
                line = self.buffer
                self.buffer = ''
                return line
            # A freshly read chunk may end with the very newline that
            # separates it from the buffered text -- except on the first
            # read, whose last character is the file's trailing newline.
            search_end = len(chunk) - (not self.buffer)
            self.buffer = chunk + self.buffer
        line_start += 1
        line = self.buffer[line_start:]
        self.buffer = self.buffer[:line_start]
        return line

so that:

from io import StringIO

f = StringIO('''2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
''')

for line in ReversedTextReader(f):
    print(line, end='')

outputs:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
blhsing
  • Reading the file twice, as done in [my answer](https://stackoverflow.com/a/54932624/355230), may not be as inefficient as you think—the OS probably buffers, caches, or otherwise optimizes that kind of thing. – martineau Mar 01 '19 at 12:41
  • For smaller files yes, but we're talking about a file so large that the OP can't load it into the memory in the first place, so chances are better that whatever buffer/cache that the OS has won't be able to accommodate the file and can't help optimize reading that file twice either. – blhsing Mar 01 '19 at 13:04
  • True, this appears to be a very extreme case—precisely what the OS's optimizations were designed to mitigate. Actually, now that I think more about it, memory-mapping the file might be an even better way to leverage the OS's (and hardware's) capabilities—see the latest update to [my answer](https://stackoverflow.com/a/54932624/355230). – martineau Mar 01 '19 at 16:25