-1

How would I go about printing the last line of a text file that is about 612 MB and has about 4 million lines of text consisting of "This is a line"? So far I have:

File.py

f = open("foo.txt","r+")
datalist = []
for line in f:
    datalist.append(line)
print(datalist[-1])

The only problem I see with my code is that it uses a lot of memory. I have heard of people using `os.lseek` instead, but I do not know how to implement it.

Alex Lowe
  • Call `tail` via a `subprocess`? It reads the file backwards. Can't beat that (except by re-implementing it in Python). Are you on Linux? – jDo Apr 20 '16 at 22:04
  • @jDo Sadly not, I am on Windows 10 – Alex Lowe Apr 20 '16 at 22:07
  • `print("This is a line")`? I'm not sure how literally to take your description of the input format. – user2357112 Apr 20 '16 at 22:11
  • @Alex Ok... I think [this class](http://stackoverflow.com/a/5896210/6004486) does it well. It's basically a Python re-implementation of Linux's `head` and `tail`. If you search for "read file backwards tail python" here or on Google, there are lots of other examples. – jDo Apr 20 '16 at 22:20

5 Answers

2

If you only need the last line, throw everything else away.

with open('foo.txt') as f:
    for line in f:
        pass

# `line` is the last line of the file.

Much faster (but far less readable) would be to start at the end of the file and move backwards by bytes until you find \n, then read.

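import os

with open('foo.txt', 'rb') as f:  # binary mode, since os.read() returns bytes
    fd = f.fileno()
    os.lseek(fd, 0, os.SEEK_END)  # jump to the end of the file
    while True:
        ch = os.read(fd, 1)
        if ch == b'\n':
            line = f.read().decode()  # everything after that newline
            break
        else:
            os.lseek(fd, -2, os.SEEK_CUR)  # step back past the byte just read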

# `line` is the last line of the file

This works by reading the file from the end, looking for the first newline, then reading forward from there.
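
The same backward search can be made more robust by reading in chunks and using `rfind` instead of stepping one byte at a time. The following is only a sketch under some assumptions: the name `last_line`, the `chunk_size` default, and the UTF-8 default encoding are illustrative choices, not part of the original answer, and trailing blank lines at the end of the file are ignored. It returns `None` for an empty file and handles a file that contains a single line.

```python
import os

def last_line(path, chunk_size=4096, encoding="utf-8"):
    """Return the last line of `path` without its line ending.

    Returns None for an empty file. Hypothetical helper: the name,
    chunk_size and encoding defaults are assumptions for illustration.
    """
    with open(path, "rb") as f:          # binary mode: offsets are exact bytes
        f.seek(0, os.SEEK_END)
        pos = f.tell()                   # file size in bytes
        if pos == 0:
            return None
        tail = b""
        while pos > 0:
            step = min(chunk_size, pos)  # never seek before the start
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail   # grow the buffer backwards
            # Ignore line-ending bytes at the very end of the file, then
            # look for the newline that precedes the last line.
            body = tail.rstrip(b"\r\n")
            cut = body.rfind(b"\n")
            if cut != -1:
                return body[cut + 1:].decode(encoding)
        # No newline found: the whole file is a single line.
        return tail.rstrip(b"\r\n").decode(encoding)
```

Calling `print(last_line("foo.txt"))` would then print the last line while reading only the final chunk or two of a large file. Searching for the byte `b"\n"` is safe for UTF-8 and ASCII text; other encodings such as UTF-16 would need different handling.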

Adam Smith
  • When I ran your code I got `TypeError: an integer is required (got type _io.TextIOWrapper)` – Alex Lowe Apr 20 '16 at 22:22
  • @Alex whoops, fixed. – Adam Smith Apr 20 '16 at 22:27
  • The efficiency of the reading-backwards solution could be improved by seeking and reading in chunks and using `rfind` to find the last `'\n'`. Also, the handling for newlines as the last character of a file is rather subtle (the code seeks to the end, reads nothing, and then goes back two characters, skipping the last character of the file! I think this actually produces the right result, but it's not obvious at first glance). Ideally, there should also be some handling for the case where there's only one line and you end up trying to seek off the left side of the file, or where the file is empty. – user2357112 Apr 20 '16 at 23:53
  • @user2357112 agreed that this is a funky solution that doesn't catch all corner cases. This isn't production code, it's Stack Overflow examples :). I'd actually not thought about how to handle a file that ends in a newline, and this succeeds only on accident. – Adam Smith Apr 20 '16 at 23:55
  • @user2357112 jDo linked a beautiful helper class in the comments of the question that does this all much more elegantly. – Adam Smith Apr 20 '16 at 23:57
  • @user2357112 I do see what you are talking about. Should I use `.rstrip('\n')` to strip all newlines in the file? – Alex Lowe Apr 21 '16 at 00:07
0

Here's a very simple improvement which only stores a single line at a time:

data = None
with open("foo.txt", "r") as f:  # the with block closes the file afterwards
    for line in f:
        data = line  # keep only the most recent line
print(data)

Or you can pick up the final loop value after the loop:

line = None
with open("foo.txt", "r") as f:  # the with block closes the file afterwards
    for line in f:
        pass
print(line)

Note that in this example, `line` will be `None` if the file is empty (which is the reason for the initial assignment to `line`).

Tom Karzes
0

A quick improvement would be to just throw out datalist and only save the most recent line, since that's all you care about.

f = open("foo.txt", "r")  # read-only is enough; "r+" also requests write access
for line in f:
    pass
f.close()
print(line)

I'd imagine there are other more efficient ways too; I just want to offer one that is a direct derivative of your code.

Christian
0

You don't need to append each line to a list. Just use the loop variable:

line = None  # prevents a NameError if the file is empty

with open("foo.txt", "r") as f:  # read-only is enough here
    for line in f:
        pass
print(line)
Eugene Yarmash
0

Check out `deque` in the `collections` module. The documentation includes a recipe for returning the last n lines of a file, much like `tail`.

https://docs.python.org/2/library/collections.html#deque-recipes

from collections import deque

def tail(filename, n=10):
    'Return the last n lines of a file'
    with open(filename) as f:  # close the file once the deque is built
        return deque(f, n)
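
Applied to the question, the recipe above can be narrowed to a single line. This is a minimal sketch, and the function name `last_line_via_deque` is a hypothetical one chosen for illustration. A `deque` with `maxlen=1` discards each older line as a new one arrives, so memory use stays small, although the whole file is still read once from start to end.

```python
from collections import deque

def last_line_via_deque(filename):
    """Return the last line of `filename`, or None if the file is empty.

    Hypothetical helper built on the deque recipe; not from the docs.
    """
    with open(filename) as f:
        # maxlen=1 means only the most recently read line is retained.
        last = deque(f, maxlen=1)
    return last[0] if last else None
```

For example, `print(last_line_via_deque("foo.txt"))` would print the final line, including its trailing newline if the file has one.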
Geoff D