I need to read the last 25 lines from a file (for displaying the most recent log entries). Is there any way in Ruby to start at the end of a file and read it backwards?
9 Answers
If you are on a *nix system with tail, you can cheat like this:
last_25_lines = `tail -n 25 whatever.txt`
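If the file name comes from a variable, passing the command as an array avoids shell quoting problems. This is only a sketch, assuming a `tail` binary is on the PATH; the helper name `last_lines_via_tail` is made up for illustration.

```ruby
# Hypothetical helper: shells out to the system `tail` (assumed to be on PATH).
# Passing an argument array to IO.popen bypasses the shell, so unusual
# characters in `path` are not interpreted by it.
def last_lines_via_tail(path, n = 25)
  IO.popen(["tail", "-n", n.to_s, path], &:read)
end
```

Calling `last_lines_via_tail("whatever.txt")` would return the same string as the backtick version above.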

-
I think a library would be better suited for cross-platform capability, but you get the idea. – John T Apr 16 '09 at 03:07
-
Probably true. I don't run Ruby code on anything but *nix-based systems and I think you'll have a hard time finding one of those without 'tail'... Also, sometimes you don't have the luxury of installing a library. Just wanted to show the 'one liner' :) – rfunduk Apr 16 '09 at 19:04
-
So far I haven't deployed on an environment that doesn't have tail. This answer should be accepted in my opinion. – Adam B Mar 20 '15 at 18:40
Is the file large enough that you need to avoid reading the whole thing? If not, you could just do
IO.readlines("file.log")[-25..-1]
If it is too big, you may need to use IO#seek to read from near the end of the file, and continue seeking toward the beginning until you've seen 25 lines.
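For the small-file case, `Array#last` is a slightly safer way to take the final lines than a negative range, because the range form returns nil when there are fewer than 25 lines. A quick sketch, with in-memory arrays standing in for the result of IO.readlines:

```ruby
# Arrays stand in for what IO.readlines would return.
short = ["a\n", "b\n", "c\n"]          # a "file" with only 3 lines
long  = (1..30).map { |i| "#{i}\n" }   # a "file" with 30 lines

range_short = short[-25..-1]   # nil: the start index is out of range
last_short  = short.last(25)   # all 3 lines
last_long   = long.last(25)    # lines 6 through 30

p range_short
p last_short.size
p last_long.first
```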

-
If you don't want to go through the trouble of reversing it, you can use [-25..-1] instead. – sris Apr 16 '09 at 05:28
-
Beauty. No need to assume anything about system commands available this way. Thanks. :) – Allain Lalonde Jul 17 '13 at 19:59
-
@sris The problem with [-25..-1] is that if the file has fewer than 25 lines, the result is nil. I would recommend using `IO.readlines("file.log").last(25)`, which returns whatever lines exist in that case (an empty array for an empty file). – Rohit Banga Jan 24 '16 at 09:52
There is a library for Ruby called File::Tail. This can get you the last N lines of a file just like the UNIX tail utility.
I assume there is some seek optimization in place in the UNIX version of tail with benchmarks like these (tested on a text file just over 11M):
[john@awesome]$du -sh 11M.txt
11M 11M.txt
[john@awesome]$time tail -n 25 11M.txt
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
real 0m0.012s
user 0m0.000s
sys 0m0.010s
I can only imagine the Ruby library uses a similar method.
Edit:
for Pax's curiosity:
[john@awesome]$time cat 11M.txt | tail -n 25
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
real 0m0.350s
user 0m0.000s
sys 0m0.130s
Still under a second, but if there are a lot of file operations, this makes a big difference.
-
What does "cat 11M.txt | tail -n 25" give you? That will force tail to process the whole stream. – paxdiablo Apr 16 '09 at 02:55
-
Or just cat 11M.txt >/dev/null for that matter - that will give you the time to process the stream, which may well be in the order of 1/100th of a second. – paxdiablo Apr 16 '09 at 02:56
-
Bugbear of mine, @JohnT: "29 times slower" of 100secs is -2800secs. The correct phrase is "roughly 1/29th the speed". But I take your point - clearly tail is using a seek method when it has the file rather than a stream. One would hope Ruby is that smart as well. – paxdiablo Apr 16 '09 at 03:32
-
Yeah that's what I meant... exams tomorrow I'm much too tired to concentrate =( – John T Apr 16 '09 at 04:02
Improved version of manveru's excellent seek-based solution. This one returns exactly n lines.
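class File
  # Returns the last n lines of the file as an array of strings.
  def tail(n)
    buffer = 1024
    idx = [size - buffer, 0].max   # never seek to a negative offset
    chunks = []
    lines = 0
    loop do
      seek(idx)
      chunk = read(buffer) || ""   # read returns nil at EOF (empty file)
      lines += chunk.count("\n")
      chunks.unshift chunk
      # Stop once we have n + 1 newlines or have reached the start of the file:
      break if lines > n || idx.zero?
      # Shrink the final read so consecutive chunks don't overlap:
      buffer = [buffer, idx].min
      idx -= buffer
    end
    chunks.join.split("\n").last(n)
  end
end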

-
That code works on a Mac, but fails on linux with an error message "`tail': undefined local variable or method `size'". Any ideas how to fix that? – earlyadopter Jun 04 '14 at 21:08
-
There is no bounds checking, meaning you may end up seeking to a negative idx. Also, this isn't optimized very well for a file with very long lines (i.e., saving the chunks into a predestined buffer). Posted a version that takes care of both. – Shai Jan 29 '15 at 18:26
I just wrote a quick implementation with #seek:
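class File
  # Returns the last n lines of the file as a single string.
  def tail(n)
    buffer = 1024
    idx = [size - buffer, 0].max  # clamp so we never seek outside the file
    chunks = []
    lines = 0
    loop do
      seek(idx)
      chunk = read(buffer) || ""  # nil when the file is empty
      lines += chunk.count("\n")
      chunks.unshift chunk
      break if lines > n || idx.zero?
      buffer = [buffer, idx].min  # avoid re-reading overlapping bytes
      idx -= buffer
    end
    chunks.join.lines.last(n).join
  end
end

File.open('rpn-calculator.rb') do |f|
  p f.tail(10)
end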

-
Actually while your seek-based code is close, it's not quite correct because it doesn't strip the part of the chunk that's before the first \n. See new Answer below. :) – Donald Scott Wilde Apr 18 '12 at 23:06
-
:11:in `tail': undefined method `count' for nil:NilClass (NoMethodError) – Istvan Feb 18 '13 at 05:00
-
This code has a bug that's easy to fix. The 4th line that sets idx should be "idx = size > buffer ? (size - buffer) : 0" -- this will fix the problem that @Istvan is seeing, though really that error should be handled better with at least something like "break unless chunk" right after the read(buffer) – David Ljung Madison Stellar Jan 15 '21 at 01:48
-
Actually, there should just be an 'idx=0 if idx<0' right before the seek() and then a 'break if idx<=0' right before the 'idx -= buffer' That handles n>1 as well. – David Ljung Madison Stellar Jan 15 '21 at 01:54
Here's a version of tail that doesn't store any buffers in memory while you go, but instead uses "pointers". It also does bounds checking so you don't end up seeking to a negative offset (for example, if you have more to read but less than your chunk size left).
# Returns the last n lines of the file at path as a single string.
# Assumes the file ends with a newline.
def tail(path, n)
  File.open(path, "r") do |file| # block form closes the file for us
    buffer_s = 512
    line_count = 0
    file.seek(0, IO::SEEK_END)
    offset = file.pos # we start at the end
    while line_count <= n && offset > 0
      to_read = [buffer_s, offset].min # don't read past the start of the file
      file.seek(offset - to_read)
      data = file.read(to_read)
      data.reverse.each_char do |c|
        offset -= 1
        next unless c == "\n"
        line_count += 1
        if line_count > n
          offset += 1 # step forward past the newline that ends the previous line
          break
        end
      end
    end
    file.seek(offset)
    file.read
  end
end
test cases at https://gist.github.com/shaiguitar/6d926587e98fc8a5e301

I can't vouch for Ruby but most of these languages follow the C idiom of file I/O. That means there's no way to do what you ask other than searching. This usually takes one of two approaches.
- Starting at the start of the file and scanning it all, remembering the most recent 25 lines. Then, when you hit end of file, print them out.
- A similar approach but attempting to seek to a best-guess location first. That means seeking to (for example) end of file minus 4000 characters, then doing exactly what you did in the first approach with the proviso that, if you didn't get 25 lines, you have to back up and try again (e.g., to end of file minus 5000 characters).
The second way is the one I prefer since, if you choose your first offset wisely, you'll almost certainly only need one shot at it. Log files still tend to have fixed maximum line lengths (I think coders still have a propensity for 80-column files long after their usefulness has degraded). I tend to choose number of lines desired multiplied by 132 as my offset.
And from a cursory glance at the Ruby docs online, it looks like it does follow the C idiom. You would use "ios.seek(25*-132,IO::SEEK_END)" if you were to follow my advice, then read forward from there.
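The best-guess approach could be sketched roughly as follows. This is only an illustration under assumptions: the method name `tail_guess`, the 132-byte line-length guess as a default, and the doubling retry policy are choices made for the example, not something Ruby prescribes.

```ruby
# Hypothetical helper illustrating the "seek to a guessed offset" approach.
# Assumes lines average at most `guess` bytes; doubles the window and retries
# when the guess turns out to be too small.
def tail_guess(path, n = 25, guess = 132)
  File.open(path, "rb") do |f|
    size = f.size
    window = n * guess
    loop do
      start = [size - window, 0].max   # clamp so we never seek before byte 0
      f.seek(start)
      lines = f.read.lines
      # Unless we started at byte 0, the first element may be a partial line.
      lines.shift if start > 0
      return lines.last(n) if lines.size >= n || start.zero?
      window *= 2                      # guess was too small; back up further
    end
  end
end
```

Clamping the start offset matters because seeking to a negative absolute position raises an error, which is what a fixed `25*-132` offset would run into on a file shorter than that.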

-
All of my terminals and emacs buffers are still 80 columns wide; that lets me fit several side by side on my monitor, which is very useful. – Brian Campbell Apr 16 '09 at 06:03
-
I'm pretty sure the IO#seek is going to be the optimal solution, performance-wise. – Mike Woodhouse Apr 16 '09 at 08:43
I implemented a variation on Donald's code that works when n is larger than the number of lines in the file:
class MyFile < File
  def tail(n)
    buffer = 20000
    # Negative indices are not allowed:
    idx = [size - buffer, 0].max
    chunks = []
    lines = 0
    begin
      seek(idx)
      chunk = read(buffer)
      # Handle condition when file is empty:
      lines += chunk.nil? ? 0 : chunk.count("\n")
      chunks.unshift chunk
      # Limit next buffer's size when we've reached the start of the file,
      # to ensure two consecutive buffers don't overlap content,
      # and to ensure idx doesn't become negative:
      buffer = [buffer, idx].min
      idx -= buffer
    end while (lines < ( n + 1 )) && (pos != 0)
    tail_of_file = chunks.join('')
    ary = tail_of_file.split(/\n/)
    # Prevent trying to extract more lines than are in the file:
    n = [n, ary.size].min
    lines_to_return = ary[ ary.size - n, ary.size - 1 ]
  end
end

How about:
file = []
# File.foreach closes the file when it finishes iterating.
File.foreach("file.txt") do |line|
  file << line
end
# Prints the last 25 lines, newest first.
file.reverse.each_with_index do |line, index|
  puts line if index < 25
end
Performance would be awful on a big file, since it loads every line into memory and then iterates twice. The better approach, already mentioned, is to read the file while keeping only the last 25 lines in memory and then display those. But this was just an alternative thought.
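The keep-only-the-last-25-lines idea might look like this. It is a minimal sketch, and the method name `last_lines` is made up for the example. It still reads the whole file once, but never holds more than n lines in memory.

```ruby
# Hypothetical helper: single pass over the file, keeping a rolling window
# of at most n lines in memory.
def last_lines(path, n = 25)
  window = []
  File.foreach(path) do |line|
    window << line
    window.shift if window.size > n   # drop the oldest line
  end
  window
end
```

Unlike the snippet above, this returns the lines in their original order; call `reverse` on the result to get the newest entry first.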
