13

I've been working on a log viewer for a Rails app and have found that I need to read around 200 lines of a log file from bottom to top instead of the default top to bottom.

Log files can get quite large, so I've already tried and ruled out the IO.readlines("log_file.log")[-200..-1] method.

Are there any other ways to go about reading a file backwards in Ruby without the need for a plugin or gem?

ericalli

3 Answers

18

The only correct way to do this that also works on enormous files is to read n bytes at a time from the end until you have the number of lines that you want. This is essentially how Unix tail works.

An example implementation of IO#tail(n), which returns the last n lines as an Array:

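class IO
  TAIL_BUF_LENGTH = 1 << 16 # 64 KiB read per step

  def tail(n)
    return [] if n < 1

    # Start one buffer length before the end of the file.
    # Naive: this seek fails (Errno::EINVAL) if the file is shorter than the buffer.
    seek(-TAIL_BUF_LENGTH, SEEK_END)

    buf = ""
    # Keep prepending earlier chunks until the buffer holds more than n newlines.
    while buf.count("\n") <= n
      buf = read(TAIL_BUF_LENGTH) + buf
      # The read moved the position forward by one buffer, so step back two.
      seek(2 * -TAIL_BUF_LENGTH, SEEK_CUR)
    end

    buf.split("\n")[-n..-1]
  end
end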

The implementation is a little naive, but a quick benchmark shows what a ridiculous difference this simple implementation can already make (tested with a ~25MB file generated with yes > yes.txt):

                            user     system      total        real
f.readlines[-200..-1]   7.150000   1.150000   8.300000 (  8.297671)
f.tail(200)             0.000000   0.000000   0.000000 (  0.000367)

The benchmark code:

require "benchmark"

FILE = "yes.txt"

Benchmark.bmbm do |b|
  b.report "f.readlines[-200..-1]" do
    File.open(FILE) do |f|
      f.readlines[-200..-1]
    end
  end

  b.report "f.tail(200)" do
    File.open(FILE) do |f|
      f.tail(200)
    end
  end
end

Of course, other implementations already exist. I haven't tried any, so I cannot tell you which is best.

molf
  • I think you mean `TAIL_BUF_LENGTH = 2**16` or `1 << 16`, both of which evaluate to `65536` (64Ki). `2^16` is binary exclusive-or and evaluates to `18`. – Jörg W Mittag Jun 11 '10 at 18:06
  • Works great! The benchmark difference is insane compared to readlines. Is it possible to also output the corresponding line number for each line in the resulting array? Thanks! – ericalli Jun 11 '10 at 18:41
  • @two2twelve: No, it isn't. The *whole purpose* of this entire exercise is to read the file "from bottom to top". (Your words, not mine.) How would you know at which line (which is counted from the *top* of the file) you are, if you started at the *bottom*? Or did you mean to count from the bottom upwards? In that case, it's easy: the line at index `i` in the buffer is the `n-i` th line from the bottom. – Jörg W Mittag Jun 12 '10 at 01:38
  • @molf, I've got the following problem in windows 7, do you know how to solve it? `tasks.rb:14:in \`seek': Invalid argument - a.log (Errno::EINVAL) from D:/test.rb:14:in \`tail'` – aaron Mar 28 '13 at 05:55
  • @aaron: The implementation is naive and does not check for errors. The file is probably smaller than the seek buffer, leading to an invalid seek offset. There may be other edge cases that produce errors. – molf Mar 28 '13 at 12:12
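
To illustrate the counting-from-the-bottom idea in the comments: a minimal sketch, assuming you already have the last n lines in an array (for example from a tail-style method). The helper name `number_from_bottom` is made up for this example.

```ruby
# Hypothetical helper: pair each line with its distance from the end of the file.
# The line at index i of an n-element array is the (n - i)th line from the bottom.
def number_from_bottom(lines)
  n = lines.length
  lines.each_with_index.map { |line, i| [n - i, line] }
end

number_from_bottom(%w[alpha beta gamma]).each do |from_bottom, line|
  puts "#{from_bottom}: #{line}"
end
```

The last element of the array always gets number 1, so the numbering stays valid no matter how long the file is.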
4

There's a module Elif available (a port of Perl's File::ReadBackwards) which does efficient line-by-line backwards reading of files.
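
If you would rather avoid a gem, the general technique can be sketched with the standard library alone. This is not Elif's implementation, just a rough illustration of reading chunks from the end and yielding lines last to first. The method name `each_line_backwards` and the default chunk size are my own assumptions.

```ruby
# Sketch: yield the lines of an open, seekable File from last to first.
# Lines are yielded without their trailing newline, as binary strings.
def each_line_backwards(io, chunk_size = 4096)
  return enum_for(:each_line_backwards, io, chunk_size) unless block_given?

  pos = io.size # File#size; plain IO objects such as pipes do not support this
  return if pos.zero?

  carry = "".b # start of a line whose beginning lies in an earlier chunk
  first_chunk = true
  while pos > 0
    len = [chunk_size, pos].min # never seek before the start of the file
    pos -= len
    io.seek(pos, IO::SEEK_SET)
    chunk = io.read(len) + carry
    if first_chunk
      chunk.chomp!("\n") # ignore the final newline of the file
      first_chunk = false
    end
    lines = chunk.split("\n", -1) # -1 keeps empty lines
    carry = lines.shift || ""     # first piece may be incomplete
    lines.reverse_each { |line| yield line }
  end
  yield carry # whatever is left is the first line of the file
end
```

Usage would look something like `File.open("log_file.log", "rb") { |f| each_line_backwards(f).first(200) }`, which stops reading as soon as 200 lines have been produced.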

hobbs
1

Since I'm too new to comment on molf's awesome answer, I have to post this as a separate answer. I needed this feature to read log files while they're being written; the last portion of the log contains the string that tells me it's done and I can start parsing it.

Hence handling small files is crucial for me (I might ping the log while it's tiny). So I enhanced molf's code:

class IO
    def tail(n)
        return [] if n < 1
        size = File.size(self)
        tail_buf_length = 1 << 16
        # Small files: read everything, no backwards seeking needed
        if size < tail_buf_length
            rewind
            return readlines.map(&:chomp).last(n)
        end
        pos = size
        out = ""
        # Walk backwards one buffer at a time, never seeking before the start of the file
        while pos > 0 && out.count("\n") <= n
            len = [tail_buf_length, pos].min
            pos -= len
            seek(pos, IO::SEEK_SET)
            # Prepend, since this chunk comes before the data collected so far
            out = read(len) + out
        end
        # Same shape in both branches: file order, no trailing newlines
        out.split("\n").last(n)
    end
end
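
For the "is the log finished yet?" check described above, here is a minimal self-contained sketch that only looks at the end of the file. The method name `log_finished?`, the `"DONE"` marker and the 64 KiB window are all assumptions made for illustration; substitute whatever your log actually writes.

```ruby
# Sketch: return true if `marker` appears within the last `window` bytes of the file.
# Works on tiny or empty files because the seek offset is clamped to 0.
def log_finished?(path, marker, window = 1 << 16)
  File.open(path, "rb") do |f|
    f.seek([f.size - window, 0].max, IO::SEEK_SET)
    # Compare as binary so partial multi-byte characters at the window edge do not matter
    f.read.include?(marker.b)
  end
end
```

You could call `log_finished?("log_file.log", "DONE")` in a polling loop and start parsing once it returns true.
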
Ohad Dahan