
In my PHP application I need to read multiple lines starting from the end of many files (mostly logs). Sometimes I need only the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.

There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure about which one is the best and which performs better.

asked by lorenzo-s, edited by Jesse Nickles

6 Answers


Methods overview

Searching on the internet, I came across different solutions. I can group them in three approaches:

  • naive ones that use the file() PHP function;
  • cheating ones that run the tail command on the system;
  • mighty ones that happily jump around an opened file using fseek().

I ended up choosing (or writing) five solutions: a naive one, a cheating one and three mighty ones.

  1. The most concise naive solution, using built-in array functions.
  2. The only possible solution based on the tail command, which has one not-so-little problem: it does not run if tail is not available, i.e. on non-Unix (Windows) systems or on restricted environments that don't allow system functions.
  3. The solution in which single bytes are read from the end of file searching for (and counting) new-line characters, found here.
  4. The multi-byte buffered solution optimized for large files, found here.
  5. A slightly modified version of solution #4 in which buffer length is dynamic, decided according to the number of lines to retrieve.

All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can break PHP memory limits on large files, returning nothing). But which one is better?
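To make the comparison concrete, here is a sketch of what the naive approach (#1) might look like (my own minimal version, not the exact code used in the tests):

```php
<?php
// Sketch of the naive approach (#1): load the whole file as an array
// of lines, then keep only the last $n. Simple, but memory usage
// grows with file size, which is what hurts it on large logs.
function tailNaive(string $filepath, int $n = 1): string
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element
    $lines = file($filepath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) return '';
    return implode("\n", array_slice($lines, -$n));
}

// Demo on a throwaway file
$tmp = tempnam(sys_get_temp_dir(), 'tail');
file_put_contents($tmp, "alpha\nbeta\ngamma\n");
echo tailNaive($tmp, 2), "\n"; // prints "beta" and "gamma"
unlink($tmp);
```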

Performance tests

To answer the question, I ran tests. That's how these things are done, isn't it?

I prepared a sample 100 KB file by joining together different files found in my /var/log directory. Then I wrote a PHP script that uses each of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each single test is repeated ten times (that's something like 5 × 28 × 10 = 1,400 tests), measuring the average elapsed time in microseconds.
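The timing harness itself was nothing fancy; a minimal version of the loop (names are mine, not taken from the original test script) could look like:

```php
<?php
// Hypothetical sketch of the timing loop: run a tail implementation
// $repeat times and return the average elapsed time in microseconds.
function benchmark(callable $fn, int $repeat = 10): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $repeat * 1e6;
}

// Example: time a trivial operation
$avg = benchmark(function () { str_repeat('x', 1000); });
printf("%.1f microseconds average\n", $avg);
```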

I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, 2.70 GHz dual-core CPU, 2 GB RAM) using the PHP command line interpreter. Here are the results:

Execution time on sample 100 KB log file

Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best. Note how the dynamic buffer size can optimize the algorithm: execution time is a little smaller for a few lines, because of the reduced buffer.

Let's try with a bigger file. What if we have to read a 10 MB log file?

Execution time on sample 10 MB log file

Now solution #1 is by far the worst one: in fact, loading the whole 10 MB file into memory is not a great idea. I also ran the tests on 1 MB and 100 MB files, and it's practically the same situation.

And for tiny log files? That's the graph for a 10 KB file:

Execution time on sample 10 KB log file

Solution #1 is the best one now! Loading 10 KB into memory isn't a big deal for PHP. #4 and #5 also perform well. However, this is an edge case: a 10 KB log means something like 150 to 200 lines...

You can download all my test files, sources and results here.

Final thoughts

Solution #5 is heavily recommended for the general use case: it works great with every file size and performs particularly well when reading a few lines.

Avoid solution #1 if you may need to read files bigger than 10 KB.

Solutions #2 and #3 weren't the best in any test I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works quite well only with 1 or 2 lines).
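For reference, the cheating approach (#2) boils down to a one-liner around the system tail command; here is a sketch (my own wording of it, assuming a Unix-like system with `shell_exec()` enabled). The roughly constant ~2 ms floor comes from spawning a process on every call:

```php
<?php
// Sketch of the "cheating" approach (#2): delegate to the system
// tail command. Fails silently (empty string) where tail or
// shell_exec() is unavailable, e.g. Windows or restricted hosts.
function tailShell(string $filepath, int $n = 1): string
{
    $cmd = 'tail -n ' . (int) $n . ' ' . escapeshellarg($filepath);
    return rtrim((string) shell_exec($cmd), "\n");
}

// Demo on a throwaway file
$tmp = tempnam(sys_get_temp_dir(), 'tail');
file_put_contents($tmp, "one\ntwo\nthree\n");
echo tailShell($tmp, 2), "\n"; // prints "two" and "three"
unlink($tmp);
```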

answered by lorenzo-s
  • Btw, how about putting the code on BitBucket or something instead of in an annoying zip file? :p – Svish Mar 18 '13 at 21:05
  • Also... not quite sure your optimization is really that necessary, hehe. Not that much difference. – Svish Mar 18 '13 at 21:19
  • 6
    @Svish The code is on a GitHub Gist. If you are talking about the whole test files, I think it's unnecessary to put them in a repo... About the optimization: I really wanted to focus on performances because I had to use that code very intensely for few lines reading (less than 10). So, a large buffer seemed unnecessary to me. Note that axis are logarithmic: for few lines a reduced buffer means half the execution time! – lorenzo-s Mar 18 '13 at 21:33
  • It clumps all lines into one. Can we retain line breaks? – FractalSpace Feb 15 '15 at 20:01
  • Aye, need to retain line breaks as the file I'm pulling from is a log. Each line is a separate event. Any way to modify the code to preserve the line breaks? – Jguy Mar 31 '15 at 02:28
  • @Jguy to me, line breaks are retained in all the five script version. How are you using the code? – lorenzo-s Apr 01 '15 at 15:24
  • I am using Solution 5 in a model in CodeIgniter. `return trim($output);` gets returned to the controller via `$data['log'] = $this->mymodel->return_log();`, which then gets passed to the view and simply echoed. – Jguy Apr 01 '15 at 19:04
  • @Jguy Don't know much about CodeIgniter, but maybe you echo it without replacing `"\n"`s with `<br>`s? Try wrapping it in [`nl2br()`](http://php.net/manual/en/function.nl2br.php). If it works, consider using [`htmlentities()`](https://php.net/manual/en/function.htmlentities.php) too. – lorenzo-s Apr 01 '15 at 21:38
  • Yeah, removing the trim and using `echo nl2br($log);` works just fine, and line breaks are preserved. Thanks! – Jguy Apr 02 '15 at 17:37
  • Not sure if it's a one off but the links aren't working. I'm getting SSL error on both firefox and chrome. – alimack May 12 '15 at 13:47
  • It would be interesting to see what the answer would be for a 1GB file with ~ 1M lines, it looks like #2 and #4/5 are approaching each other. – Yehosef Jun 10 '15 at 10:11
  • @Yehosef OK, maybe you can have a 1 GB file, but it's quite unusual to need to read a million lines from the end of it :) – lorenzo-s Jun 10 '15 at 10:20
  • Ah, I think I misunderstood what the number of lines is. I thought it was the number of lines in the file; from your comment I gather it means the number of lines from the end. The exact use case I'm working with is that we're downloading log files from a provider that are sometimes truncated due to a problem on their end. To verify the integrity I want to check the last 3 characters. We're currently doing that with a system call to "tail -c3", but I came here because I was curious whether I could efficiently do the same in PHP without a system call. – Yehosef Jun 10 '15 at 13:29
  • 19
    possibly one of the best SO answers I have ever seen. Options, multiple tests, conclusions. You need a medal. – David Oct 06 '17 at 11:36
  • 1
    @David incidentally, this post is a complete off topic, as it doesn't contain any *answer*. In order to get your answer, you have to go to another site, which is directly prohibited by the rules. – Your Common Sense May 24 '22 at 05:58
  • I get this error when using the function from Solution #5 "crbug/1173575, non-JS module files deprecated." – Max J. May 26 '22 at 12:50

This is a modified version which can also skip lines at the end of the file:

/**
 * Modified version of http://www.geekality.net/2011/05/28/php-tail-tackling-large-files/ and of https://gist.github.com/lorenzos/1711e81a9162320fde20
 * @author Kinga the Witch (Trans-dating.com), Torleif Berger, Lorenzo Stanco
 * @link http://stackoverflow.com/a/15025877/995958
 * @license http://creativecommons.org/licenses/by/3.0/
 */    
function tailWithSkip($filepath, $lines = 1, $skip = 0, $adaptive = true)
{
  // Open file
  $f = @fopen($filepath, "rb");
  if ($f === false) return false;
  if (@flock($f, LOCK_SH) === false) return false;

  if (!$adaptive) $buffer = 4096;
  else {
    // Sets buffer size, according to the number of lines to retrieve.
    // This gives a performance boost when reading a few lines from the file.
    $max=max($lines, $skip);
    $buffer = ($max < 2 ? 64 : ($max < 10 ? 512 : 4096));
  }

  // Jump to last character
  fseek($f, -1, SEEK_END);

  // Read it and adjust line number if necessary
  // (Otherwise the result would be wrong if file doesn't end with a blank line)
  if (fread($f, 1) == "\n") {
    if ($skip > 0) { $skip++; $lines--; }
  } else {
    $lines--;
  }

  // Start reading
  $output = '';
  $chunk = '';
  // While we would like more
  while (ftell($f) > 0 && $lines >= 0) {
    // Figure out how far back we should jump
    $seek = min(ftell($f), $buffer);

    // Do the jump (backwards, relative to where we are)
    fseek($f, -$seek, SEEK_CUR);

    // Read a chunk
    $chunk = fread($f, $seek);

    // Calculate chunk parameters
    $count = substr_count($chunk, "\n");
    $strlen = mb_strlen($chunk, '8bit');

    // Move the file pointer
    fseek($f, -$strlen, SEEK_CUR);

    if ($skip > 0) { // There are some lines to skip
      if ($skip > $count) { $skip -= $count; $chunk=''; } // Chunk contains fewer newlines than we need to skip; drop it whole
      else {
        $pos = 0;

        while ($skip > 0) {
          if ($pos > 0) $offset = $pos - $strlen - 1; // Calculate the offset - NEGATIVE position of last new line symbol
          else $offset=0; // First search (without offset)

          $pos = strrpos($chunk, "\n", $offset); // Search for last (including offset) new line symbol

          if ($pos !== false) $skip--; // Found new line symbol - skip the line
          else break; // "else break;" - Protection against infinite loop (just in case)
        }
        $chunk=substr($chunk, 0, $pos); // Truncated chunk
        $count=substr_count($chunk, "\n"); // Count new line symbols in truncated chunk
      }
    }

    if (strlen($chunk) > 0) {
      // Add chunk to the output
      $output = $chunk . $output;
      // Decrease our line counter
      $lines -= $count;
    }
  }

  // While we have too many lines
  // (Because of buffer size we might have read too many)
  while ($lines++ < 0) {
    // Find first newline and remove all text before that
    $output = substr($output, strpos($output, "\n") + 1);
  }

  // Close file and return
  @flock($f, LOCK_UN);
  fclose($f);
  return trim($output);
}
answered by Kinga the Witch

This would also work:

$file = new SplFileObject("/path/to/file");
$file->seek(PHP_INT_MAX); // cheap trick to seek to EoF
$total_lines = $file->key(); // last line number

// output the last twenty lines
$reader = new LimitIterator($file, $total_lines - 20);
foreach ($reader as $line) {
    echo $line; // includes newlines
}

Or without the LimitIterator:

$file = new SplFileObject($filepath);
$file->seek(PHP_INT_MAX);
$total_lines = $file->key();
$file->seek($total_lines - 20);
while (!$file->eof()) {
    echo $file->current();
    $file->next();
}

Unfortunately, your testcase segfaults on my machine, so I cannot tell how it performs.
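One caveat with this approach (also raised in the comments): if the file has fewer than 20 lines, `$total_lines - 20` goes negative and `LimitIterator` throws an `OutOfRangeException`. A guarded sketch, clamping the offset with `max()`:

```php
<?php
// Guarded variant: clamp the LimitIterator offset so files with
// fewer than 20 lines don't throw an OutOfRangeException.
$tmp = tempnam(sys_get_temp_dir(), 'tail');
file_put_contents($tmp, "a\nb\nc\n"); // only 3 lines

$file = new SplFileObject($tmp);
$file->seek(PHP_INT_MAX);       // cheap trick to seek to EoF
$total_lines = $file->key();    // last line number

$reader = new LimitIterator($file, max(0, $total_lines - 20));
foreach ($reader as $line) {
    echo $line; // includes newlines
}
unlink($tmp);
```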

answered by Gordon, edited by lorenzo-s
  • 1
    I didn't know about the `SplFileObject` class, thank you. Don't know why the test sigfaults on your machine, anyway I run it alongside the better method (#5) for the 10MB file, and the performance is not quite good, it's comparable to the shell method (#2). See **[here](http://i.imgur.com/cWdYrzE.png)**. – lorenzo-s Jan 13 '17 at 16:50
  • Note that the first solution with `LimitIterator` will throw an `OutOfRangeException` if you have fewer than 20 lines in your file, stating `Parameter offset must be >= 0`. The second one will throw a `LogicException` for basically the same reason. – Georgy Ivanov May 05 '17 at 19:25
  • 1
    Most people finding this solution good do not realize that there is no way to implement cheap seek() function, which makes this code inevitably read **every single line** from the file, making it O(n) solution as opposed to O(1) for the fseek-based solutions. Therefore it shouldn't be recommended for the large files. – Your Common Sense May 23 '22 at 18:36

My little copy-paste solution, after reading all of this:

/**
 * @param $pathname
 * @param $lines
 * @param bool $echo
 * @return int
 */
private function tailonce($pathname, $lines, $echo = true)
{
    $realpath = realpath($pathname);
    $fp = fopen($realpath, 'r', FALSE);
    $flines = 0;
    $a = -1;
    while ($flines <= $lines) {
        // Stop if we seek past the beginning of the file
        if (fseek($fp, $a--, SEEK_END) === -1) break;
        $char = fread($fp, 1);
        if ($char == "\n") $flines++;
    }
    $out = fread($fp, 1000000);
    fclose($fp);
    if ($echo) echo $out;
    return $a+2;
}

A continuous tail function, as in tail -f.
It does not close $fp because you must kill it with Ctrl-C anyway. usleep() saves your CPU time; only tested on Windows so far.

/**
 * @param $pathname
 */
private function tail($pathname)
{
    $realpath = realpath($pathname);
    $fp = fopen($realpath, 'r', FALSE);
    $lastline = '';
    fseek($fp, $this->tailonce($pathname, 1, false), SEEK_END);
    do {
        $line = fread($fp, 1000);
        if ($line == $lastline) {
            usleep(50);
        } else {
            $lastline = $line;
            echo $lastline;
        }
    } while ($fp);
}

You need to put this code into a class!

answered by user163193, edited by Your Common Sense
  • Very good ideas in general, but many annoying little flaws, such as reading one char at a time, assuming that 1000000 bytes will be enough, and the useless $echo variable. – Your Common Sense May 24 '22 at 09:15

Yet another function; you can use regexes to separate items. Usage:

$last_rows_array = file_get_tail('logfile.log', 100, array(
  'regex'     => true,          // use regex
  'separator' => '#\n{2,}#',   //  separator: at least two newlines
  'typical_item_size' => 200, //   line length
));

The function:

// public domain
function file_get_tail( $file, $requested_num = 100, $args = array() ){
  // default arg values
  $regex         = true;
  $separator     = null;
  $typical_item_size = 100; // estimated size
  $more_size_mul = 1.01; // +1%
  $max_more_size = 4000;
  extract( $args );
  if( $separator === null )  $separator = $regex ? '#\n+#' : "\n";

  if( is_string( $file ))  $f = fopen( $file, 'rb');
  else if( is_resource( $file ) && in_array( get_resource_type( $file ), array('file', 'stream'), true ))
    $f = $file;
  else throw new \Exception( __METHOD__.': file must be either filename or a file or stream resource');

  // get file size
  fseek( $f, 0, SEEK_END );
  $fsize = ftell( $f );
  $fpos = $fsize;
  $bytes_read = 0;

  $all_items = array(); // array of array
  $all_item_num = 0;
  $remaining_num = $requested_num;
  $last_junk = '';

  while( true ){
    // calc size and position of next chunk to read
    $size = $remaining_num * $typical_item_size - strlen( $last_junk );
    // reading a bit more can't hurt
    $size += (int)min( $size * $more_size_mul, $max_more_size );
    if( $size < 1 )  $size = 1;

    // set and fix read position
    $fpos = $fpos - $size;
    if( $fpos < 0 ){
      $size -= -$fpos;
      $fpos = 0;
    }

    // read chunk + add junk from prev iteration
    fseek( $f, $fpos, SEEK_SET );
    $chunk = fread( $f, $size );
    if( strlen( $chunk ) !== $size )  throw new \Exception( __METHOD__.": read error?");
    $bytes_read += strlen( $chunk );
    $chunk .= $last_junk;

    // chunk -> items, with at least one element
    $items = $regex ? preg_split( $separator, $chunk ) : explode( $separator, $chunk );

    // first item is probably cut in half, use it in next iteration ("junk") instead
    // also skip very first '' item
    if( $fpos > 0 || $items[0] === ''){
      $last_junk = $items[0];
      unset( $items[0] );
    } // … else noop, because this is the last iteration

    // ignore last empty item. end( empty [] ) === false
    if( end( $items ) === '')  array_pop( $items );

    // if we got items, push them
    $num = count( $items );
    if( $num > 0 ){
      $remaining_num -= $num;
      // if we read too much, use only needed items
      if( $remaining_num < 0 )  $items = array_slice( $items, - $remaining_num );
      // don't fix $remaining_num, we will exit anyway

      $all_items[] = array_reverse( $items );
      $all_item_num += $num;
    }

    // are we ready?
    if( $fpos === 0 || $remaining_num <= 0 )  break;

    // calculate a better estimate
    if( $all_item_num > 0 )  $typical_item_size = (int)max( 1, round( $bytes_read / $all_item_num ));
  }

  fclose( $f ); 

  //tr( $all_items );
  return call_user_func_array('array_merge', $all_items );
}
answered by biziclop

I like the following method, but it won't work on files larger than 2 GB.

<?php
    function lastLines($file, $lines) {
        $size = filesize($file);
        $fd = fopen($file, 'r'); // read-only is enough here
        $pos = $size;
        $n=0;
        while ( $n < $lines+1 && $pos > 0) {
            fseek($fd, $pos);
            $a = fread($fd, 1);
            if ($a === "\n") {
                ++$n;
            };
            $pos--;
        }
        $ret = array();
        for ($i=0; $i<$lines; $i++) {
            array_push($ret, fgets($fd));
        }
        return $ret;
    }
    print_r(lastLines('hola.php', 4));
?>
answered by sergiotarxz