
I have a huge log file (around 1,000,000 lines). I would like to obtain the last line and remove it from the file using PHP. What is the quickest way to do so?

I tried:

$logfile = escapeshellarg("/path/to/logfile");
$lastline = `tail -n 1 $logfile`; // obtain the last line (escapeshellarg() already adds the quotes)

Is the above approach efficient enough? And how can I remove the last line from the file?

Based on Jon's answer below, here is my code:

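$buffer_size = 1000;
$fh = fopen("/path/to/logfile", "r+");
fseek($fh, 0, SEEK_END);
$pos = max(ftell($fh) - 1, 0);   // skip the final byte (usually the trailing "\n")
$cut = 0;                        // truncate here if no earlier "\n" exists
while ($pos > 0) {
  $len = min($buffer_size, $pos);
  $pos -= $len;
  fseek($fh, $pos);              // seek immediately before reading
  $content = fread($fh, $len);   // read exactly as much as we moved back
  $eol = strrpos($content, "\n"); // false (not -1) if there is no "\n"; also matches "\r\n"
  if ($eol !== false) {
    $cut = $pos + $eol + 1;      // file offset just after the last line break
    break;
  }
}
fseek($fh, $cut);
$lastline = stream_get_contents($fh); // the last line, read up to EOF
ftruncate($fh, $cut);            // the previous line keeps its "\n"
fclose($fh);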
Raptor
  • I believe using the shell is the right approach here; just make sure to escape the input to avoid command-line injection – mkk Aug 29 '12 at 10:00
  • Agreed. I added `escapeshellarg()`. But how do I remove the last line from the file efficiently? – Raptor Aug 29 '12 at 10:04
  • Out of curiosity: why do you need to do that in PHP? Why can't you just do it in the shell? – Gordon Aug 29 '12 at 10:21
  • We need to process the logs in a web page. Of course, PHP can call a shell script. – Raptor Aug 29 '12 at 10:24

1 Answer


The fastest way to obtain and remove the last line from a big file is:

  1. Open the file for reading and writing (mode "r+")
  2. Seek to the end
  3. Seek some arbitrary buffer length backwards (let's say 1K) and read data to fill the buffer
  4. Search the buffer backwards with something like strrpos until you find an end-of-line marker¹
  5. If you do not find an EOL, go to step 3 and repeat
  6. If you do find an EOL, you know the file offset at which it occurs based on the position in the buffer and the offset at which the buffer was read from
  7. Obtain the last line by seeking to that offset and reading until end of file²
  8. Call ftruncate to cut off the part of the file beginning with the end of line found
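The eight steps above could be sketched in PHP roughly as follows. This is a minimal sketch, not Jon's code: the helper name `pop_last_line` and the 1024-byte default buffer are assumptions, only `\n` is searched for (which also covers `\r\n`), and it deliberately keeps the line break that ends the previous line, so it truncates just after that EOL rather than at it.

```php
<?php
// Sketch of the steps above. pop_last_line() is a hypothetical helper name.
// Assumes "\n" or "\r\n" line endings; a lone "\r" is not recognized.
// Returns the removed last line, including its line terminator if it had one.
function pop_last_line(string $path, int $bufferSize = 1024): string
{
    $fh = fopen($path, 'r+');                  // step 1: open for reading and writing
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, 0, SEEK_END);                   // step 2: seek to the end
    $size = ftell($fh);
    // Skip the final byte so a trailing "\n" counts as part of the last line.
    $pos = max($size - 1, 0);
    $cut = 0;                                  // truncate here if no earlier EOL exists
    while ($pos > 0) {
        $len = min($bufferSize, $pos);         // step 3: never step before offset 0
        $pos -= $len;
        fseek($fh, $pos);
        $buffer = fread($fh, $len);
        $eol = strrpos($buffer, "\n");         // step 4: search the buffer backwards
        if ($eol !== false) {
            $cut = $pos + $eol + 1;            // step 6: buffer offset + index in buffer
            break;
        }                                      // step 5: no EOL yet, move further back
    }
    fseek($fh, $cut);                          // step 7: re-read the last line
    $lastLine = stream_get_contents($fh);
    ftruncate($fh, $cut);                      // step 8: cut the file at that offset
    fclose($fh);
    return $lastLine;
}
```

Calling `pop_last_line('/path/to/logfile')` repeatedly removes lines one at a time from the end; each call only reads the tail of the file, however large the log is.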

¹ Supporting all of \n, \r and \r\n is going to complicate things a little; the last one in particular can span two buffers, so you'd have to explicitly watch out for that.
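One possible way to handle the caveat in footnote 1, sketched under assumptions (the helpers `last_eol_span` and `pop_last_line_any_eol` are hypothetical): when a `\n` turns up at the very start of a buffer, the byte before it lives in the previous buffer, so it is read separately to decide whether the terminator is really `\r\n`. This matters most when stepping over the file's own trailing terminator; getting it wrong would leave a stray `\r` behind.

```php
<?php
// Sketch: find the last line terminator ("\n", "\r" or "\r\n") that starts
// before byte offset $end, scanning backwards in $bufferSize chunks.
// Returns [offset, length] of that terminator, or null if there is none.
// Assumes $end never falls between the "\r" and "\n" of a "\r\n" pair.
function last_eol_span($fh, int $end, int $bufferSize = 1024): ?array
{
    $pos = $end;
    while ($pos > 0) {
        $len = min($bufferSize, $pos);
        $pos -= $len;
        fseek($fh, $pos);
        $chunk = fread($fh, $len);
        $n = strrpos($chunk, "\n");
        $r = strrpos($chunk, "\r");
        if ($n === false && $r === false) {
            continue;                           // no terminator here: move further back
        }
        $i = max($n === false ? -1 : $n, $r === false ? -1 : $r);
        if ($chunk[$i] === "\r") {
            return [$pos + $i, 1];              // lone "\r": a following "\n" would have won
        }
        // Found "\n": it belongs to "\r\n" if the byte before it is "\r".
        // At $i == 0 that byte sits in the previous buffer, so read it separately.
        if ($i > 0) {
            $prev = $chunk[$i - 1];
        } elseif ($pos > 0) {
            fseek($fh, $pos - 1);
            $prev = fread($fh, 1);
        } else {
            $prev = '';
        }
        return $prev === "\r" ? [$pos + $i - 1, 2] : [$pos + $i, 1];
    }
    return null;
}

// Removes and returns the last line, keeping the terminator of the line before it.
function pop_last_line_any_eol(string $path, int $bufferSize = 1024): string
{
    $fh = fopen($path, 'r+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, 0, SEEK_END);
    $size = ftell($fh);
    $eol = last_eol_span($fh, $size, $bufferSize);
    if ($eol !== null && $eol[0] + $eol[1] === $size) {
        // The file ends with a terminator; the break we want is the one before it.
        $eol = last_eol_span($fh, $eol[0], $bufferSize);
    }
    $cut = $eol === null ? 0 : $eol[0] + $eol[1];
    fseek($fh, $cut);
    $lastLine = stream_get_contents($fh);
    ftruncate($fh, $cut);
    fclose($fh);
    return $lastLine;
}
```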

² This is not strictly necessary because all the data you are going to read has already passed through the buffer, so you could have kept a copy and saved the cost of this operation. In practice, though, the last line is not going to be very long, so it's more convenient to just re-read the whole thing (the C runtime and/or the OS filesystem cache will probably make this stupidly fast anyway).
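The alternative footnote 2 mentions, keeping the bytes that already passed through the buffer, might look like this. Again a sketch under assumptions: `pop_last_line_buffered` is a hypothetical name, only `\n`/`\r\n` endings are handled, and the chunks are collected in an array and joined once at the end instead of being re-read from disk.

```php
<?php
// Sketch of footnote 2's variant: keep every chunk read while scanning
// backwards, so the last line comes out of memory instead of a second read.
// pop_last_line_buffered() is a hypothetical name; assumes "\n" or "\r\n" endings.
function pop_last_line_buffered(string $path, int $bufferSize = 1024): string
{
    $fh = fopen($path, 'r+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, 0, SEEK_END);
    $size = ftell($fh);
    if ($size === 0) {
        fclose($fh);
        return '';
    }
    $pos = $size - 1;                 // the final byte (usually "\n") belongs to the last line
    fseek($fh, $pos);
    $chunks = [fread($fh, 1)];        // chunks collected from the end towards the start
    $cut = 0;                         // truncate here if no earlier "\n" exists
    while ($pos > 0) {
        $len = min($bufferSize, $pos);
        $pos -= $len;
        fseek($fh, $pos);
        $chunk = fread($fh, $len);
        $chunks[] = $chunk;
        $eol = strrpos($chunk, "\n");
        if ($eol !== false) {
            $cut = $pos + $eol + 1;
            break;
        }
    }
    // $pos is now the file offset of the first byte held in $chunks.
    $lastLine = substr(implode('', array_reverse($chunks)), $cut - $pos);
    ftruncate($fh, $cut);
    fclose($fh);
    return $lastLine;
}
```

As the footnote says, the saving is small in practice; the main cost of this variant is holding the scanned bytes in memory, which is fine as long as the last line is short.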

This is what any program would have to do. If you decide to "cheat" by offloading the first seven steps to an external utility like tail, you can remove the line from the file with a single call to ftruncate, but be careful when calculating the offset at which to truncate if you do not wish to leave trailing end-of-line character(s) in the file.
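A sketch of that "cheat", assuming a Unix-like system with `tail` available and `shell_exec()` not disabled (`pop_last_line_via_tail` is a hypothetical name). Because `tail -n 1` prints the last line byte for byte, including its newline if it has one, the truncation offset is just the file size minus the length of that output.

```php
<?php
// Sketch: let tail(1) find the last line, then truncate by its byte length.
// pop_last_line_via_tail() is a hypothetical name; needs tail and shell_exec().
function pop_last_line_via_tail(string $path): string
{
    // escapeshellarg() already adds quotes, so the result is not wrapped in "..." again.
    $lastLine = (string) shell_exec('tail -n 1 ' . escapeshellarg($path));
    clearstatcache(true, $path);          // PHP caches filesize() results
    $cut = filesize($path) - strlen($lastLine);
    $fh = fopen($path, 'r+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    ftruncate($fh, $cut);                 // the previous line keeps its trailing "\n"
    fclose($fh);
    return $lastLine;
}
```

Subtracting one more byte would also drop the newline that ends the previous line, which is the offset subtlety mentioned above. If another process appends to the log between the `tail` call and `ftruncate()`, the computed offset is stale; some form of locking would be needed for that case and is left out of this sketch.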

Jon
  • Just to ask it the other way round: is it easier to "pop" the first line from the file? – Raptor Aug 29 '12 at 10:11
  • @ShivanRaptor: Popping will be horribly slow, because you would have to read all the data that is going to "stay" and rewrite it starting from offset 0. All of it. – Jon Aug 29 '12 at 10:12
  • I added code based on your answer to my question. Can you check whether it works? – Raptor Aug 29 '12 at 10:22
  • @ShivanRaptor: No, there are lots of mistakes in there. Seeking should be done immediately before reading; you should read exactly as much as you seek (unless the file is smaller than one buffer); `strrpos` instead of `strpos`; these functions do not return `-1` but `false` on failing to find the needle; you call `ftell` with bad arguments. Start fixing :) – Jon Aug 29 '12 at 10:38
  • @ShivanRaptor: I'm sorry, but it's not possible to keep debugging code like this. Try it yourself, and ask another question if something specific gives you trouble. – Jon Aug 29 '12 at 10:52
  • Alright, I will try it out myself (I was just trying to find the most efficient code...) – Raptor Aug 29 '12 at 10:53
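To make Jon's point about popping the first line concrete, here is a sketch (hypothetical `pop_first_line`, `\n` endings assumed). Everything after the first line has to be copied back towards offset 0 before truncating, so the cost grows with the size of the whole file, whereas removing the last line only touches its end.

```php
<?php
// Sketch: "pop" the first line in place by shifting the rest of the file left.
// pop_first_line() is a hypothetical name; assumes "\n" line endings.
function pop_first_line(string $path, int $bufferSize = 65536): string
{
    $fh = fopen($path, 'r+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $firstLine = (string) fgets($fh);   // up to and including the first "\n"
    $readPos = ftell($fh);
    $writePos = 0;
    while (true) {
        fseek($fh, $readPos);
        $chunk = fread($fh, $bufferSize);
        if ($chunk === false || $chunk === '') {
            break;                      // reached EOF
        }
        $readPos += strlen($chunk);
        fseek($fh, $writePos);
        fwrite($fh, $chunk);            // writePos stays behind readPos, so no unread data is lost
        $writePos += strlen($chunk);
    }
    ftruncate($fh, $writePos);
    fclose($fh);
    return $firstLine;
}
```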