
I have a log file that may be very large (10+ GB). I'd like to find the last occurrence of an expression. Is it possible to do this with standard POSIX commands?

Here are some potential answers, from similar questions, that aren't quite suitable.

  • Use tail -n <x> <file> | grep -m 1 <expression>: I don't know how far back the expression is, so I don't know what <x> would be. It could be several GB previous, so then you'd be tailing the entire file. I suppose you could loop and increment <x> until it's found, but then you'd be repeatedly reading the last part of the file.
  • Use tac <file> | grep -m 1 <expression>: tac reads the entire source file. It might be possible to chain something on to sigkill tac as soon as some output is found? Would that be efficient?
  • Use awk/sed: I'm fairly sure these both always start from the top of the file (although I may be wrong, my sed-fu is not strong).
  • "There'd be no speed up so why bother": I think that's incorrect, since file systems can seek to the end of a file without reading the whole thing. There'd be a little trial and error/buffering to find each new line, but that shouldn't slow things down much, compared to reading (e.g.) 10 GB that are never used.
  • Write a python/perl script to do it: this is my fall-back if no one can suggest anything better. I'd rather stick to something that can be done straight through the command line, since I'm executing it straight through ssh, and I'd rather not have to upload a script file as well. Using mmap's rfind() in python, I think we can do it in a few lines, provided the expression to find is static (which mine, unfortunately, is not). A regex requires a bit more work, something like this.

If it helps, the expression is anchored at the start of a line, eg: "^foo \d+$".
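For the static-string case mentioned above, the mmap idea really is only a few lines. This is a minimal sketch (the helper name `last_occurrence` is made up for illustration); `mmap.rfind` searches backwards from the end of the mapping, so pages near the end of a large file are touched first:

```python
import mmap
import os
import tempfile

def last_occurrence(path, needle):
    # Map the file read-only; rfind scans backwards from the end of
    # the mapping for the last occurrence of the byte string.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.rfind(needle)

# Demo on a small temporary file standing in for the big log.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"foo 1\nbar\nfoo 2\nbaz\n")
    path = tmp.name

offset = last_occurrence(path, b"foo")  # byte offset of the last "foo"
os.unlink(path)
print(offset)
```

Note this returns a byte offset, not a line, and it only works for a fixed byte string; a regex would indeed need the chunked backwards scan described in the second answer below.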

2 Answers


Whatever script you write will almost certainly be slower than:

tac file | grep -m 1 '^foo [0-9][0-9]*$'
Ed Morton
    Okay, it looks like when grep finishes, the pipe is broken, and the kernel sends SIGPIPE to tac, which responds by closing the input file and then exiting with code 1. So that's why it doesn't read the whole file. Therefore it looks like this solution is the simplest and fastest, and doesn't read the whole file (as I feared that it might). – Amanda Ellaway Jul 18 '16 at 02:11

This awk script will search through the whole file and print the last line matching the given /pattern/:

$ awk '/pattern/ { line=$0 } END { print line }' gigantic.log

Using tac will be a better option (this uses GNU sed to print the first match of /pattern/ it sees, which is the last match in the file, and then quit; once sed exits, tac is killed by SIGPIPE on its next write, so the whole file is not read):

$ tac gigantic.log | gsed -n '/pattern/{p;q}'

Using Perl or C or some other language, you could seek to the end of the file, step back 4 kB (or something), and then

  • read forwards 4 kB,
  • step back 8 kB,
  • repeat until the pattern is found, making sure that lines straddling block boundaries are handled correctly.

(This, apart from looking for a pattern, may actually be what tac does: one implementation of tac)
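The backwards-stepping loop above might look like this in Python. This is a sketch under assumptions, not a tuned implementation: `last_matching_line` is a made-up name, the buffering is simplistic, and `re.match` is relied on for the start-of-line anchor:

```python
import os
import re
import tempfile

def last_matching_line(path, pattern, chunk_size=4096):
    # Read the file backwards in chunks, returning the last line that
    # matches `pattern` (re.match anchors at the start of each line).
    rx = re.compile(pattern)
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            nl = buf.find(b"\n")
            if nl == -1 and pos > 0:
                continue  # no complete line yet; keep stepping back
            if pos == 0:
                head, complete = b"", buf
            else:
                # buf[:nl] is the tail of a line starting further back;
                # keep it (and its newline) for the next iteration.
                head, complete = buf[:nl + 1], buf[nl + 1:]
            # Scan the complete lines last-to-first.
            for line in reversed(complete.split(b"\n")):
                if rx.match(line):
                    return line
            buf = head
    return None

# Demo with a tiny chunk size to exercise the boundary handling.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"foo 1\nbar\nfoo 2\nbaz\n")
    path = tmp.name

result = last_matching_line(path, rb"foo \d+$", chunk_size=4)
os.unlink(path)
print(result)
```

Each chunk is prepended to the unconsumed buffer, and only the region after the first newline is treated as complete lines, so a line split across chunk boundaries is never matched piecemeal.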

Kusalananda
  • Ah, "after which it terminates, killing the pipeline", I wasn't sure if this was what actually happened. I did go as far as looking at the tac source myself, and couldn't see any provision to terminate early. Does the pipeline being lost cause a HUP signal or something? Note that `sleep 5 | echo x | grep x -m 1` still takes 5 seconds. – Amanda Ellaway Jul 17 '16 at 20:51
    Actually, `cat /dev/urandom | grep x -m 1 --binary-type=text` doesn't hang, so that's pretty convincing evidence that the pipeline can terminate things like that. I guess `sleep` may not open `stdout` at all, which might account for the difference. – Amanda Ellaway Jul 17 '16 at 23:29
  • Okay, I accepted the other answer (see the comment), mainly because I don't see any advantage for gsed over grep (I suspect the speed difference is essentially nil, but grep is more specialised to the task), and your other suggestion definitely reads the whole file. – Amanda Ellaway Jul 18 '16 at 02:19