
I have huge log files, and I am trying to "filter" them according to their line prefixes. Using grep is really fast, but not fast enough; typical results:

$ time grep "E ::" app.log

real    0m11.159s
user    0m10.081s
sys     0m1.040s

I thought I might save grep some effort by telling it that E :: is actually a prefix, that is, it appears at the beginning of the line. I believed this would let grep skip searching for it along the long lines in my log file. However, it doesn't seem to do much:

$ time grep "^E ::" app.log

real    0m11.152s
user    0m10.229s
sys     0m0.884s

Grepping ^E is about 15% faster.

Do you have any idea why? Can you think of a faster way to filter these 9GB log files according to the first char in each line?

– Bach

3 Answers


You can try GNU parallel, e.g.

cat app.log | parallel --pipe grep '^E ::'

See the GNU parallel documentation for examples of how to tweak this (how many jobs to run, how big the chunks the input file is split into, etc.).
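As a rough sketch (flag values are assumptions to tune for your machine, not measured settings), -j sets the number of parallel jobs and --block the size of each chunk handed to a grep instance:

< app.log parallel --pipe --block 10M -j 4 --keep-order grep '^E ::'

--keep-order (-k) returns output in input order at the cost of some buffering; drop it if line order doesn't matter to you.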

– Adrian Frühwirth

Try this:

LC_ALL=C fgrep "E ::" app.log
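One caveat: fgrep matches fixed strings, so a leading ^ would be searched for literally rather than treated as an anchor. If you still want the start-of-line anchor, a minimal variant (assuming GNU grep) keeps plain grep and just forces the C locale:

LC_ALL=C grep "^E ::" app.log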
– Mark Setchell
  • Doesn't change much. However, if I try it with `^E`, it becomes 50% slower. – Bach Mar 11 '14 at 11:06
  • 1
    I had to check it, so for other people reading: [What does “LC_ALL=C” do?](http://unix.stackexchange.com/q/87745/40596) – fedorqui Mar 11 '14 at 12:16
  • @fedorqui +1 for your community spirit. I should have explained it myself. It disables NLS so that `grep` can make assumptions about the type of data that it is looking at (e.g. ASCII, single byte etc) and thereby hopefully go faster. – Mark Setchell Mar 11 '14 at 12:34

Try this:

[honeypot]# (time ls) 1> /dev/null 2> output
[honeypot]# cat output

real    0m0.020s
user    0m0.001s
sys    0m0.006s
– Adrian