I've read other answers here but still can't work out how to speed up my simple exec() call, which basically looks like this:

 zcat access_log.201312011745.gz | grep 'id=6' | grep 'id2=10' | head -n10 

I've added ini_set('memory_limit', 256); to the top of the PHP script, but it still takes about 1 minute to run (compared with near-instant completion in Penguinet). What can I do to improve it?

1252748
  • How big is your file? Note that doing `zcat` and then piping, a lot of memory is used to allocate the file. – fedorqui Dec 04 '13 at 15:58
  • @fedorqui file is 11 megabytes. How would you recommend searching it? – 1252748 Dec 04 '13 at 19:08
  • php's memory limit does NOT apply to external programs you're running via `exec()`. maybe it does take long to find 10 lines that have `id2=10` buried within all the output of the lines that contain `id=6` amongst ALL of the lines in that log file. – Marc B Dec 04 '13 at 19:31
  • @MarcB Why does it take so little time to do the same search from the command line then? How can I replicate this speed? – 1252748 Dec 04 '13 at 19:34
  • How about unzipping the file beforehand, and then just using "grep 'id=6' file.notzipped | grep..." That will take "zcat" out of the equation altogether and may make it easier to solve. – Mark Setchell Dec 04 '13 at 20:06
  • @MarkSetchell How would that be faster? Can you show me an example of that command? Thanks very much! – 1252748 Dec 04 '13 at 20:32
  • You can unzip the file beforehand by typing "zcat access_log.201312011745.gz > fred". Then "fred" will be the unzipped file. Then you can exec "cat fred" instead of "zcat access_log.201312011745.gz", which will allow us to see if it is the memory required by "zcat" that is causing the problem. – Mark Setchell Dec 04 '13 at 20:42

1 Answer

I would try some of the following:

Change your exec to just run something simple, like

echo Hello

and see if it still takes so long - if it does, the problem is in the process creation and exec()ing area.

If that runs quickly, try changing the exec to something like:

zcat access_log.201312011745.gz > /dev/null

to see if it is the "zcat" that is slowing you down.
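
The isolation steps above can be sketched as a small shell script that times each stage on its own. This is only a sketch under assumptions: the sample log is a hypothetical stand-in for access_log.201312011745.gz, the `elapsed` helper is made up for illustration, `gzip -dc` is used as a portable equivalent of `zcat`, and `date +%s` has one-second resolution, which should be enough to spot a one-minute delay.

```shell
# Hypothetical sample log; point "log" at the real access_log file instead.
tmpdir=$(mktemp -d)
log="$tmpdir/sample_access_log.gz"
printf 'GET /a?id=1&id2=3\nGET /b?id=6&id2=10\nGET /c?id=6&id2=7\n' | gzip -c > "$log"

# elapsed CMD [ARGS...]: run a command, discard its output, and print
# how many whole seconds it took (illustrative helper, not a standard tool).
elapsed() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

hello_secs=$(elapsed echo Hello)        # trivial command
zcat_secs=$(elapsed gzip -dc "$log")    # decompression only
# Full pipeline, run through a child shell so the pipes are timed as one unit.
pipe_secs=$(elapsed sh -c "gzip -dc '$log' | grep 'id=6' | grep 'id2=10' | head -n10")

echo "echo: ${hello_secs}s, zcat: ${zcat_secs}s, full pipeline: ${pipe_secs}s"
rm -rf "$tmpdir"
```

Running the same three commands through PHP's exec() and comparing the timings with these should show which stage, if any, behaves differently under PHP.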

Think about replacing the greps with a "sed" that quits (using "q") as soon as it finds what you are looking for, rather than continuing all the way to the end of the file. Your "head" suggests you are only interested in the first few occurrences of your strings, not all of them. For example, you seem to be looking for lines that contain "id=6" and also "id2=10", so a "sed" like the one below may be faster, because it prints the line and stops the moment it finds a line with "id=6" followed by "id2=10":

zcat access_log.201312011745.gz | sed -n '/id=6.*id2=10/{p;q;}'

The "-n" says "don't print, in general", and then it looks for "id=6" followed by any characters then "id2=10". If it finds that, the "p" prints the line and the "q" makes it quit immediately without reading through to the end of the file. The braces group the two commands so that "q" only runs on a matching line; without them, "sed" would quit after the very first line of input. Note that this prints only the first match, and that I am assuming "id=6" comes before "id2=10" on the line. If that is not true, the "sed" will need additional work.
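
If the two parameters can appear in either order, or if ten matching lines are wanted rather than one, an "awk" filter is one possible alternative to "sed". The following is a minimal sketch, not a tested fix for the slow exec(): the sample log is hypothetical, `gzip -dc` stands in for `zcat`, and the limit of 10 is taken from the "head -n10" in the question.

```shell
# Hypothetical sample log with the two parameters in both orders.
tmpdir=$(mktemp -d)
log="$tmpdir/sample_access_log.gz"
{
    i=1
    while [ "$i" -le 12 ]; do
        echo "GET /page?id=6&n=$i&id2=10"   # matches, id first
        echo "GET /page?id2=10&n=$i&id=6"   # matches, id2 first
        echo "GET /page?id=7&n=$i&id2=10"   # does not match
        i=$((i + 1))
    done
} | gzip -c > "$log"

# Print lines containing both strings in any order,
# and exit as soon as the tenth match has been printed.
matches=$(gzip -dc "$log" | awk '/id=6/ && /id2=10/ { print; if (++n == 10) exit }')
count=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n' "$matches"
echo "matched lines: $count"
rm -rf "$tmpdir"
```

Because "awk" exits after the tenth match, the decompressor upstream is stopped early as well, which is the same idea as the "q" in the "sed" version.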

Mark Setchell
  • great. thank you. can i just word for word replace grep with sed? – 1252748 Dec 04 '13 at 18:53
  • also, i don't understand how to write this exactly `zcat access_log.... > /dev/null` in order to test. – 1252748 Dec 04 '13 at 19:03
  • I have edited my original post to clarify what I was trying to say. – Mark Setchell Dec 04 '13 at 19:40
  • Thanks! But where in this command do I specify the file name? Also, why do I want to tell it "not to print". I want to print the results. If I want to print ten results, how do I specify that? Thanks again – 1252748 Dec 04 '13 at 20:15
  • I have edited the command to show how to use it. The "p" at the end says "print only if the pattern is matched". – Mark Setchell Dec 04 '13 at 20:18