I've been using grep and sed on some logcat output to make it more readable, and I noticed that the output arrives noticeably slower than when I only grep it.

I understand sed is obviously going to add more runtime, but I wanted to check for any optimization techniques.

My commands look something like this for reference:

adb logcat | grep arg | sed $'s/{/\\\n{/g'

  • can you define 'noticeably slower'? e.g., 'a 5GB file takes 5 secs with grep and 10 secs with grep+sed'? what happens if you pull the `arg*` match into the sed via a range/match? or replace grep+sed with a single awk call to do the same thing? – markp-fuso Jul 08 '21 at 17:51
  • @markp-fuso logcat gives continuous output. Without sed it gives a new line about every 0.1 sec, whereas with sed it's about every 2 sec – David Wright Jul 08 '21 at 17:54
  • I don't have `adb` running locally so can't comment on the `sed` performance but would be curious how `adb logcat | awk '/arg/ { gsub("{","\n{"); print}'` performs – markp-fuso Jul 08 '21 at 18:12
  • @markp-fuso That runs leagues faster. Thank you – David Wright Jul 08 '21 at 18:55
  • `sed` slowing things down so much seems unbelievable. When you get output every 2 seconds, does that mean you get just a single line every 2 seconds, or do you get a big block of lines after every pause? For the latter case, this would only be a buffering issue but not an actual speed issue. Here you could try `grep --line-buffered ... | sed --unbuffered ...`. – Socowi Jul 08 '21 at 19:37

1 Answer

The useless grep is well-documented and easy to get rid of.

adb logcat | sed -n $'/\\*arg/{s/{/\\\n{/g;p;}'

To briefly reiterate the linked web page, anything that looks like grep 'x' | sed 'y' can be refactored to sed -n '/x/{y;p;}' (and similarly for grep 'x' | awk 'y', which reduces to awk '/x/ y'). The -n and the p matter: a bare sed '/x/y' restricts y to the matching lines but still prints every other line unchanged, so it does not filter the way grep does. sed and Awk are both generalized regex tools which can do everything grep can do (though in fairness some complex grep options are tedious to reimplement in a sed or Awk script; but this is obviously not one of these cases).
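
One way to check whether a single sed call really reproduces the grep | sed pipeline is to compare both on a small sample. The sketch below is only an illustration: sample is a made-up stand-in for adb logcat, and the substitution is the same script that $'s/{/\\\n{/g' produces, written here with a literal backslash-newline so that it does not depend on Bash.

```shell
# Hypothetical stand-in for `adb logcat`: only the first line contains "arg".
sample() { printf '%s\n' 'keep arg {a}{b}' 'drop {c}'; }

# Original pipeline: filter with grep, then start a new line before each "{".
piped=$(sample | grep arg | sed 's/{/\
{/g')

# Single sed: -n turns off automatic printing, p prints only the matching lines.
single=$(sample | sed -n '/arg/{s/{/\
{/g;p;}')

# Bare address: the substitution is limited to matching lines,
# but non-matching lines are still printed unchanged.
bare=$(sample | sed '/arg/s/{/\
{/g')

printf '%s\n' "$single"
```

Here piped and single hold the same text, while bare additionally contains the drop {c} line, which matches the symptom of the logs no longer being filtered.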

However, *arg* is not a well-defined regex; so I have to guess what you actually mean.

  • * at the beginning of a regex isn't well-defined; but many grep implementations will understand it to mean a literal asterisk. If that's not what you meant, probably take away the first \\*.
  • arg* is exactly equivalent to ar; if you don't care whether there are g characters after the match, just don't specify them. But perhaps you actually meant arg followed by anything?
  • But then I guess you probably meant just arg (implicitly preceded by and followed by anything).

In case it's not already obvious, * is not a wildcard character in regex. Instead, it says to repeat the preceding expression as many times as possible, zero or more (and thus the way to say "any string at all" in regex is .*, i.e. the "any character (except newline)" wildcard character . repeated zero or more times).
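
The points about * can be checked with a few grep -c counts. This is a small sketch on invented sample lines (sample is a hypothetical helper, not part of the question), and it escapes the leading asterisk so that the behaviour is well-defined.

```shell
# Hypothetical sample: only the first line contains the full string "arg".
sample() { printf '%s\n' 'one arg here' 'bar none' 'nothing'; }

with_star=$(sample | grep -c 'arg*')   # g* may match zero g's, so "ar" is enough
plain_ar=$(sample | grep -c 'ar')      # same lines as 'arg*'
plain_arg=$(sample | grep -c 'arg')    # only the line that really contains "arg"
dot_star=$(sample | grep -c '.*')      # "any string at all" matches every line

# An escaped asterisk matches a literal "*" character.
literal_star=$(printf '%s\n' 'x*arg' 'xarg' | grep -c '\*arg')

echo "$with_star $plain_ar $plain_arg $dot_star $literal_star"
```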

Also, grep (and sed, and Awk) look for the regex anywhere in a line (unless you put in explicit regex anchors or use grep -x or equivalent options in sed or Awk) so you don't need to specify "preceded by anything" or "followed by anything".
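
As a rough illustration of unanchored versus anchored matching, the following sketch counts matches on three invented lines (lines is a hypothetical helper). It shows grep -x next to explicit ^ and $ anchors, plus a sed equivalent.

```shell
# Hypothetical sample: "arg" alone, "arg" inside a longer line, and no match.
lines() { printf '%s\n' 'arg' 'prefix arg suffix' 'other'; }

anywhere=$(lines | grep -c 'arg')        # matches anywhere in the line
whole=$(lines | grep -cx 'arg')          # -x: the whole line must match
anchored=$(lines | grep -c '^arg$')      # same effect with explicit anchors
sed_whole=$(lines | sed -n '/^arg$/p')   # sed needs the anchors spelled out

echo "$anywhere $whole $anchored $sed_whole"
```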

The Bash "C-style string" $'...' offers some conveniences, but also requires any literal backslash to be doubled. So $'/\\*/' is equivalent to '/\*/' in regular single quotes.
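
To see what the shell actually hands to sed, it can help to store the quoted strings in variables and compare them. This sketch assumes Bash (or another shell that supports $'...' quoting); the variable names are made up for the illustration.

```shell
# Assumes Bash: $'...' is not understood by every /bin/sh.
c_style=$'/\\*/'        # backslash doubled inside $'...'
plain='/\*/'            # the same text in regular single quotes

# The substitution from the question: \\ becomes one backslash, \n a real newline.
script=$'s/{/\\\n{/g'

# Prints the script as sed receives it, spread over two lines.
printf '%s\n' "$script"
```

The script printed at the end is s/{/\ on the first line and {/g on the second, which is the portable way to put a newline into a sed replacement.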

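The reason the sed slows you down is probably buffering: grep block-buffers its output when it writes to a pipe rather than to a terminal, so sed only receives lines in bursts. Getting rid of the useless grep also coincidentally gets rid of that buffering.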
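
If the grep has to stay in the pipeline for some reason, asking it to flush after every line is the usual workaround. The sketch below assumes a grep that supports --line-buffered (GNU and BSD grep do); slow_source is a hypothetical stand-in for adb logcat, and the example only demonstrates the invocation, not the latency difference itself.

```shell
# Hypothetical stand-in for `adb logcat`: a few lines with short pauses between them.
slow_source() {
  for n in 1 2 3; do
    printf 'arg {event %s}\n' "$n"
    printf 'noise %s\n' "$n"
    sleep 0.1
  done
}

# --line-buffered makes grep pass each matching line on immediately
# instead of waiting for its output buffer to fill.
out=$(slow_source | grep --line-buffered arg | sed 's/{/\
{/g')

printf '%s\n' "$out"
```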

tripleee
  • I apologize for the poor example. `*arg*` was supposed to be a placeholder since I can't really disclose the actual arg I'm passing to grep (NDA stuff). I should've realized using * was a bad idea in a regex type problem – David Wright Jul 08 '21 at 18:29
  • I'll probably leave those things in now that I wrote them. Does this answer otherwise solve your problem? – tripleee Jul 08 '21 at 18:33
  • I tried `adb logcat | sed $'/arg/s/{/\\\n{/g'` but it's not filtering the logs that my original `grep` got rid of – David Wright Jul 08 '21 at 18:50
  • If your `arg` was a shell variable, that changes the question. If your `arg` was a regex which has different meanings to `grep` and `sed`, that could explain it. Your question doesn't suggest anything remotely like either of these so I have answered the question you actually asked. But if you can edit your question to provide a [mre] where this approach doesn't work, I can try to update this answer. – tripleee Jul 09 '21 at 06:50
  • ... Or maybe just accept this answer, or provide one of your own and accept that, and ask a new question with your *actual* requirements. Moving the goalposts after you have received answers is always more or less problematic. – tripleee Jul 09 '21 at 06:50