5

Let's suppose I pipe the output of a command and want to filter lines with grep but also keep the first one which is a header. I saw someone type something akin to this:

the command | (read l; echo $l) | grep bla | less

It extracted the first line (the header), grepped the rest of the input for lines matching bla, and sent the result to less for inspection. Of course the above command doesn't work, but that's the idea. What part of it is wrong?

Palace Chan

4 Answers

6

With awk:

command | awk 'NR==1||/bla/'

Thanks to @doubleDown for pointing out that {print} is unnecessary since it is the default action.
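
To see the awk filter on concrete data, here is a minimal sketch. The wrapper name header_awk and the sample rows are made up for illustration. The pattern is passed with -v, which assumes it contains no backslashes (awk processes escape sequences in -v assignments).

```shell
# Hypothetical wrapper: keep line 1, plus every later line matching the ERE in $1.
header_awk() {
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Made-up sample data: a header followed by three rows.
printf '%s\n' 'NAME QTY' 'apple 3' 'bla 7' 'pear 1' | header_awk bla
# prints:
# NAME QTY
# bla 7
```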

With perl:

command | perl -ne 'print if $.==1 or /bla/'

(If you need perl irregular expressions, perl is probably available :) )

rici
  • 2
    You don't need the `{print}` as it is the default action. – doubleDown Jun 14 '13 at 17:25
  • 1
    The problem with this approach is that it doesn't satisfy the asker's "with grep" requirement. Without `grep`, you can't pass useful `grep` command-line arguments like `-v` and `-F` -- you have to modify the `awk` or `perl` code directly to get the desired behavior. – Richard Hansen Jun 14 '13 at 21:39
  • @RichardHansen: Fair point, although in those examples the modification is pretty simple. `-f` would have been much more troublesome. However, it still might be a useful technique. – rici Jun 15 '13 at 06:57
2

sed flavor:

command | sed -n -e '1p' -e '1!{/bla/p;}'
jaypal singh
1

How to grep everything but the first line

The following will mostly get what you want, but it has several flaws (see the end of this post):

the command | (read l; echo $l; grep blah) | less

Instead, I recommend creating and using the following function:

grep1 () (
    IFS= read -r line
    printf %s\\n "${line}"
    grep "$@"
)

Here is how you would use it:

the command | grep1 blah | less

Example of it in action:

$ ps -ef | grep1 firefox
UID        PID  PPID  C STIME TTY          TIME CMD
rhansen   3654  3311  4 13:33 ?        00:07:59 /usr/lib/firefox/firefox

How it works

  1. read consumes the first line of input from command and assigns it (unmodified) to the variable line
  2. printf outputs the value of line (unmodified)
  3. the remaining input lines are consumed, filtered, and output by grep

The first line never passes through grep, so there's no opportunity for it to filter it out.
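
Because everything after the header is handed to a real grep, ordinary grep options pass straight through. A small sketch of that, assuming made-up sample data (the helper name sample is hypothetical):

```shell
# Same definition as above, repeated so the snippet is self-contained.
grep1 () (
    IFS= read -r line
    printf %s\\n "${line}"
    grep "$@"
)

# Made-up sample data; the header does not contain "ssh".
sample() {
    printf '%s\n' 'PID CMD' '101 sshd' '202 bash' '303 ssh-agent'
}

sample | grep1 ssh      # header, then the sshd and ssh-agent rows
sample | grep1 -v ssh   # header, then only the bash row
sample | grep1 -c ssh   # header, then the count of matching rows (2)
```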

Notes

  • I enclosed the function body in ( ... ) instead of { ... } because I don't want variable assignments inside the body to affect the caller's environment (the parentheses cause it to be run in a subshell, which isolates any changes from the caller).
  • IFS= prevents read from stripping leading and trailing whitespace
  • the -r argument to read prevents it from processing backslashes (the first line is perfectly preserved in the variable line)
  • I use printf %s\\n instead of echo because echo might process backslashes, possibly causing the first line of output to be different from the original first line
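
The effect of these choices is easiest to see side by side. The following is only an illustrative sketch: the function names first_line_naive and first_line_careful and the sample header are invented for the comparison.

```shell
# Naive version: no IFS=, no -r, unquoted echo.
first_line_naive() { read l; echo $l; }

# Careful version using the techniques listed above.
first_line_careful() { IFS= read -r line; printf %s\\n "${line}"; }

# Made-up header with leading spaces, a literal backslash, and a run of spaces.
header='  a\tb   c'

printf '%s\n' "${header}" | first_line_naive     # leading spaces, backslash and space run are lost
printf '%s\n' "${header}" | first_line_careful   # prints the header unchanged
```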

Improvements

The above function has a minor problem: If given empty input it will print a blank line. The following avoids that problem:

grep1_better() (
    IFS= read -r line && printf %s\\n "${line}"
    grep "$@"
)

This works because read returns a non-zero return code if it encounters the end of input. If there's no input, read will "fail" (return non-zero) and the && will skip the printf.

But, now there's a new problem: If there is input, but there aren't any newlines at all (for example, printf %s foo), the function will output nothing. This is because read will encounter the end of input and "fail" even though there was some input. Here's how that can be fixed:

grep1_even_better() (
    IFS= read -r line || [ -n "${line}" ] && printf %s\\n "${line}"
    grep "$@"
)

In English, the above says, "Read a line of input. If the end of input wasn't encountered, or if something was read, then print what was read. Then run grep."
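
Here is a quick check of the three edge cases discussed above, as a sketch with made-up input. grep exits non-zero when nothing matches, hence the "|| true" on the first two calls.

```shell
grep1_even_better() (
    IFS= read -r line || [ -n "${line}" ] && printf %s\\n "${line}"
    grep "$@"
)

printf ''              | grep1_even_better x || true   # empty input: prints nothing
printf %s foo          | grep1_even_better x || true   # no newline at all: prints foo
printf 'head\nax\nb\n' | grep1_even_better x           # prints head, then ax
```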

A further improvement would be to detect when the function is being called with one or more filename arguments and react accordingly (read from the file(s) instead of standard input).
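
One possible shape for that improvement is sketched below. It is deliberately simplified and rests on assumptions of my own: the name grep1_files is hypothetical, the calling convention is PATTERN [FILE...] with no grep options, and each file is treated as having its own header line. Telling options, patterns, and filenames apart in the general case takes more work than this.

```shell
# Hypothetical variant: grep1_files PATTERN [FILE...]
grep1_files() (
    pattern=$1
    shift
    # Print the first line of standard input, then grep the rest.
    one_stream() {
        IFS= read -r line || [ -n "${line}" ] && printf %s\\n "${line}"
        grep -e "${pattern}"
    }
    if [ "$#" -eq 0 ]; then
        one_stream                      # no files given: read standard input
    else
        status=1
        for file in "$@"; do
            # Succeed overall if any file had a match, similar to grep itself.
            one_stream < "${file}" && status=0
        done
        exit "${status}"
    fi
)

# With no file arguments it behaves like the pipeline version:
printf '%s\n' 'HDR' 'bla 1' 'other' | grep1_files bla   # prints HDR, then bla 1
```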

What's wrong with this example?

The following code doesn't work:

the command | (read l; echo $l) | grep bla | less

There are two major problems:

  • The first line is still piped through grep, so grep could still filter it out.
  • The remaining lines of input are discarded by the second stage of the pipeline. (More precisely, the "the command" command never gets an opportunity to output the remaining lines (modulo buffering) because nobody in the second stage is waiting to read them.)
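
Both problems show up with a tiny made-up input (the helper name sample is hypothetical). In this sketch the broken pipeline passes nothing useful to grep, while moving grep inside the subshell lets it read the remaining lines:

```shell
# Made-up input: a header followed by two rows that both contain "bla".
sample() { printf '%s\n' 'HDR' 'bla 1' 'bla 2'; }

# Broken: the subshell emits only the header, so grep never sees the rows,
# and the header itself is dropped because it does not match.
sample | (read l; echo $l) | grep -c bla || true   # prints 0

# Working: grep runs inside the subshell and reads the remaining lines.
sample | (read l; echo $l; grep bla)               # prints HDR, bla 1, bla 2
```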

In addition, there are a handful of minor problems:

  • Because IFS is not set to the empty string before calling read, read will strip the first line's leading and trailing whitespace before assigning the variable l.
  • Because -r is not passed to read, read will attempt to interpret backslashes in the first input line. This could corrupt the first line.
  • Because the argument to echo is not enclosed in double quotes, the shell splits it into words, so tabs and runs of consecutive whitespace are collapsed into single spaces (and any glob characters may be expanded). If the first line contains column headings, this will break the alignment with the following rows.
  • Because echo might process backslashes in its arguments, the first line may be corrupted.
  • If the first line begins with -, echo might interpret the string as an option, not something to be printed.
  • It'll print a blank line if given empty input.
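
The option-parsing problem is easy to reproduce with a contrived header. This is a sketch only; exactly what echo does with such a string varies between shells.

```shell
# Made-up header that looks like an echo option.
header='-n'

# printf treats the header purely as data for the %s conversion.
printf %s\\n "${header}"    # prints -n

# Many echo implementations (bash and dash among them) treat -n as an option
# here and print nothing at all.
echo ${header}
```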

These minor problems also affect the one-liner from the start of this post, the command | (read l; echo $l; grep blah) | less, which is why I recommend the grep1() function instead.

Richard Hansen
0

The awk and sed answers above are usually the way to go. When the regex is too complex for them and grep is the only option, the following tee-based approach should work in shells that support process substitution (such as bash, zsh, or ksh). Here tee writes its input to two "files", which grep and head read via process substitution. tee also writes its input to standard output, which in this case is redirected to /dev/null. The sleep delays grep so that head's output normally appears first; it is a timing workaround rather than a guarantee. Note that grep still sees the header line, so a header that matches the regex is printed twice.

command | tee >(sleep 1; grep regex) >(head -n 1) >/dev/null
iruvar
  • Just out of curiosity, what regex would be too complex for awk? awk recognizes all POSIX extended regexes (i.e. the same as `egrep`) – rici Jun 14 '13 at 18:37
  • @rici, GNU `grep` with `-P` supports PCRE goodies such as look-ahead assertions, do not believe `awk` does this – iruvar Jun 14 '13 at 18:40
  • OK, fair enough. Added a perl example. – rici Jun 14 '13 at 19:57