27

When I do

$ ps -ef | grep cron

I get

root      1036     1  0 Jul28 ?        00:00:00 cron
abc    21025 14334  0 19:15 pts/2    00:00:00 grep --color=auto cron

My question is: why do I see the second line? From my understanding, ps lists the processes and pipes the list to grep. grep hasn't even started running while ps is listing processes, so how come the grep process is listed in the output?

Related second question:

When I do

$ ps -ef | grep [c]ron

I get only

root      1036     1  0 Jul28 ?        00:00:00 cron

What is the difference between first and second grep executions?

Ankur Agarwal

7 Answers

41

When you execute the command:

ps -ef | grep cron

the shell you are using

(...I assume bash in your case; because of grep's color option I think you are running a GNU system such as a Linux distribution, but it's the same on other Unix systems and shells as well...)

will call pipe() to create a pipe, then fork() (make a running copy of itself), which creates a new child process. This child process will close() its standard output file descriptor (fd 1) and attach fd 1 to the write side of the pipe created by the parent process (the shell where you executed the command), typically with dup2(). This is possible because a child created by fork() inherits copies of the parent's open file descriptors (the pipe fds in this case). After doing so it will exec() the first (in your case) ps command found in your PATH environment variable. With the exec() call the process becomes the command you executed.

So, you now have the shell process with a child that is, in your case, the ps command with the -ef options.

At this point, the parent (the shell) fork()s again. This newly created child process close()s its standard input file descriptor (fd 0) and attaches fd 0 to the read side of the pipe created by the parent process (the shell where you executed the command).

After doing so it will exec() the first (in your case) grep command found in your PATH environment variable.

Now you have the shell process with two children (that are siblings): the first one is the ps command with the -ef options and the second one is the grep command with the cron argument. The read side of the pipe is attached to the STDIN of the grep command and the write side is attached to the STDOUT of the ps command, so the standard output of ps feeds the standard input of grep.
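As a rough illustration of the sibling relationship described above (a sketch of my own, not part of the original answer), each side of a pipeline can print the PID of its parent. The temporary file and the variable names left_parent and right_parent are hypothetical and exist only for this demo; it assumes a POSIX-style sh is on the PATH.

```shell
#!/usr/bin/env bash
# Each `sh -c` below is one side of the pipeline; $PPID inside it is the PID
# of the process that forked it.
tmp=$(mktemp)

# Left side writes its parent's PID into the pipe; right side reads it and
# appends its own parent's PID.
sh -c 'echo "$PPID"' | sh -c 'read -r left; echo "$left $PPID"' > "$tmp"

read -r left_parent right_parent < "$tmp"
rm -f "$tmp"

echo "parent of left side:  $left_parent"
echo "parent of right side: $right_parent"
```

If both numbers are the same, the two commands were forked by the same shell process, which is what makes them siblings.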

Since ps writes information about each running process to its standard output, while grep reads its standard input looking for lines that match a given pattern, you have the answer to your first question:

  1. the shell runs: ps -ef;
  2. the shell runs: grep cron;
  3. ps sends data (which includes the string "grep cron") to grep;
  4. grep searches its STDIN for the pattern and matches the line containing "grep cron" because of the "cron" argument you passed to grep: you are instructing grep to match the string "cron", and it does, because "grep cron" is part of what ps reports once grep has started its execution.

When you execute:

ps -ef | grep '[c]ron'

the argument instructs grep to match a "c" followed by "ron". That matches the same text as in the first example, but it no longer matches the line ps prints for grep itself, because:

  1. the shell runs: ps -ef;
  2. the shell runs: grep '[c]ron';
  3. ps sends data (which includes the string grep [c]ron) to grep;
  4. grep does not match its own line, because that line does not contain "c" followed by "ron"; it contains "c" followed by "]ron".
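The two cases can be compared without a live process list. This is only a sketch: the sample lines are modeled on the ps output in the question, and the variable names (listing_plain, listing_bracket, plain_hits, bracket_hits) are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sample text standing in for what ps would write into the pipe in each case.
listing_plain='root      1036     1  0 Jul28 ?        00:00:00 cron
abc      21025 14334  0 19:15 pts/2    00:00:00 grep cron'
listing_bracket='root      1036     1  0 Jul28 ?        00:00:00 cron
abc      21025 14334  0 19:15 pts/2    00:00:00 grep [c]ron'

# grep -c prints the number of matching lines.
plain_hits=$(printf '%s\n' "$listing_plain" | grep -c 'cron')
bracket_hits=$(printf '%s\n' "$listing_bracket" | grep -c '[c]ron')

echo "pattern cron    matched $plain_hits line(s)"
echo "pattern [c]ron  matched $bracket_hits line(s)"
```

The first pattern matches both lines, including the one for grep itself; the second matches only the cron daemon's line.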

GNU grep does not impose any limit on the length of the string it matches, and on some platforms (I think Solaris, HP-UX, AIX) the limit of the string is given by the "$COLUMNS" variable or by the terminal's screen width.

Hopefully this long response clarifies the shell pipe process a bit.

TIP:

ps -ef | grep cron | grep -v grep
haccks
dAm2K
  • Thanks for elaborating upon @Ben Jackson's answer. – Ankur Agarwal Mar 17 '12 at 05:55
  • I think running this will be a good illustration of this wonderful answer: `$ ps aux | grep grep | grep grep | grep grep | grep grep`. You will see four lines of grep grep, one for each grep in the pipe you created. – Esmu Igors Jul 28 '20 at 10:57
9

The shell constructs your pipeline with a series of fork(), pipe() and exec() calls. Depending on the shell, any part of it may be constructed first, so grep may already be running before ps even starts. Or, even if ps starts first, it will be writing into a fixed-size kernel pipe buffer (4 KiB on older kernels, 64 KiB by default on current Linux) and will block once that buffer fills (possibly while printing a line of process output) until grep starts up and begins consuming the data in the pipe. In the latter case, if ps is able to start and finish before grep even starts, you may not see the grep cron line in the output. You may have noticed this non-determinism at play already.
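A small sketch of the buffering point, assuming `head -c` and /dev/zero are available (both are common but not guaranteed everywhere); the variable name bytes is just for the demo. It pushes far more data through a pipe than the default pipe buffer holds on typical systems, which only works because the reader drains the pipe while the writer is still running.

```shell
#!/usr/bin/env bash
# Write 1,000,000 bytes into a pipe. The writer would block forever on a full
# pipe buffer if the reader (wc) were not consuming the data concurrently.
bytes=$(head -c 1000000 /dev/zero | wc -c | tr -d ' ')
echo "bytes received through the pipe: $bytes"
```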

Ben Jackson
8

In your command

ps -ef | grep 'cron'

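The shell starts the "grep" command and the ps -ef command at essentially the same time, so grep is typically already running when ps -ef reads the process table. The shell connects the standard output (STDOUT) of "ps -ef" to the standard input (STDIN) of the grep command.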

It does not execute the ps command, store the result in memory, and then pass it to grep. Think about it: why would it? Imagine if you were piping a hundred gigabytes of data.
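One way to observe that output is streamed rather than collected first is to timestamp each line as the reader receives it. This is a hedged sketch: the producer, the log file, and the names first_time and second_time are invented for the demo, and it assumes `date +%s` (seconds since the epoch) is supported, which is common but not required by POSIX.

```shell
#!/usr/bin/env bash
log=$(mktemp)

# The producer pauses between its two lines; the consumer stamps each line
# with the time at which it arrives on the read side of the pipe.
{ echo first; sleep 1; echo second; } |
  while read -r line; do
    echo "$(date +%s) $line"
  done > "$log"

first_time=$(sed -n '1s/ .*//p' "$log")
second_time=$(sed -n '2s/ .*//p' "$log")
rm -f "$log"

echo "seconds between arrival of the two lines: $((second_time - first_time))"
```

If the shell buffered the whole output of the first command before starting the second, both lines would arrive at the same instant; a gap of about one second shows the reader was already running while the writer was paused.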

Edit In regards to your second question:

In grep (and most regular expression engines), brackets define a set of characters, any one of which is accepted at that position. So writing [c] means it will accept any character from the set, but the set contains only c. Similarly, you could use any other combination of characters.

ps aux | grep cron
root      1079  0.0  0.0  18976  1032 ?        Ss   Mar08   0:00 cron
root     23744  0.0  0.0  14564   900 pts/0    S+   21:13   0:00 grep --color=auto cron

^ That matches itself, because your own command contains "cron"

ps aux | grep [c]ron
root      1079  0.0  0.0  18976  1032 ?        Ss   Mar08   0:00 cron

That matches cron, because cron contains a c followed by "ron". It does not match your own grep command, though, because its command line contains the literal text [c]ron.

You can put whatever you want in the brackets, as long as it contains the c:

ps aux | grep [cbcdefadq]ron
root      1079  0.0  0.0  18976  1032 ?        Ss   Mar08   0:00 cron

If you remove the c, it won't match though, because "cron" starts with a c:

ps aux | grep [abedf]ron

^ Has no results

Edit 2

To reiterate the point, you can do all sorts of crazy stuff with grep. There's no significance in picking the first character to do this with.

ps aux | grep [c][ro][ro][n]
root      1079  0.0  0.0  18976  1032 ?        Ss   Mar08   0:00 cron
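The patterns above can also be checked against fixed strings instead of a live process list. This is a sketch only; `matches` is a hypothetical helper defined here, and the patterns are quoted so the shell does not try to expand them as globs.

```shell
#!/usr/bin/env bash
# matches PATTERN TEXT: succeeds if grep finds PATTERN somewhere in TEXT.
# (Hypothetical helper for this demo.)
matches() { printf '%s\n' "$2" | grep -q -- "$1"; }

for pattern in '[c]ron' '[cbcdefadq]ron' '[abedf]ron' '[c][ro][ro][n]'; do
  # Test each pattern against cron, corn, and grep's own command line.
  for text in 'cron' 'corn' "grep $pattern"; do
    if matches "$pattern" "$text"; then result='match'; else result='no match'; fi
    printf '%-16s vs %-22s %s\n' "$pattern" "$text" "$result"
  done
done
```

None of the patterns matches its own "grep ..." command line, which is why the grep process drops out of the listing. Note that [c][ro][ro][n] also accepts corn, since each bracket matches any one character in its set.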
GoldenNewby
  • I just added one more part to the question. It occurred to me that it was hard to read the "bounty notes". Thanks. – Ankur Agarwal Mar 13 '12 at 02:23
  • Ben Jackson (below) seems to suggest that ps could be running before grep and writing data to a kernel pipe. – Ankur Agarwal Mar 13 '12 at 02:27
  • As far as I'm aware which one starts first is rather irrelevant. The operating system doesn't necessarily allocate any CPU time to either of them until the STDOUT of PS is mapped to the STDIN of GREP. – GoldenNewby Mar 13 '12 at 03:11
  • You want quotes. `c[ro][ro][n]` will be changed to `cron` before `grep` starts if you run the command in `/bin` or any other directory with a file named `cron` (or `corn`, or any other match) present. By contrast, `'c[ro][ro][n]'` won't be expanded. And it gets even messier if you run your original unquoted command in a shell with `nullglob` or `failglob` options enabled. – Charles Duffy May 09 '23 at 15:05
3

You wrote: "From my understanding, ps lists the processes and pipes the list to grep. grep hasn't even started running while ps is listing processes".

Your understanding is incorrect.

That is not how a pipeline works. The shell does not run the first command to completion, remember the output of the first command, and then afterwards run the next command using that data as input. No. Instead, both processes execute and their inputs/outputs are connected. As Ben Jackson wrote, there is nothing to particularly guarantee that the processes run at the same time, if they are both very short-lived, and if the kernel can comfortably manage the small amount of data passing through the connection. In that case, it really could happen the way you expect, only by chance. But the conceptual model to keep in mind is that they run in parallel.

If you want official sources, how about the bash man page:

  A pipeline is a sequence of one or more commands separated by the character |.  The format for a pipeline is:

         [time [-p]] [ ! ] command [ | command2 ... ]

  The  standard  output  of command is connected via a pipe to the standard input of command2.  This connection is
  performed before any redirections specified by the command (see REDIRECTION below).

  ...

  Each command in a pipeline is executed as a separate process (i.e., in a subshell).
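The "separate process" wording in the man page excerpt has a visible consequence. The sketch below assumes bash with its default options (in particular, `lastpipe` is not enabled); other shells such as ksh or zsh may behave differently. The variable name word is arbitrary.

```shell
#!/usr/bin/env bash
# `read` is the last command of the pipeline, so with default bash options it
# runs in a child process and its assignment is lost when that child exits.
word=""
printf 'hello\n' | read -r word
echo "word after the pipeline: '${word}'"
```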

As for your second question (which is not really related at all, I am sorry to say), you are just describing a feature of how regular expressions work. The regular expression cron matches the string cron. The regular expression [c]ron does not match the string [c]ron. Thus the first grep command will find itself in a process list, but the second one will not.

Zac Thompson
1

Your actual question has been answered by others, but I'll offer a tip: If you would like to avoid seeing the grep process listed, you can do it this way:

$ ps -ef | grep [c]ron
Michael Berkowski
  • Thanks but I had further doubts, on the answers below. Please see my comments. – Ankur Agarwal Aug 01 '11 at 02:30
  • Why does using grep [c]ron not list the grep process, whereas using grep cron always lists the grep process? What is the effect of the bracket expression. Can you please elaborate? – Ankur Agarwal Mar 13 '12 at 01:26
  • @abc It works because the `grep` regular expression matches exactly `c` followed by `ron`, but the `ps` output will show literally `grep [c]ron` since that was the command entered. Therefore, `grep`'s expression doesn't match it and filters it out. – Michael Berkowski Mar 13 '12 at 01:36
  • This needs more quoting; make it `grep '[c]ron'` or else it'll become `grep cron` if you run this in a directory that contains a file named `cron` (as the shell replaces anything that looks like a glob with a list of files it expands to... and that's if you're lucky and it's on default settings; with `nullglob` it'll just become `grep` with no arguments, with `failglob` it'll become an error). – Charles Duffy Mar 30 '18 at 19:43
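The globbing problem raised in the comment above can be reproduced in a scratch directory. This is a sketch that assumes default shell settings (neither `nullglob` nor `failglob` enabled); the directory and the variable names before, after and quoted are made up for the demo.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# No file matches the glob yet, so the shell passes the word through unchanged.
before=$(cd "$dir" && echo [c]ron)

# Once a file named cron exists, the unquoted word is expanded to that name.
touch "$dir/cron"
after=$(cd "$dir" && echo [c]ron)

# Quoting keeps the brackets intact regardless of the directory contents.
quoted=$(cd "$dir" && echo '[c]ron')

rm -rf "$dir"
printf 'before: %s\nafter:  %s\nquoted: %s\n' "$before" "$after" "$quoted"
```

With `echo` replaced by `grep`, the "after" case is the one where grep would receive plain cron as its pattern and match its own process line again.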
0

pgrep is sometimes better than ps -ef | grep word because it never reports itself, and there is no grep process in the pipeline to show up in the results. Try

pgrep -f bash
pgrep -lf bash
Felipe Alvarez
-3
$ ps -ef | grep cron

The shell starts every command in a pipeline at roughly the same time, so grep cron is usually already running when ps -ef lists the processes; that's why the output shows the grep command itself.

$ ps -ef | grep [c]ron

But here you told grep to match c followed by ron. The output does not include the grep command line, because that command line contains the literal text [c]ron, which the pattern does not match.

ollo