15

Consider the following simplified example:


my_prog | awk '...' > output.csv &
my_pid="$!" #Gives the PID for awk instead of for my_prog
sleep 10
kill $my_pid #my_prog still has data in its buffer that awk never saw. Data is lost!

In bash, $my_pid holds the PID of awk, but I need the PID of my_prog. If I kill awk, my_prog does not know to flush its output buffer, and data is lost. So, how would one obtain the PID of my_prog? Note that ps aux|grep my_prog will not work, since several instances of my_prog may be running.

NOTE: I changed cat to awk '...' to clarify what I need.

User1
    I don't really pipe to cat, this is just a simplified example. It's really an ugly awk script, but they both behave the same way. – User1 Jul 27 '10 at 15:59
  • What are you trying to accomplish? I'm sure there must be a better way. – msw Jul 27 '10 at 16:06
  • I have a program, my_prog, that generates a ton of data. I use an awk script to summarize the data into a CSV file that will be the basis of a report. The program actually outputs data just fine until I start piping it. I believe it has something to do with C's setbuf behavior, where it treats terminals as line-buffered and files as block-buffered (I might be wrong on this point). But maybe if I could fool the program into thinking it's writing to a terminal when it talks to awk, that might work. It'd be even easier if I could just get the PID, since my_prog flushes its buffer on exit. – User1 Jul 27 '10 at 16:15
  • I updated the question to clarify. Thank you for asking. Maybe there is an easier way. – User1 Jul 27 '10 at 17:03
  • Exactly the problem I am facing now! This place is wonderful! – Martian Puss Mar 14 '14 at 03:32
  • possible duplicate of [How to get the PID of a process that is piped to another process in Bash?](http://stackoverflow.com/questions/1652680/how-to-get-the-pid-of-a-process-that-is-piped-to-another-process-in-bash) – Jan Matějka May 12 '14 at 19:04
  • Yes, we need to do this too - kill my_prog ... The **kill** man page says we can kill using the process group ID (pgid). So one method would be to find that PGID for the whole command line. Unfortunately it isn't the same as the PID emitted by "$!". – will May 11 '15 at 05:41

9 Answers

12

Just had the same issue. My solution:

process_1 | process_2 &
PID_OF_PROCESS_2=$!           # $! gives the PID of the last process in the pipeline
PID_OF_PROCESS_1=$(jobs -p)   # jobs -p prints the process-group leader of each job, i.e. process_1

Just make sure the pipeline is the only background job; if other jobs exist, jobs -p prints one PID per job and you need to parse the full output of jobs -l.
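
If other jobs may already exist, a minimal sketch using an explicit jobspec (%% names the current, i.e. most recently started, job):

process_1 | process_2 &
PID_OF_PROCESS_2=$!
PID_OF_PROCESS_1=$(jobs -p %%)   # -p with a jobspec prints only that job's process-group leader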

Marvin
  • You can parse it that way if you use jobs -l instead, e.g.: PID_OF_PROCESS_1=`jobs -l | grep process_1 | cut -f2 -d" "` – rfranr Oct 07 '12 at 11:56
6

I was able to solve it by explicitly naming the pipe using mkfifo.

Step 1: mkfifo capture.

Step 2: Run this script:


my_prog > capture &
my_pid="$!" #Now, I have the PID for my_prog!
awk '...' capture > out.csv & 
sleep 10
kill $my_pid #kill my_prog
wait #wait for awk to finish.

I don't like having to manage the FIFO (creating it beforehand and cleaning it up afterwards). Hopefully someone has an easier solution; one way to automate the cleanup is sketched below.
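
A sketch that automates that management, assuming a bash script (mktemp -u only generates a unique pathname; mkfifo then creates the pipe, and the trap removes it on exit):

fifo=$(mktemp -u)          # unique pathname; nothing is created yet
mkfifo "$fifo"
trap 'rm -f "$fifo"' EXIT  # remove the FIFO when the script exits

my_prog > "$fifo" &
my_pid="$!"                # PID of my_prog
awk '...' "$fifo" > out.csv &
sleep 10
kill $my_pid               # my_prog flushes its buffer and exits
wait                       # wait for awk to finish; the trap then cleans up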

User1
  • why are you killing a process whose output you want? – msw Jul 27 '10 at 17:08
  • The process is a hardware monitoring program that will run until it is killed. When the process receives the kill signal, it flushes its buffer. In reality, the bash script will kill my_prog when the test is over, which is represented by the sleep statement above. – User1 Jul 27 '10 at 17:13
5

Here is a solution without wrappers or temporary files. This only works for a background pipeline whose output is captured away from stdout of the containing script, as in your case. Suppose you want to do:

cmd1 | cmd2 | cmd3 >pipe_out &
# do something with PID of cmd2

If only bash could provide ${PIPEPID[n]}! The replacement "hack" that I found is the following:

PID=$( { cmd1 | { cmd2 0<&4 & echo $! >&3 ; } 4<&0 | cmd3 >pipe_out & } 3>&1 | head -1 )

If needed, you can also close fd 3 (for cmd*) and fd 4 (for cmd2) with 3>&- and 4<&-, respectively. If you do that, for cmd2 make sure you close fd 4 only after you redirect fd 0 from it.
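
A concrete sketch with stand-in commands of my choosing (seq, grep, and wc substitute for cmd1, cmd2, and cmd3; the captured PID is grep's):

PID=$( { seq 1 1000 | { grep 7 0<&4 & echo $! >&3 ; } 4<&0 | wc -l >count.txt & } 3>&1 | head -1 )
# fd 4 preserves the pipe from seq so the backgrounded grep can still read it
# (a background command's stdin may otherwise come from /dev/null in
# non-interactive shells); fd 3 carries grep's PID out past the pipeline to head -1.
echo "grep runs as PID $PID"   # valid while the background pipeline runs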

Matei David
4

Add a shell wrapper around your command and capture the PID. For my example I use iostat.

#!/bin/sh
echo $$ > /tmp/my.pid   # record the wrapper shell's PID
exec iostat 1           # exec replaces the shell, keeping the same PID

exec replaces the shell with the new process, preserving the PID.

./test.sh | grep avg

While that runs:

$ cat /tmp/my.pid
22754
$ ps -ef | grep iostat
userid  22754  4058  0 12:33 pts/12   00:00:00 iostat 1

So you can:

sleep 10
kill `cat /tmp/my.pid`

Is that more elegant?
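
Applied to the question's my_prog, a sketch (wrap.sh is a hypothetical filename of my choosing):

#!/bin/sh
# wrap.sh: record this shell's PID, then become my_prog
echo $$ > /tmp/my.pid
exec my_prog

Then in the controlling script:

./wrap.sh | awk '...' > output.csv &
sleep 10
kill `cat /tmp/my.pid`   # signals my_prog, which flushes its buffer and exits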

Demosthenex
3

Improving on @Marvin's and @Nils Goroll's answers with a one-liner that extracts the PIDs of all commands in the pipe into a shell array variable:

# run some command
ls -l | rev | sort > /dev/null &

# collect pids
pids=(`jobs -l % | egrep -o '^(\[[0-9]+\]\+|    ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'`)

# use them for something
echo pid of ls -l: ${pids[0]}
echo pid of rev: ${pids[1]}
echo pid of sort: ${pids[2]}
echo pid of first command e.g. ls -l: $pids
echo pid of last command e.g. sort: ${pids[-1]}

# wait for last command in pipe to finish
wait ${pids[-1]}

In my solution, ${pids[-1]} contains the value normally available in $!. Please note the use of jobs -l %, which outputs just the "current" job, which by default is the last one started.

Sample output:

pid of ls -l: 2725
pid of rev: 2726
pid of sort: 2727
pid of first command e.g. ls -l: 2725
pid of last command e.g. sort: 2727

UPDATE 2017-11-13: Improved the pids=... command so that it works better with complex (multi-line) commands.

Jonas Berlin
2

Inspired by @Demosthenex's answer, using subshells:

$ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -1 & 
[1] 17371
$ cat pid1
17370
$ pgrep -fl vmstat
17370 vmstat 1 5
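
Applied to the original problem, a sketch ($BASHPID is the subshell's PID, which exec preserves; the pidfile name is my choice):

( echo $BASHPID > /tmp/my_prog.pid; exec my_prog ) | awk '...' > output.csv &
sleep 10
kill `cat /tmp/my_prog.pid`   # my_prog flushes its buffer and exits
wait                          # let awk drain the pipe and finish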
glenn jackman
2

Based on your comment, I still can't see why you'd prefer killing my_prog to having it complete in an orderly fashion. Ten seconds is a pretty arbitrary measurement on a multiprocessing system, on which my_prog could generate 10k lines or 0 lines of output depending upon system load.

If you want to limit the output of my_prog to something more determinate try

my_prog | head -1000 | awk '...'

without detaching from the shell. In the worst case, head will close its input and my_prog will get a SIGPIPE. In the best case, change my_prog so it gives you the amount of output you want.

added in response to comment:

Insofar as you have control over my_prog, give it an optional -s duration argument. Then somewhere in your main loop you can put the predicate:

if (duration_exceeded()) {
    exit(0);
}

where exit will in turn properly flush the output FILEs. If desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.

The core of your trouble is that my_prog runs forever. Everything else here is a hack to get around that limitation.
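
If the goal is simply "run my_prog for ten seconds, then let it exit cleanly", GNU coreutils timeout(1), where available, avoids the PID bookkeeping altogether (a sketch, not part of the original answer):

# timeout sends SIGTERM to my_prog after 10 seconds; my_prog flushes its
# buffer on exit, and awk sees EOF once the pipe closes.
timeout 10 my_prog | awk '...' > output.csv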

msw
  • See my comment in my answer. I guess I could have given more detail on the original question. The solution above might work for some, but this case is a bit different. Thank you for all your help so far. I hope you can tell me of an easier solution than my answer. – User1 Jul 27 '10 at 17:18
1

My solution was to query jobs and parse the output using perl.
Start two pipelines in the background:

$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &

Query background jobs:

$ jobs
[1]-  Running                 sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
[2]+  Running                 sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &

$ jobs -l
[1]-  6108 Running                 sleep 600
      6109                       | sleep 600
      6110                       | sleep 600
      6111                       | sleep 600
      6112                       | sleep 600 &
[2]+  6114 Running                 sleep 600
      6115                       | sleep 600
      6116                       | sleep 600
      6117                       | sleep 600
      6118                       | sleep 600 &

Parse the jobs list of the second job, %2. The parsing is probably error-prone, but in these cases it works. We aim to capture the first number followed by a space. The result is stored in the variable pids as an array, using the parentheses:

$ pids=($(jobs -l %2 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo $pids
6114
$ echo ${pids[*]}
6114 6115 6116 6117 6118
$ echo ${pids[2]}
6116
$ echo ${pids[4]}
6118

And for the first pipeline:

$ pids=($(jobs -l %1 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo ${pids[2]}
6110
$ echo ${pids[4]}
6112

We could wrap this into a little alias/function:

function pipeid() { jobs -l ${1:-%%} | perl -pe '/(\d+) /; $_=$1 . "\n"'; }
$ pids=($(pipeid))     # PIDs of last job
$ pids=($(pipeid %1))  # PIDs of first job

I have tested this in bash and zsh. Unfortunately, in bash I could not pipe the output of pipeid into another command, probably because that pipeline is run in a subshell that is unable to query the job list of the parent shell. A workaround is sketched below.
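
A workaround sketch: capture the PIDs into the array first, then pipe the captured values instead of pipeid itself:

$ pids=($(pipeid %1))
$ printf '%s\n' "${pids[@]}" | sort -n   # pipe the values, not pipeid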

hzpc-joostk
0

I was desperately looking for a good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).

So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:

function last_job_pids {
    # optional jobspec argument; defaults to %% (the current job)
    jobs -l "${1:-%%}" | awk '
        /^\[/ { delete pids; pids[$2]=$2; seen=1; next }  # job header line: PID is field 2
        { if (seen) pids[$1]=$1 }                         # continuation lines: PID is field 1
        END { for (p in pids) print p }                   # output order is unspecified
    '
}
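
A usage sketch (assuming the function above):

sleep 300 | sleep 300 | sleep 300 &
last_job_pids        # prints the PIDs of all three sleeps, in unspecified order
last_job_pids %1     # the same job, addressed explicitly by jobspec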
Nils Goroll