
I want to write the textual output of tcpdump to a compressed file.

First I tried the most obvious:

# tcpdump -l -i eth0 | gzip -c > test.gz
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C63 packets captured
244 packets received by filter
0 packets dropped by kernel
4 packets dropped by interface

# file test.gz
test.gz: empty
# 

Then I found the following solution for Debian 9 (Stretch):

# tcpdump -l -i eth0 | ( gzip -c > test.gz & )
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C150 packets captured
160 packets received by filter
0 packets dropped by kernel

# file test.gz 
test.gz: gzip compressed data, last modified: Wed May 23 12:56:16 2018, from Unix
# 

This works fine on Debian 9 (Stretch) but not on Debian 8 (Jessie):

# tcpdump -l -i eth0 | ( gzip -c > test.gz & )
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
tcpdump: Unable to write output: Broken pipe
# 

Two questions:

  1. What's wrong with the 'obvious solution'?
  2. How can I capture and gzip the tcpdump output on Debian Jessie? (The obvious solution doesn't work there either.)

Thanks!

Charly
  • The obvious solution would have worked with enough content flowing through the pipeline. If you're only trying to capture a few lines, or need to capture everything up to the very end when ctrl+c is pressed, then... yeah, you've got the issue here. (However, capturing only very small amounts of content is not *usually* the kind of case when in-line gzip is needed). – Charles Duffy May 23 '18 at 13:28
  • BTW, for extra paranoia, you could change the `tcpdump`s in my answer to `stdbuf -oL tcpdump`, to tell glibc to flush tcpdump's output after every line. Shouldn't be needed, though, since SIGINT is capturable (and so allows the program an opportunity to do a final flush itself). You'd need it if shutdown were happening with SIGKILL, however. – Charles Duffy May 23 '18 at 13:32
  • ...btw, [BashFAQ #9](https://mywiki.wooledge.org/BashFAQ/009) may be useful supplemental reading. – Charles Duffy May 23 '18 at 13:40
  • Thanks for your explanation. I've just verified it. `gzip` writes to the file only when a full 32k block has been compressed. Since my original `tcpdump` output contained many similar lines, it would have taken quite a long time to reach 32k. – Charly May 24 '18 at 05:36

2 Answers


What Was Happening

To explain what happens here:

  • Ctrl+C sends a SIGINT to the entire process group. That means it doesn't just terminate tcpdump, but also terminates gzip. (The workarounds you were attempting try to avoid this by moving content into background processes, and thus out of the same process group).
  • stdout is line-buffered by default only when output is to a TTY; when output is to a FIFO, it's block-buffered, allowing greater efficiency by writing data from the left-hand process only when a sufficiently large chunk is available. In many situations, you could thus just use stdbuf -oL or similar to disable this. However...
  • gzip by its nature cannot operate completely unbuffered. This is because block-based compression algorithms need to collect data into, well, blocks, analyze that content in bulk, and so on.

So, if gzip and tcpdump are terminated at the same time, that means there's no assurance that tcpdump will actually be able to flush its output buffer, and then have gzip read, compress and write that flushed data, before gzip itself exits from the signal it received at the same time.
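
The effect can be reproduced without tcpdump or root access. The following is a minimal sketch, not a definitive test: it assumes GNU gzip and `mktemp -d`, and the FIFO, the file names and the `printf` stand-in for tcpdump are all made up for illustration. It sends SIGTERM rather than SIGINT, because a non-interactive shell starts background commands with SIGINT ignored. With GNU gzip the first file typically ends up empty, as in the question, while the second one is complete.

```shell
#!/bin/sh
# Demo only: printf stands in for tcpdump; all paths are hypothetical.
tmpdir=$(mktemp -d) || exit 1
mkfifo "$tmpdir/in.fifo"

# Case 1: gzip is killed while its input is still open (like Ctrl+C on the pipeline).
gzip -c <"$tmpdir/in.fifo" >"$tmpdir/killed.gz" &
gzip_pid=$!
exec 3>"$tmpdir/in.fifo"          # open the write end and keep it open
printf 'packet %s\n' 1 2 3 >&3    # a few short lines, far less than one block
sleep 1                           # give gzip time to read them
kill -TERM "$gzip_pid"            # gzip is terminated before it sees EOF
wait "$gzip_pid"
exec 3>&-
killed_size=$(($(wc -c <"$tmpdir/killed.gz")))

# Case 2: the writer closes first, so gzip sees EOF and can finish its output.
gzip -c <"$tmpdir/in.fifo" >"$tmpdir/closed.gz" &
gzip_pid=$!
exec 3>"$tmpdir/in.fifo"
printf 'packet %s\n' 1 2 3 >&3
exec 3>&-                         # closing the write end is EOF for gzip
wait "$gzip_pid"
restored=$(gzip -dc "$tmpdir/closed.gz")

echo "killed before EOF:   $killed_size bytes in the output file"
echo "writer closed first: recovered the following lines"
printf '%s\n' "$restored"
rm -rf "$tmpdir"
```

The only difference between the two cases is who stops first. That is why every workaround below keeps gzip alive until its input has been closed.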


Fixing The Problem

Note that the code snippets under headers containing the word "Interactive" are intended for interactive use.


A Reliable Interactive Workaround (For Bash)

As a surefire solution, move gzip completely out-of-band, so it isn't sent a SIGINT when you press Ctrl+C on the tcpdump command:

exec 3> >(gzip -c >test.gz)  # Make FD 3 point to gzip
tcpdump -l -i eth0 >&3       # run tcpdump **AS A SEPARATE COMMAND** writing to that fd
exec 3>&-                    # later, after you cancelled tcpdump, close the FD.

A Reliable Interactive Workaround (For Any POSIX Shell)

Same thing, but slightly longer and not relying on process substitution:

mkfifo test.fifo                            # create a named FIFO
gzip -c <test.fifo >test.gz & gzip_pid="$!" # start gzip, reading from that named FIFO
tcpdump -l -i eth0 >test.fifo               # start tcpdump, writing to that named FIFO
rm test.fifo                                # delete the FIFO when done
wait "$gzip_pid"                            # ...and wait for gzip to exit

Note that the wait will have the exit status of the gzip process, so you can determine whether it encountered an error.
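
To rehearse this pattern without network traffic, the same steps can be sketched with a stand-in producer. This is only an illustration: `produce`, `workdir` and the file names are hypothetical, and `mktemp -d` is assumed to be available (it is widespread, but not specified by POSIX).

```shell
#!/bin/sh
# Dry run of the FIFO pattern; produce() is a hypothetical stand-in for tcpdump.
produce() {
  i=1
  while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
  done
}

workdir=$(mktemp -d) || exit 1
fifo=$workdir/capture.fifo
out=$workdir/capture.gz

mkfifo "$fifo"
gzip -c <"$fifo" >"$out" & gzip_pid=$!   # reader starts first, in the background
produce >"$fifo"                          # writer runs in the foreground
rm -f "$fifo"                             # the FIFO is no longer needed
wait "$gzip_pid"; gzip_status=$?          # wait returns gzip's exit status

# Only trust the archive if gzip exited cleanly and the file passes an integrity test.
if [ "$gzip_status" -eq 0 ] && gzip -t "$out"; then
  line_count=$(($(gzip -dc "$out" | wc -l)))
  echo "gzip exited cleanly; $line_count lines stored"
else
  echo "gzip failed with status $gzip_status" >&2
fi
rm -rf "$workdir"
```

Saving `$?` immediately after `wait` matters, since any later command would overwrite it.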


A Reliable Scripted Workaround (For Any POSIX Shell)

If we're running a script, then it's appropriate to set up a signal handler so we can handle SIGINT (by killing only tcpdump) explicitly:

#!/bin/sh
[ "$#" -gt 0 ] || {
  echo "Usage: ${0##*/} file.tcpdump.gz [tcpdump-args]" >&2
  echo "  Example: ${0##*/} foo.tcpdump.gz -l -i eth0" >&2
  exit 1
}
outfile=$1; shift
fifo=test-$$.fifo # for real code, put this in a unique temporary directory

trap '[ -n "$tcpdump_pid" ] && kill "$tcpdump_pid"' INT
trap 'rm -f -- "$fifo"' EXIT

rm -f -- "$fifo"; mkfifo "$fifo" || exit
gzip -c >"$outfile" <"$fifo" & gzip_pid=$!

# avoid trying to run tcpdump if gzip obviously failed to start
{ [ -n "$gzip_pid" ] && [ "$gzip_pid" -gt 0 ] && kill -0 "$gzip_pid"; } || exit 1

tcpdump "$@" >"$fifo" & tcpdump_pid=$!

# wait for tcpdump; if that wait fails or is interrupted by the trapped SIGINT,
# wait for gzip to finish and return its exit status
wait "$tcpdump_pid" || wait "$gzip_pid"
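
The comment on the fifo= line suggests a unique temporary directory for real code. One way this might look is sketched here, assuming `mktemp -d` is available (common, but not specified by POSIX). The variable `tmpdir` and the `capture.XXXXXX` template are illustrative names and not part of the script above.

```shell
#!/bin/sh
# Create a private directory so the FIFO path cannot collide with an
# existing file. mktemp -d is an assumption; it is not required by POSIX.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/capture.XXXXXX") || exit 1
fifo=$tmpdir/capture.fifo

# Remove the whole directory, and the FIFO inside it, when the script exits.
trap 'rm -rf -- "$tmpdir"' EXIT

# Create the FIFO readable and writable by the owner only.
mkfifo -m 600 "$fifo" || exit 1
```

With this in place, the EXIT trap replaces the one that removes only the FIFO, and the initial rm -f before mkfifo is no longer needed, because the directory is freshly created.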
Charles Duffy

From Charles Duffy's answer (a big thanks to him!):

Ctrl+C sends a SIGINT to the entire process group. That means it doesn't just terminate tcpdump, but also terminates gzip. (The workarounds you were attempting try to avoid this by moving content into background processes, and thus out of the same process group).

Since he's right that gzip writes the output file only when a full 32k block is compressed, I've started the 'obvious solution' in one terminal...

$ tcpdump -l -i eth0 | gzip -c > test.gz
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
1926 packets captured
1938 packets received by filter
0 packets dropped by kernel
$ 

and killed the tcpdump from a second terminal:

$ killall -INT tcpdump
$

Starting the 'obvious solution' in the background, as `tcpdump -l -i eth0 | gzip -c > test.gz &`, would make it possible to kill tcpdump from the same terminal.

Charly