
I am using tcpflow to log network traffic on a server. I want to log this data to a file, but not all of it. The monitoring process would be running in the background as a daemon indefinitely.

Some lines of the stream include a byte count, and if I see that byte count (say, 800 bytes), then I need to log the next 800 bytes to the file. Otherwise, nothing should be written to the file.

What's the best way for me to do this kind of "on-the-fly pre-processing" of the stream to decide what to redirect to the log file? Some kind of second daemon script listening to the stream, with the output piped into that script?

Example:

I see the following line in the stream:

1343932842: 010.079.091.189.35856-010.104.001.199.11211: set i:1:20163484235 0 0 1429

First, I need to check that the line contains "set". Then I take the last field of the line (1429), read the next 1429 bytes from the stream, and write those to a file.

Tim
  • Looks like tcpflow supports filter expressions like tcpdump – jordanm Aug 02 '12 at 18:37
  • @jordanm I don't think that will help me. Here's an example line: `1343932842: 010.079.091.189.35856-010.104.001.199.11211: set i:1:20163484235 0 0 1429` I need to first see that line has a "set", then examine the last piece of the line (1429), then read the next 1429 bytes and write those to a file. – Tim Aug 02 '12 at 18:40
  • what language are you writing this in? what environment is this executing in? Tags "bash" and "php" are confusing... – tucuxi Aug 02 '12 at 19:15
  • binary TCP streams can contain any data, including newlines; are you sure that there will only be text? Otherwise, you will need to be sure you are separating "control lines" (like your example) from actual data... – tucuxi Aug 02 '12 at 19:32
  • @tucuxi the process that is generating the stream is a binary compiled from C, run from the command line. It is guaranteed to return text data. – Tim Aug 02 '12 at 20:03
  • are you sure you read the stream itself, not a "humanized" version? If humanized, then it's probably not exactly 1429 bytes that need to be read next (instead you can probably read until "something", and that "something" probably happens right after a newline) – Olivier Dulac Jan 06 '13 at 15:02

3 Answers


Yes, use a daemon program that takes the stream as input and does just what you described. I would recommend C instead of a script, as it has very straightforward input/output and very low overhead.

Assuming you have an executable called 'capture' and a filtering program called 'filter', you can chain them together from a bash shell using

bash-prompt$ capture capture-params | filter

Anything that capture writes to stdout becomes available to filter as input on stdin. From filter's point of view, it is a simple matter of reading lines and, when a line matching the set ... byte-count pattern is found, writing the following bytes to an output file (or again to stdout). If you write to stdout, you can redirect that to a file using

bash-prompt$ capture capture-params | filter > output-file.txt
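
For illustration, here is a rough sketch of the filter's logic as a bash script (a sketch only: it assumes the control lines are plain text, as confirmed in the question's comments, and that the " set " marker and trailing byte count follow the question's example line; a C version would have the same read-line/copy-bytes structure with lower overhead):

    #!/usr/bin/env bash
    # filter.sh - sketch of the filter described above.
    # Read the capture stream on stdin; whenever a control line
    # contains " set ", pass the number of payload bytes named by
    # the line's last field through to stdout.
    while IFS= read -r line; do
        if [[ $line == *" set "* ]]; then
            count=${line##* }        # last whitespace-separated field
            if [[ $count =~ ^[0-9]+$ ]]; then
                head -c "$count"     # copy exactly $count payload bytes
            fi
        fi
    done

This works because bash's read consumes input only up to the newline, so head -c picks up at exactly the first payload byte; the script's stdout can then be redirected to a log file exactly as shown above.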
tucuxi
  • How do I hook the two up? Do I have a second C program make a call to the shell to start the first program that generates the stream? Do I have the first program write to a temporary file, which the second program reads? (I would prefer to not use that solution). – Tim Aug 02 '12 at 20:06
  • Added a few examples of chaining using bash. This is very much "the unix way" of doing things: small programs that do single things well get composed into larger programs. – tucuxi Aug 02 '12 at 20:47
  • Thanks for that. So then, would the "filter" program be a C program that is basically an infinite loop, repeatedly reading stdin, parsing/processing it, and writing to stdout? Does the filter program have to be compiled in C? Can it just be a PHP script? – Tim Aug 02 '12 at 21:14
  • Yes to the first part (how `filter` would work); as to the second part, yes, you can use http://stackoverflow.com/questions/554760/php-standard-input to do standard input/output in PHP. – tucuxi Aug 02 '12 at 21:17
  • This is the correct way to do it. In fact, the generic term for a program whose purpose is to read from stdin, transform the input and write to stdout is a "filter". https://en.wikipedia.org/wiki/Filter_(Unix) There are lots of existing general purpose filters. You may be able to use them instead of writing your own. The Wikipedia page I referenced has a bunch you can investigate. Note that most of the existing ones work on plain text, not binary. For binary you have fewer options and may have to write your own. – Brenda J. Butler Jul 04 '13 at 14:09

You can get on-the-fly text processing with awk. You will need to learn the language, but I use it for similar tasks such as live log parsing. I do tail -f file.log | awk -f myscript.awk

Each line will be analyzed by the awk script you create, and with if-then-else logic you can detect certain words in the line and activate other parts of the awk code to parse the line differently, or even run external programs.
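
As a minimal sketch of that idea (the pattern and field layout follow the question's example line; the real myscript.awk is whatever you write), the "set"-detection part might look like:

    tail -f file.log | awk '
        / set / {
            bytes = $NF   # last field of the control line is the byte count
            print "control line matched; expecting " bytes " bytes of payload"
        }
    '

Note that awk is line-oriented, so copying an exact count of raw bytes after a match is awkward in pure awk; for the byte-exact copy the question asks for, a filter like the one in the previous answer is a better fit.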

Kurt Kraut

By far the most elegant tool for what you are describing is a low-footprint round robin database. RRDtool is the open-source industry standard for high-performance data logging and graphing.

Using a bash command you can feed your data into the database, and should you choose to, graphing it is also very simple.

SEE: http://oss.oetiker.ch/rrdtool/gallery/index.en.html
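
As a hedged sketch (the database name, step, and data-source definition below are illustrative choices, not requirements), creating, updating, and graphing such a database from bash looks roughly like:

    # Create a database that samples one value per minute and keeps
    # a day of one-minute averages (names and intervals illustrative):
    rrdtool create payload.rrd --step 60 \
        DS:bytes:GAUGE:120:0:U \
        RRA:AVERAGE:0.5:1:1440

    # Record a data point; N means "now" (1429 is the byte count from
    # the question's example line):
    rrdtool update payload.rrd N:1429

    # Render a simple graph of the last hour:
    rrdtool graph payload.png --start -1h \
        DEF:b=payload.rrd:bytes:AVERAGE LINE1:b#0000FF:"bytes"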

MattyV