
I would like to capture output from a UNIX process but limit max file size and/or rotate to a new file.

I have seen logrotate, but it does not work in real time. As I understand it, it is a "clean-up" job that runs in parallel.

What is the right solution? I guess I will write a tiny script to do it, but I was hoping there was a simple way with existing text tools.

Imagine:

my_program | tee --max-bytes 100000 log/my_program_log

This would always write the latest log to: log/my_program_log

Then, as it fills, the file would be renamed to log/my_program_log000001 and a new log/my_program_log would be started.

kevinarpe

7 Answers


Use split:

my_program | tee >(split -d -b 100000 -)

Or if you don't want to see the output, you can directly pipe to split:

my_program | split -d -b 100000 -

As for log rotation, there's no tool in coreutils that does it automatically. You could create a symlink to the most recent chunk and periodically update it with a small loop (note the argument order: target first, then link name):

while true; do ln -fns "$(ls -t | head -1)" current_log; sleep 1; done
Foo Bah
    Bah... I forgot about the >() operator in Bash (and some other shells). I use it too infrequently. Yours is the most concise answer. – kevinarpe Oct 05 '14 at 12:00
  • For the first solution using `tee`, is there a reason why I shouldn't use `my_program | tee | split -d -b 100000 -`? – asafc Dec 26 '18 at 20:46
  • for a decent file naming use for example `tee >(split --additional-suffix=.log -d -b 1000000 - debug.0)` – rubo77 Jun 12 '19 at 22:57
  • Brilliant way of using tee and split together. – deltaray Feb 05 '20 at 15:49
    @asafc: `tee` reads `stdin` and writes two copies, one to a file and the other to `tee`'s `stdout`. It is the file output that needs to be split/rotated. Your suggestion will consume the `stdout` copy that's meant to display on a terminal, thus defeating the whole reason for using `tee` – Ben Voigt Nov 18 '22 at 15:40
  • Also, if you're working on line-based text content and don't want to break up lines in output files, just use `-l` instead of `-b` – weiresr Mar 17 '23 at 09:01
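Putting the comments above together, a line-based sketch (GNU coreutils assumed; the `chunk.` prefix and 4-line limit are just examples):

```shell
# Split line-wise into 4-line files with numeric suffixes.
# Wrap the split in tee >(...) if you also want the stream on stdout.
seq 1 10 | split -l 4 -d - chunk.
# Produces chunk.00 (lines 1-4), chunk.01 (lines 5-8), chunk.02 (lines 9-10)
```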

The package apache2-utils contains a utility called rotatelogs; it fully meets your requirements.

Synopsis:

rotatelogs [ -l ] [ -L linkname ] [ -p program ] [ -f ] [ -t ] [ -v ] [ -e ] [ -c ] [ -n number-of-files ] logfile rotationtime|filesize(B|K|M|G) [ offset ]

Example:

your_program | rotatelogs -n 5 /var/log/logfile 1M

You can read the full manual at this link.

PRIHLOP
  • With `-n 5`, writing starts over at the first file each time the program starts. If you set `-n 1` instead, it writes to the same file each time, giving continuous logs – Utkarsh Naiknaware Aug 11 '19 at 10:31

To limit the size to 100 bytes, you can simply use dd:

my_program | dd bs=1 count=100 > log

When 100 bytes have been written, dd closes the pipe and my_program receives EPIPE on its next write.
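A quick sanity check of the truncation (the file name `log` is just an example; note that `bs=1` copies one byte per read/write call, which is slow for large limits):

```shell
# Cap a larger stream at exactly 100 bytes; dd stops reading after that.
seq 1 1000 | dd bs=1 count=100 > log 2>/dev/null
wc -c < log   # 100
```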

Jakob

Or using awk:

program | awk 'BEGIN{max=100} {n+=length($0); print $0 > "log."int(n/max)}'

It keeps lines together, so the max is not exact, but this could be nice especially for logging purposes. You can use awk's sprintf to format the file name.
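For example, with sprintf and a byte counter (the 20-byte limit and `log.NNN` names are purely illustrative):

```shell
# Rotate to a new zero-padded file roughly every 20 bytes,
# counting the trailing newline with each line.
seq 1 10 | awk '{n += length($0) + 1; print > sprintf("log.%03d", int(n/20))}'
# log.000 gets lines 1-9, log.001 gets line 10
```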

Here's a pipable script, using awk

#!/bin/bash
maxb=$((1024*1024))    # default 1MiB
out="log"              # output file name
width=3                # width: log.001, log.002
while getopts "b:o:w:" opt; do
  case $opt in
    b ) maxb=$OPTARG;;
    o ) out="$OPTARG";;
    w ) width=$OPTARG;;
    * ) echo "Unimplemented option."; exit 1
  esac
done
shift $(($OPTIND-1))

IFS=                  # empty IFS keeps leading whitespace
if [ $# -ge 1 ]; then # read from file
  cat "$1"
else                  # read from pipe
  while read -r arg; do
    printf '%s\n' "$arg"
  done
fi | awk -v b=$maxb -v o="$out" -v w=$width '{
    n+=length($0); print $0 > sprintf("%s.%0.*d",o,w,n/b)}'

Save this to a file called 'bee', run 'chmod +x bee', and you can use it as

program | bee

or to split an existing file as

bee -b1000 -o proglog -w8 file
jaap
    I agree with your comment: "It keeps lines together, so the max is not exact, but this could be nice especially for logging purposes." – kevinarpe Oct 05 '14 at 11:59

The most straightforward way to solve this is probably to use Python and the logging module, which was designed for this purpose. Create a script that reads from stdin and writes to stdout, and implement the log rotation described below.

The "logging" module provides the

class logging.handlers.RotatingFileHandler(filename, mode='a', maxBytes=0,
              backupCount=0, encoding=None, delay=0)

which does exactly what you are asking about.

You can use the maxBytes and backupCount values to allow the file to rollover at a predetermined size.

From docs.python.org

Sometimes you want to let a log file grow to a certain size, then open a new file and log to that. You may want to keep a certain number of these files, and when that many files have been created, rotate the files so that the number of files and the size of the files both remain bounded. For this usage pattern, the logging package provides a RotatingFileHandler:

import glob
import logging
import logging.handlers

LOG_FILENAME = 'logging_rotatingfile_example.out'

# Set up a specific logger with our desired output level
my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

# Add the log message handler to the logger
handler = logging.handlers.RotatingFileHandler(
              LOG_FILENAME, maxBytes=20, backupCount=5)

my_logger.addHandler(handler)

# Log some messages
for i in range(20):
    my_logger.debug('i = %d' % i)

# See what files are created
logfiles = glob.glob('%s*' % LOG_FILENAME)

for filename in logfiles:
    print(filename)

The result should be 6 separate files, each with part of the log history for the application:

logging_rotatingfile_example.out
logging_rotatingfile_example.out.1
logging_rotatingfile_example.out.2
logging_rotatingfile_example.out.3
logging_rotatingfile_example.out.4
logging_rotatingfile_example.out.5

The most current file is always logging_rotatingfile_example.out, and each time it reaches the size limit it is renamed with the suffix .1. Each of the existing backup files is renamed to increment the suffix (.1 becomes .2, etc.) and the .6 file is erased.

Obviously this example sets the log length much much too small as an extreme example. You would want to set maxBytes to an appropriate value.

Fredrik Pihl
    I am confused. My program is not Python. How does this help me? I want to use standard GNU coreutils: awk/tee/split/etc. – kevinarpe Jul 18 '11 at 11:10

Another solution is to use the Apache rotatelogs utility.

Or the following script:

#!/bin/ksh
#rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]
numberOfFiles=10
while getopts "n:fltvecp:L:" opt; do
    case $opt in
  n) numberOfFiles="$OPTARG"
    if ! printf '%s\n' "$numberOfFiles" | grep '^[0-9][0-9]*$' >/dev/null;     then
      printf 'Numeric numberOfFiles required %s. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$numberOfFiles" 1>&2
      exit 1
    elif [ $numberOfFiles -lt 3 ]; then
      printf 'numberOfFiles < 3 %s. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$numberOfFiles" 1>&2
    fi
  ;;
  *) printf '-%s ignored. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$opt" 1>&2
  ;;
  esac
done
shift $(( $OPTIND - 1 ))
pathToLog="$1"
fileSize="$2"
if ! printf '%s\n' "$fileSize" | grep '^[0-9][0-9]*[BKMG]$' >/dev/null; then
  printf 'Numeric fileSize followed by B|K|M|G required %s. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$fileSize" 1>&2
  exit 1
fi
sizeQualifier=`printf "%s\n" "$fileSize" | sed "s%^[0-9][0-9]*\([BKMG]\)$%\1%"`
multip=1
case $sizeQualifier in
B) multip=1 ;;
K) multip=1024 ;;
M) multip=1048576 ;;
G) multip=1073741824 ;;
esac
fileSize=`printf "%s\n" "$fileSize" | sed "s%^\([0-9][0-9]*\)[BKMG]$%\1%"`
fileSize=$(( $fileSize * $multip ))
fileSize=$(( $fileSize / 1024 ))
if [ $fileSize -le 10 ]; then
  printf 'fileSize %sKB < 10KB. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$fileSize" 1>&2
  exit 1
fi
if ! touch "$pathToLog"; then
  printf 'Could not write to log file %s. rotatelogs.sh -n numberOfFiles pathToLog fileSize[B|K|M|G]\n' "$pathToLog" 1>&2
  exit 1
fi
lineCnt=0
while read line
do
  printf "%s\n" "$line" >>"$pathToLog"
  lineCnt=$(( $lineCnt + 1 ))
  if [ $lineCnt -gt 200 ]; then
    lineCnt=0
    curFileSize=`du -k "$pathToLog" | sed -e 's/^[  ][  ]*//' -e 's%[   ][  ]*$%%' -e 's/[  ][  ]*/[    ]/g' | cut -f1 -d" "`
    if [ $curFileSize -gt $fileSize ]; then
      DATE=`date +%Y%m%d_%H%M%S`
      cat "$pathToLog" | gzip -c >"${pathToLog}.${DATE}".gz && cat /dev/null >"$pathToLog"
      curNumberOfFiles=`ls "$pathToLog".[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9].gz | wc -l | sed -e 's/^[   ][  ]*//' -e 's%[   ][  ]*$%%' -e 's/[  ][  ]*/[    ]/g'`
      while [ $curNumberOfFiles -ge $numberOfFiles ]; do
        fileToRemove=`ls "$pathToLog".[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9].gz | head -1`
        if [ -f "$fileToRemove" ]; then
          rm -f "$fileToRemove"
          curNumberOfFiles=`ls "$pathToLog".[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9].gz | wc -l | sed -e 's/^[   ][  ]*//' -e 's%[   ][  ]*$%%' -e 's/[  ][  ]*/[    ]/g'`
        else
          break
        fi
      done
    fi
  fi
done
bravomail
  • The script was not love at first sight but having to install apache just to get the rotatelog made me look at it again and I have to say it's pretty neat! – Red Pill Feb 03 '21 at 16:46
  • Performance: when I let a test app spam stdout and count the records, rotatelogs shows no difference from `testapp > mylogfile`, while with the script we manage to swallow and write only about 40% of the records in the same time delta. – Red Pill Feb 03 '21 at 17:07

Limiting the max size can also be done with head:

my_program | head -c 100  # Limit to 100 first bytes

See this for benefits over dd: https://unix.stackexchange.com/a/121888/
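A quick check that head keeps exactly the first bytes of the stream (the file name `log` is just an example):

```shell
# head exits after writing 100 bytes; a still-writing producer
# would then receive SIGPIPE on its next write.
seq 1 1000 | head -c 100 > log
wc -c < log   # 100
```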

orestisf