I have some large log files with dates in the old syslog format from RFC 3164 (MMM dd HH:mm:ss) that I want to convert to the new syslog format from RFC 5424 (YYYY-MM-ddTHH:mm:ss+TZ). I have created the following bash script:
#!/bin/bash
# Loop over directories
for i in $1
do
    echo "Processing directory $i"
    if [ -d "$i" ]
    then
        cd "$i"
        # Loop over log files inside the directory
        for j in *.2021
        do
            echo "Processing file $j"
            # Read line by line, transform the date, and append to a new file
            cat "$j" | \
            while read CMD; do
                tmpdate=$(printf '%s\n' "$CMD" | awk -F" $i" 'BEGIN {ORS=""}; {print $1}')
                newdate=$(date +'%Y-%m-%dT%H:%M:%S+02:00' -d "$tmpdate")
                printf '%s\n' "$CMD" | sed 's/'"$tmpdate"'/'"$newdate"'/g' >> "$j.new"
            done
            mv "$j.new" "$j"
        done
        cd ..
    fi
done
But this is taking a looooong time to execute, since I have files with several million lines (logs dating back over one year on a mail server, for example). So far it has been running for days and there are still a lot of lines left to parse :-)
So two questions.
- Why is this script taking such a long time to execute?
- Is there a faster way to do this, using one of the GNU utils (sed, awk, etc.), bash, or Python?
======== EDIT =======
Here are examples of the old format:
Feb  1 21:59:44 calendar os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
Feb  1 21:59:44 calendar 50mounted-tests: debug: /dev/sda2 type not recognised; skipping
Feb  1 21:59:44 calendar os-prober: debug: os detected by /usr/lib/os-probes/50mounted-tests
Note that there are two spaces between Feb and 1; if the day is 10 or higher, there is only one space, as in:
Feb 10 10:39:53 calendar os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
In the new format it would look like this:
2021-02-01T21:59:44+02:00 calendar os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
2021-02-01T21:59:44+02:00 calendar 50mounted-tests: debug: /dev/sda2 type not recognised; skipping
2021-02-01T21:59:44+02:00 calendar os-prober: debug: os detected by /usr/lib/os-probes/50mounted-tests
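For illustration, the old-to-new mapping in the examples above could be sketched as a single awk pass per file. This is only a sketch under stated assumptions, not a tested replacement for the script. The function name `convert_syslog_dates` is made up for this example. The year is passed in as an argument (for instance taken from the `.2021` suffix), because the old format carries no year. The offset is a fixed `+02:00`, as in the script, so daylight saving time is ignored. The loop in the script starts several processes for every line (awk, date, sed, plus the command substitutions), which is probably where most of the time goes; here awk is started once per file.

```shell
#!/bin/bash
# Sketch: rewrite a leading "MMM [d]d HH:MM:SS" timestamp as
# "YYYY-MM-DDTHH:MM:SS<offset>" using one awk process per input stream.
# Assumptions (not from the original script): the year and the UTC offset
# are constant for the whole file and are passed as arguments.
convert_syslog_dates() {
    awk -v year="${1:-2021}" -v tz="${2:-+02:00}" '
    BEGIN {
        # Map month abbreviations to month numbers.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (k = 1; k <= n; k++) month[names[k]] = k
    }
    # Only rewrite lines that begin with a month name, a day, and a time.
    ($1 in month) && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
        stamp = sprintf("%s-%02d-%02dT%s%s", year, month[$1], $2, $3, tz)
        # Replace the old timestamp (one or two spaces before the day)
        # and leave the rest of the line exactly as it was.
        sub(/^[A-Z][a-z][a-z] +[0-9]+ +[0-9:]+/, stamp)
    }
    # Print every line, converted or not.
    { print }
    '
}
```

It could be used as `convert_syslog_dates 2021 +02:00 < mail.log.2021 > mail.log.2021.new` (the file name is hypothetical). Lines that do not start with a recognisable timestamp are passed through unchanged.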
TIA.