I've been trying to understand a problem that's cropped up with some of the scripts we use at work.
To generate many of our script logs, we utilize the exec
command and file redirects to print all output from the script to both the terminal and a log file. Occasionally, for information that doesn't need to be displayed to the user, we do a straight redirect to the log file.
The issue we're seeing occurs on the last line of output to the file when we're printing the number of errors that occurred during that execution: The text doesn't get printed to the file.
In an attempt to diagnose the problem, I wrote a simplified version of our production script (script1.bash) and a test script (script2.bash) to try to tease out the problem.
script1.bash
#!/bin/bash
log_name="${USER}_`date +"%Y%m%d-%H%M%S"`_${HOST}_${1}.log"
log="/tmp/${log_name}"
log_tmp="/tmp/temp_logs"
err_count=0
finish()
{
local ecode=0
if [ $# -eq 1 ]; then
ecode=${1}
fi
# This is the problem line
echo "Error Count: ${err_count}" >> "${log}"
mvlog
local success=$?
exec 1>&3 2>&4
if [ ${success} -ne 0 ]; then
echo ""
echo "WARNING: Failed to save log file to ${log_tmp}"
echo ""
ecode=$((ecode+1))
fi
exit ${ecode}
}
mvlog()
{
local ecode=1
if [ ! -d "${log_tmp}" ]; then
mkdir -p "${log_tmp}"
chmod 775 "${log_tmp}"
fi
if [ -d "${log_tmp}" ]; then
rsync -pt --bwlimit=4096 "${log}" "${log_tmp}/${log_name}" 2> /dev/null
[ $? -eq 0 ] && ecode=0
if [ ${ecode} -eq 0 ]; then
rm -f "${log}"
fi
fi
}
exec 3>&1 4>&2 >(tee "${log}") 2>&1
ecode=0
echo
echo "Some text"
echo
finish ${ecode}
script2.bash
#!/bin/bash
runs=10000
logdir="/tmp/temp_logs"
if [ -d "${logdir}" ]; then
rm -rf "${logdir}"
fi
for i in $(seq 1 ${runs}); do
echo "Conducting run #${i}/${runs}..."
${HOME}/bin/script1.bash ${i}
done
echo "Scanning logs from runs..."
total_count=`find "${logdir}" -type f -name "*.log*" | wc -l`
missing_count=`grep -L 'Error Count:' ${logdir}/*.log* | grep -c /`
echo "Number of runs performed: ${runs}"
echo "Number of log files generated: ${total_count}"
echo "Number of log files missing text: ${missing_count}"
My first test indicated roughly 1% of the time the line isn't written to the log file. I then proceeded to try several different methods of handling this line of output.
- Echo and Wait
echo "Error Count: ${err_count}" >> "${log}"
wait
- Alternate print method
printf "Error Count: %d\n" ${err_count} >> "${log}"
- No Explicit File Redirection
echo "Error Count: ${err_count}"
- Echo and Sleep
echo "Error Count: ${err_count}" >> "${log}"
sleep 0.2
Of these, #1 and #2 each had a 1% fail rate while #4 had a staggering 99% fail rate. #3 was the only methodology that had a 0% fail rate.
At this point, I'm at a loss for why this behavior is occurring, so I'm asking the gurus here for any insight.
(Note that the simple solution is to implement #3, but I want to know why this is happening.)