
I created a shell script to find the files that use the most disk space and the files that were recently modified. Our disks are 2-3 TB in size and the script takes hours to complete. Can the script be optimized to reduce the execution time? The find command is what takes the time here. Is there a more efficient way to find the recently modified files?

#!/bin/bash

outfile="/tmp/output.txt"

function printline() {
echo "-------------------------------------------------------------" | tee -a $outfile
echo "" | tee -a $outfile
}

function nullcheck() {
echo ""
echo "Usage: search.sh [directory path]"
echo ""
}

if [ $# -ne 0 ];then
        if [ -d $1 ];then
                echo ""
                printline;
                echo $(hostname) $(date) "user:"$(whoami) | tee -a $outfile
                echo "" | tee -a $outfile
                echo "---------------------Current Disk Usage----------------------" | tee -a $outfile
                df -hP $1 | tee -a $outfile
                printline;
                LARGE=$(find $(readlink -f $1) -type f -exec du -Sh {} \+ | sort -rh | head -50)
                echo "--------------------Largest top 50 files---------------------" | tee -a $outfile
                echo "$LARGE" | tee -a $outfile
                printline;
                echo "------Newly created/modified Files for the last 6 hours------" | tee -a $outfile
                FILES=$(find $(readlink -f $1) -type f -mmin -360 -exec ls -ltrh {} \+)    ##find all the files that are modified in last 6 hours
                if [ -z "$FILES" ];then
                        echo "None of the files are created/modified during this time period"
                else
                        echo "$FILES" | tee -a $outfile
                fi
                printline;
        else
                echo "Directory doesn't exist" | tee -a $outfile
        fi
else
        nullcheck;
fi
  • `find ... -exec ...` starts a process for every file found. In many cases it is substantially faster to collect the list of hits and process that at once. As an example: `find . -name '*.log' -exec rm -f {} \;` takes ages; `find . -name '*.log' -print | xargs rm -f` is fast. – Ronald Oct 18 '17 at 07:54
  • You might want to consider a file indexer, some of them listed at https://unix.stackexchange.com/a/31126. However, that depends on how often files change on your disk... – urban Oct 18 '17 at 07:59
  • The repeated `| tee -a $outfile` are unnecessarily reopening the destination file every time you want to say something. `if condition; then : ... things; fi | tee -a "$outfile"` outside the main conditional does the redirection only once, and also simplifies your code. (Notice also how to properly quote the variable; it's not a problem now, but if you were to choose a name which requires quoting, you would get odd errors.) Similarly, `"$1"` should be quoted everywhere. – tripleee Oct 18 '17 at 08:23
  • Also maybe you want to discover `printf` and/or here documents instead of the clumsy repeated `echo` statements for any output spanning more than one line. (As a design choice, not printing everything over multiple lines would seem more elegant, and saves screen space for the user.) – tripleee Oct 18 '17 at 08:25
  • Saving output into a variable just so you can `echo` that variable is also extremely inefficient. Just let the output go to standard output; maybe run it through `grep ^` to get an exit code to indicate whether there was actually any output (though most commands will tell you through their own exit code whether they did something -- `find` is a pesky exception). – tripleee Oct 18 '17 at 08:26
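Following the first comment's suggestion to batch the work, the large-file step could be sketched roughly as below. This is only a sketch, assuming GNU find and coreutils (for -print0, -printf and sort -h); "$1" is the directory argument already validated in the script, and nothing here has been tested on 2-3 TB trees.

# Batch the du calls via null-delimited xargs instead of relying on -exec alone;
# null delimiters keep filenames with spaces or newlines safe.
find "$(readlink -f "$1")" -type f -print0 | xargs -0 du -Sh | sort -rh | head -50

# With GNU find, the size it already knows from stat() can be printed directly,
# skipping du entirely (sizes are plain bytes here, hence sort -n):
find "$(readlink -f "$1")" -type f -printf '%s\t%p\n' | sort -rn | head -50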
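The single-redirection and quoting suggestions from the later comments could look roughly like this. Again a sketch, not a drop-in replacement: the header strings and the six-hour window are taken from the script above, and the output of find goes straight to the pipe instead of being stored in a variable first.

# Do all the reporting inside one brace group and open "$outfile" once.
{
        echo "$(hostname) $(date) user:$(whoami)"
        echo "---------------------Current Disk Usage----------------------"
        df -hP "$1"
        echo "------Newly created/modified Files for the last 6 hours------"
        find "$(readlink -f "$1")" -type f -mmin -360 -exec ls -ltrh {} +
} | tee -a "$outfile"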

0 Answers