Ultimately, I want to eliminate the possibility of duplicate entries showing up in my array. I'm doing this because I'm working on a script that compares two directories, searches for duplicate files, and deletes them. The potential duplicates are stored in an array, and a file is only deleted if it has the same name and checksum as an original. So when duplicate entries occur, I run into minor errors where md5sum tries to hash a file that doesn't exist (because it was already deleted) or rm tries to delete a file that is already gone.
Here's part of the script.
compare()
{
    read -p "Please enter two directories: " dir1 dir2
    if [[ -d "$dir1" && -d "$dir2" ]]; then
        echo "Searching through $dir2 for duplicates of files in $dir1..."
    else
        echo "Invalid entry. Please enter valid directories." >&2
        exit 1
    fi
    # create a list of the files in the first directory
    while IFS= read -r -d '' file; do
        test_arr+=("$file")
    done < <(find "$dir1" -print0)
    # search $dir2 by name for duplicates of files in $dir1,
    # and record the checksum of each original
    declare -A origray    # associative: keyed by file path
    tmpfile=$(mktemp -p "$dir1" del_logXXXXX.txt)
    for i in "${test_arr[@]}"; do
        Name=$(sed 's/[][?*]/\\&/g' <<< "$i")    # escape glob characters for -name
        if [[ $(find "$dir2" -name "${Name##*/}" ! -wholename "$Name") ]]; then
            [[ -f "$i" ]] || continue
            find "$dir2" -name "${Name##*/}" ! -wholename "$Name" >> "$tmpfile"
            origray[$i]=$(md5sum "$i" | cut -c 1-32)
        fi
    done
    # create the list of duplicate file locations
    dupe_loc
    # compare similarly named files by checksum and delete duplicates
    for i in "${!indexray[@]}"; do
        poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
        for j in "${!origray[@]}"; do
            if [[ "$poten" = "${origray[$j]}" ]]; then
                echo "${indexray[$i]} is a duplicate of a file in $dir1."
                rm -v "${indexray[$i]}"
                break
            fi
        done
    done
    exit 0
}
dupe_loc is the following function:
dupe_loc()
{
    if [[ -s "$tmpfile" ]]; then
        mapfile -t indexray < "$tmpfile"
    else
        echo "No duplicates were found."
        exit 0
    fi
}
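As far as I can tell, the duplicate entries enter indexray at this mapfile step: if two files in $dir1 share a basename, the find in the loop above appends the same $dir2 path to $tmpfile once for each of them. One spot where they could be filtered is at this read; a minimal sketch, assuming GNU sort (sort -u combines sort and uniq in one step):

mapfile -t indexray < <(sort -u "$tmpfile")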
I figure the best way to solve this issue would be to use the sort and uniq commands to dispose of duplicate entries in the array. But even with process substitution, I encounter errors when trying to do that.
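For illustration, this is the kind of construct I mean, as a minimal sketch: the first form assumes no filename contains a newline; the second assumes GNU sort (for -z) and bash 4.4 or later (for mapfile -d).

# newline-delimited: simple, but breaks on filenames that contain newlines
mapfile -t indexray < <(printf '%s\n' "${indexray[@]}" | sort -u)

# NUL-delimited: safe for arbitrary filenames
mapfile -t -d '' indexray < <(printf '%s\0' "${indexray[@]}" | sort -zu)

Note that sort -u reorders the entries; since the script only loops over indexray and never relies on its order, that should be harmless here.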