
While practicing bash, I tried writing a script that searches the home directory for duplicates of the files in a given directory and deletes them. Here's what my script looks like now.

#!/bin/bash

# create-list: create a list of regular files in a directory

declare -A arr1 sumray origray

if [[ -d "$HOME/$1" && -n "$1" ]]; then
    echo "$1 is a directory"
else
    echo "Usage: create-list Directory | options" >&2
    exit 1
fi

for i in "$HOME/$1"/*; do
    [[ -f $i ]] || continue
    arr1[$i]="$i"
done

for i in "${arr1[@]}"; do
    Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
    dupe=$(find ~ -name "${Name##*/}" ! -wholename "$Name")

    if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
        mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name")
        origray[$i]=$(md5sum "$i" | cut -c 1-32)
    fi
done

for i in "${!sumray[@]}"; do
    poten=$(md5sum "$i" | cut -c 1-32)
    for i in "${!origray[@]}"; do
        if [[ "$poten" = "${origray[$i]}" ]]; then
            echo "${sumray[$i]} is a duplicate of $i"
        fi
    done
done

Originally, where mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name") is now, my line was the following:

sumray["$i"]=$(find ~ -name "${Name##*/}" ! -wholename "$Name")

This saved the output of find to the array. But I had an issue. If a single file had multiple duplicates, then all locations found by find would be saved to a single value. I figured I could use the mapfile command to fix this, but now it's not saving anything to my array at all. Does it have to do with the fact that I'm using an associative array? Or did I just mess up elsewhere?
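To make the difference concrete, here is a small illustration of what the two approaches store. It is only a sketch: `printf` stands in for the `find` command, and the paths and the array names `single` and `several` are made up for the example.

```bash
#!/bin/bash
# Illustration only: printf stands in for find, and the paths are made up.

declare -A single
declare -a several

# Command substitution: the whole output becomes one string under one key.
single[orig]=$(printf '%s\n' /home/u/a/file.txt /home/u/b/file.txt)
echo "elements in single: ${#single[@]}"

# mapfile into an indexed array: one element per input line.
mapfile -t several < <(printf '%s\n' /home/u/a/file.txt /home/u/b/file.txt)
echo "elements in several: ${#several[@]}"
```

The first array ends up with a single value that contains both paths separated by a newline, while the indexed array gets one element per path.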

Alphatron
  • You're right that `mapfile` / `readarray` do not automatically handle associative arrays (key value pairs). But you can use a workaround by reading in `key=value` lines, then looping those to load the associative array. See this answer for an example: https://stackoverflow.com/questions/25251353/bash4-read-file-into-associative-array/25251400#25251400 – wisbucky Aug 06 '19 at 21:30
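The workaround described in the comment above might look roughly like this. It is a minimal sketch, not the code from the linked answer: the sample data and the names `lines` and `table` are invented for illustration, and it assumes keys never contain `=`.

```bash
#!/bin/bash
# Sketch: load "key=value" lines into an associative array.
# The sample data and the names `lines` and `table` are hypothetical.

declare -A table

# mapfile fills an ordinary indexed array, one element per input line.
mapfile -t lines <<'EOF'
apple=red
banana=yellow
path=/tmp/a=b
EOF

# Split each line at the first "=" and copy it into the associative array.
for line in "${lines[@]}"; do
    key=${line%%=*}      # text before the first "="
    value=${line#*=}     # text after the first "="
    table[$key]=$value
done

for key in "${!table[@]}"; do
    printf '%s -> %s\n' "$key" "${table[$key]}"
done
```

Splitting at the first `=` means a value may itself contain `=`, as in the `path` line.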

1 Answer


I'm not sure if I'm allowed to answer my own question, but I figured that I should post how I solved my problem.

As it turns out, the mapfile command does not work on associative arrays at all. So my fix was to save the output of find to a text file and then read that file into an indexed array. I tested this a few times and have not run into any errors so far.

Here's my finished script.

#!/bin/bash

# create-list: create a list of regular files in a directory

declare -A arr1 origray
declare -a indexray

#Verify that Parameter is a directory.
if [[ -d "$HOME/$1/" && -n "$1" ]]; then
    echo "Searching for duplicates of files in $1"
else
    echo "Usage: create-list Directory | options" >&2
    exit 1
fi

#create list of files in specified directory
for i in "$HOME/${1%/}"/*; do
    [[ -f $i ]] || continue
    arr1[$i]="$i"
done

#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
for i in "${arr1[@]}"; do
    Name=$(sed 's/[][?*]/\\&/g' <<< "$i")

    if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
        find ~ -name "${Name##*/}" ! -wholename "$Name" >> temp.txt
        origray[$i]=$(md5sum "$i" | cut -c 1-32)
    fi
done

#create list of duplicate file locations.
if [[ -f temp.txt ]]; then
    mapfile -t indexray < temp.txt
else
    echo "No duplicates were found."
    exit 0
fi

#compare similarly named files by checksum and delete duplicates
count=0
for i in "${!indexray[@]}"; do
    poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
    # Use a separate loop variable so the outer index is not overwritten.
    for orig in "${!origray[@]}"; do
        if [[ "$poten" = "${origray[$orig]}" ]]; then
            echo "${indexray[$count]} is a duplicate of a file in $1."
        fi
    done
    count=$((count+1))
done

rm temp.txt

This is kind of sloppy, but it does what it's supposed to do. md5sum may not be the best way to check for duplicate files, but it works. All I have to do is replace echo "${indexray[$count]} is a duplicate of a file in $1." with rm -i "${indexray[$count]}" and it's good to go.
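The temporary file in the script above can probably be avoided by feeding find straight into mapfile with process substitution. The following is a sketch under a few assumptions: it runs in a throwaway sandbox directory instead of ~, the file names and the variables `sandbox`, `original`, `candidates`, and `duplicates` are made up for the example, and `mapfile -d ''` needs bash 4.4 or later.

```bash
#!/bin/bash
# Sketch: collect find results directly into an indexed array, no temp.txt.
# The sandbox directory and file names are hypothetical.

sandbox=$(mktemp -d)
mkdir -p "$sandbox/docs" "$sandbox/backup"
echo "same content" > "$sandbox/docs/notes.txt"
echo "same content" > "$sandbox/backup/notes.txt"
echo "different"    > "$sandbox/notes.txt"

original=$sandbox/docs/notes.txt
orig_sum=$(md5sum "$original" | cut -c 1-32)

# -print0 together with mapfile -d '' keeps unusual file names intact
# (requires bash 4.4 or later).
mapfile -d '' -t candidates < <(
    find "$sandbox" -type f -name "${original##*/}" ! -path "$original" -print0
)

# Keep only the same-named files whose checksum matches the original.
duplicates=()
for candidate in "${candidates[@]}"; do
    sum=$(md5sum "$candidate" | cut -c 1-32)
    if [[ $sum == "$orig_sum" ]]; then
        duplicates+=("$candidate")
        echo "$candidate is a duplicate of $original"
    fi
done

rm -r "$sandbox"
```

Here each candidate is compared only against the file it was found for, rather than against every checksum in origray.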

So my next question would have to be...why doesn't mapfile work with associative arrays?

Alphatron
  • In bash, an array element cannot be an array (whether the array is indexed or associative). – rici Oct 27 '16 at 22:49
  • @rici So mapfile can work on associative arrays, then? I know that array elements cannot be arrays in bash, but I don't see how my original post caused that array. – Alphatron Oct 28 '16 at 02:10
  • You had `mapfile -t sumray["$i"]`; I don't see a way of interpreting that other than "read lines into an array which is stored in the element `$i` of `sumray`". But an element of an associative array must be a scalar, so that cannot work. (If your intention was to make `sumray` a multimap, that won't work either, because bash doesn't have multimaps.) – rici Oct 28 '16 at 02:39
  • And in answer to the original question: mapfile stores successive input lines into successive elements of an array. Associative arrays don't have successive elements, and mapfile (afaik) won't try to fake them. – rici Oct 28 '16 at 02:44
  • Ah! Thank you. That actually explains a lot. Looks like I misunderstood what I was doing with the associative array in the first place and how mapfile worked. – Alphatron Oct 28 '16 at 07:26
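Since an element of an associative array has to be a scalar, one possible way to get several values per key is to join them into a single newline-separated string and split that string with mapfile when reading it back. This is only a sketch of that idea: `dupes`, `add_dupe`, and the paths are hypothetical names, and it assumes no value contains a newline.

```bash
#!/bin/bash
# Sketch: emulate "several values per key" with a newline-joined string.
# `dupes`, `add_dupe`, and the paths are hypothetical names for illustration.
# Assumes no stored value contains a newline.

declare -A dupes

add_dupe() {
    local key=$1 value=$2
    if [[ -n ${dupes[$key]+set} ]]; then
        # Key already present: append the new value on its own line.
        dupes[$key]+=$'\n'$value
    else
        dupes[$key]=$value
    fi
}

add_dupe /home/u/docs/a.txt /home/u/backup/a.txt
add_dupe /home/u/docs/a.txt /home/u/old/a.txt
add_dupe /home/u/docs/b.txt /home/u/backup/b.txt

for key in "${!dupes[@]}"; do
    # Split the stored string back into an indexed array, one element per line.
    mapfile -t copies <<< "${dupes[$key]}"
    echo "$key has ${#copies[@]} candidate(s)"
done
```

The associative array still holds only scalars; mapfile is used solely on the indexed array `copies`, which is the kind of target it is designed for.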