Quick summary: how to hash the contents of an entire folder, or compare two folders for equality
# 1. How to get a sha256 hash over all file contents in a folder, including
# hashing over the relative file paths within that folder to check the
# filenames themselves (get this bash function below).
sha256sum_dir "path/to/folder"
# 2. How to quickly compare two folders (get the `diff_dir` bash function below)
diff_dir "path/to/folder1" "path/to/folder2"
# OR:
diff -r -q "path/to/folder1" "path/to/folder2"
The "one liners"
Do this instead of the main answer, to get a single hash for all non-directory file contents within an entire folder, no matter where the folder is located:
This is a "1-line" command. Copy and paste the whole thing to run it all at once:
# This one works, but don't use it, because its hash output does NOT
# match that of my `sha256sum_dir` function. I recommend the "1-liner"
# just below instead.
time ( \
    starting_dir="$(pwd)" \
    && target_dir="path/to/folder" \
    && cd "$target_dir" \
    && find . -not -type d -print0 | sort -zV \
    | xargs -0 sha256sum | sha256sum; \
    cd "$starting_dir"
)
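However, that produces a slightly different hash than my `sha256sum_dir` bash function, which I present below. The difference is a single trailing newline: piping straight into the final `sha256sum` hashes the newline after the last line, whereas capturing the output with `$(...)` strips it. So, to get the output hash to exactly match the output from my `sha256sum_dir` function, do this instead: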
# Use this one, as its output matches that of my `sha256sum_dir`
# function exactly.
# Note: the `cd` runs inside the `$(...)` subshell, so your current
# directory never changes and there is no need to `cd` back afterwards.
all_hashes_str="$( \
    target_dir="path/to/folder" \
    && cd "$target_dir" \
    && find . -not -type d -print0 | sort -zV | xargs -0 sha256sum \
)"; \
printf "%s" "$all_hashes_str" | sha256sum
For more on why the main answer doesn't produce identical hashes for identical folders in different locations, see further below.
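The difference between the two "one liners" above comes down to one trailing newline. Here is a minimal sketch of that effect; `sample_text` is a made-up stand-in for the real per-file `sha256sum` output, not actual hashes:

```bash
# Hypothetical sample text standing in for the per-file `sha256sum` output
sample_text=$'hash1  ./file1.txt\nhash2  ./file2.txt\n'

# Piping directly: the trailing newline is part of what gets hashed
piped_hash="$(printf '%s' "$sample_text" | sha256sum | awk '{ print $1 }')"

# Capturing first: "$(...)" strips trailing newlines, and `printf "%s"` does
# not add one back, so one byte fewer gets hashed
captured="$(printf '%s' "$sample_text")"
captured_hash="$(printf '%s' "$captured" | sha256sum | awk '{ print $1 }')"

# Adding the newline back reproduces the piped hash
restored_hash="$(printf '%s\n' "$captured" | sha256sum | awk '{ print $1 }')"

echo "piped:    $piped_hash"
echo "captured: $captured_hash"
echo "restored: $restored_hash"
```

The `piped` and `restored` hashes should be identical, while the `captured` hash should differ from both.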
[My preferred method] Here are some bash functions I wrote: `sha256sum_dir` and `diff_dir`
Place the following functions in your `~/.bashrc` file or in your `~/.bash_aliases` file, assuming your `~/.bashrc` file sources the `~/.bash_aliases` file like this:
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
You can find both of the functions below in my personal `~/.bash_aliases` file in my eRCaGuy_dotfiles repo.
Here is the `sha256sum_dir` function, which obtains a total "directory" hash of all files in the directory:
# Take the sha256sum of all files in an entire dir, and then sha256sum that
# entire output to obtain a _single_ sha256sum which represents the _entire_
# dir.
# See:
# 1. [my answer] https://stackoverflow.com/a/72070772/4561887
sha256sum_dir() {
    # Fall back to 0 and 1 if `RETURN_CODE_SUCCESS` and `RETURN_CODE_ERROR`
    # are not defined elsewhere in your shell
    return_code="${RETURN_CODE_SUCCESS:-0}"
    if [ "$#" -eq 0 ]; then
        echo "ERROR: too few arguments."
        return_code="${RETURN_CODE_ERROR:-1}"
    fi
    # Print help string if requested
    if [ "$#" -eq 0 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
        # Help string
        echo "Obtain a sha256sum of all files in a directory."
        echo "Usage: ${FUNCNAME[0]} [-h|--help] <dir>"
        return "$return_code"
    fi
    starting_dir="$(pwd)"
    target_dir="$1"
    # Stop here if the dir is invalid, rather than hashing the current dir
    cd "$target_dir" || return "${RETURN_CODE_ERROR:-1}"
    # See my answer: https://stackoverflow.com/a/72070772/4561887
    filenames="$(find . -not -type d | sort -V)"
    IFS=$'\n' read -r -d '' -a filenames_array <<< "$filenames"
    time all_hashes_str="$(sha256sum "${filenames_array[@]}")"
    cd "$starting_dir"
    echo ""
    echo "Note: you may now call:"
    echo "1. 'printf \"%s\n\" \"\$all_hashes_str\"' to view the individual" \
         "hashes of each file in the dir. Or:"
    echo "2. 'printf \"%s\" \"\$all_hashes_str\" | sha256sum' to see that" \
         "the hash of that output is what we are using as the final hash" \
         "for the entire dir."
    echo ""
    printf "%s" "$all_hashes_str" | sha256sum | awk '{ print $1 }'
    return "$?"
}
# Note: I prefix this with my initials to find my custom functions easier
alias gs_sha256sum_dir="sha256sum_dir"
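The function above reads newline-separated filenames into an array, so a filename that itself contains a newline would be split. As a rough sketch only, here is one way to keep the whole pipeline NUL-delimited, in the same style as the "one liners" earlier. The name `sha256sum_dir_sketch` is hypothetical and not part of my dotfiles. For ordinary filenames I would expect it to print the same final hash as `sha256sum_dir`, but treat that as an assumption to check on your own data:

```bash
# Hypothetical variant of `sha256sum_dir`; the name `sha256sum_dir_sketch` is
# an assumption for illustration only.
sha256sum_dir_sketch() {
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "Usage: ${FUNCNAME[0]} <dir>" >&2
        return 1
    fi

    local all_hashes
    # The `cd` happens inside the "$(...)" subshell, so the caller's working
    # directory is untouched. The NUL-delimited pipeline keeps unusual
    # filenames intact.
    all_hashes="$(
        cd "$1" \
        && find . -not -type d -print0 | sort -zV | xargs -0 sha256sum
    )" || return 1

    # Hash the list of "<hash>  <relative path>" lines to get a single hash
    printf "%s" "$all_hashes" | sha256sum | awk '{ print $1 }'
}
```

Call it as `sha256sum_dir_sketch "path/to/folder"`. Unlike `sha256sum_dir`, it keeps its variables local, so `all_hashes_str` is not left behind for later inspection.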
Assuming you just want to compare two directories for equality, you can use `diff -r -q "dir1" "dir2"` instead, which I wrapped in this `diff_dir` command. I learned about the `diff` command to compare entire folders here: how do I check that two folders are the same in linux.
# Compare dir1 against dir2 to see if they are equal or if they differ.
# See:
# 1. How to `diff` two dirs: https://stackoverflow.com/a/16404554/4561887
diff_dir() {
    # Fall back to 0 and 1 if `RETURN_CODE_SUCCESS` and `RETURN_CODE_ERROR`
    # are not defined elsewhere in your shell
    return_code="${RETURN_CODE_SUCCESS:-0}"
    if [ "$#" -eq 0 ]; then
        echo "ERROR: too few arguments."
        return_code="${RETURN_CODE_ERROR:-1}"
    fi
    # Print help string if requested
    if [ "$#" -eq 0 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
        echo "Compare (diff) two directories to see if dir1 contains the same" \
             "content as dir2."
        echo "NB: the output will be **empty** if both directories match!"
        echo "Usage: ${FUNCNAME[0]} [-h|--help] <dir1> <dir2>"
        return "$return_code"
    fi
    dir1="$1"
    dir2="$2"
    time diff -r -q "$dir1" "$dir2"
    return_code="$?"
    if [ "$return_code" -eq 0 ]; then
        echo -e "\nDirectories match!"
    fi
    # echo "$return_code"
    return "$return_code"
}
# Note: I prefix this with my initials to find my custom functions easier
alias gs_diff_dir="diff_dir"
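Since `diff_dir` returns the exit status of `diff`, the same check can also be used silently inside a script. Here is a minimal sketch of that idea; the helper name `dirs_match` and the throwaway dirs are assumptions made up for this demo:

```bash
# Hypothetical helper; the name `dirs_match` is an assumption for illustration.
# It returns the exit status of `diff`: 0 if the two dirs match, 1 if they
# differ, and 2 on trouble (such as a missing dir).
dirs_match() {
    diff -r -q "$1" "$2" > /dev/null 2>&1
}

# Self-contained demo using throwaway dirs
demo_root="$(mktemp -d)"
mkdir "$demo_root/dir1" "$demo_root/dir2"
printf 'a\n' > "$demo_root/dir1/file1.txt"
printf 'a\n' > "$demo_root/dir2/file1.txt"

if dirs_match "$demo_root/dir1" "$demo_root/dir2"; then
    echo "Directories match!"
fi

# Change one file so the dirs no longer match
printf 'b\n' > "$demo_root/dir2/file1.txt"
if ! dirs_match "$demo_root/dir1" "$demo_root/dir2"; then
    echo "Directories differ."
fi

rm -rf "$demo_root"
```

Discarding the output keeps the check quiet; drop the redirection if you want to see which files differ.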
Here is the output of my `sha256sum_dir` command on my `~/temp2` dir (which I describe just below so you can reproduce it and test this yourself). You can see the total folder hash is `b86c66bcf2b033f65451e8c225425f315e618be961351992b7c7681c3822f6a3` in this case:
$ gs_sha256sum_dir ~/temp2
real 0m0.007s
user 0m0.000s
sys 0m0.007s
Note: you may now call:
1. 'printf "%s\n" "$all_hashes_str"' to view the individual hashes of each
file in the dir. Or:
2. 'printf "%s" "$all_hashes_str" | sha256sum' to see that the hash of that
output is what we are using as the final hash for the entire dir.
b86c66bcf2b033f65451e8c225425f315e618be961351992b7c7681c3822f6a3
Here is the cmd and output of `diff_dir` to compare two dirs for equality. This is checking that copying an entire directory to my SD card just now worked correctly. I made the output indicate `Directories match!` whenever that is the case:
$ gs_diff_dir "path/to/sd/card/tempdir" "/home/gabriel/tempdir"
real 0m0.113s
user 0m0.037s
sys 0m0.077s
Directories match!
Why the main answer doesn't produce identical hashes for identical folders in different locations
I tried the most-upvoted answer here, and it doesn't work quite right as-is; it needs a little tweaking. The problem is that the hash changes based on the base path of the folder of interest! That means an identical copy of a folder will have a different hash than the folder it was copied from, even if the two folders contain exactly the same content, which defeats the purpose of hashing the folder. Let me explain:
Assume I have a folder named `temp2` at `~/temp2`. It contains `file1.txt`, `file2.txt`, and `file3.txt`. `file1.txt` contains the letter `a` followed by a newline, `file2.txt` contains the letter `b` followed by a newline, and `file3.txt` contains the letter `c` followed by a newline.
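To reproduce that folder, something like the following should work. As an assumption for safety, this sketch builds `temp2` inside a throwaway dir from `mktemp -d` rather than at `~/temp2`, so nothing you already have gets overwritten; `demo_dir` is just a name chosen for this example:

```bash
# Assumption: a throwaway parent dir from `mktemp -d` stands in for the home
# dir, so the test folder ends up at "<tmp dir>/temp2" instead of ~/temp2.
demo_dir="$(mktemp -d)/temp2"
mkdir -p "$demo_dir"

# Each file holds a single letter followed by a newline
printf 'a\n' > "$demo_dir/file1.txt"
printf 'b\n' > "$demo_dir/file2.txt"
printf 'c\n' > "$demo_dir/file3.txt"

echo "Created test folder at: $demo_dir"
ls -1 "$demo_dir"
```

Set `demo_dir="$HOME/temp2"` instead if you want the paths to line up with the ones shown in this answer.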
If I run `find /home/gabriel/temp2`, I get:
$ find /home/gabriel/temp2
/home/gabriel/temp2
/home/gabriel/temp2/file3.txt
/home/gabriel/temp2/file1.txt
/home/gabriel/temp2/file2.txt
If I forward that to `sha256sum` (in place of `sha1sum`) in the same pattern as the main answer states, I get this. Notice it has the full path after each hash, which is not what we want:
$ find /home/gabriel/temp2 -type f -print0 | sort -z | xargs -0 sha256sum
87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 /home/gabriel/temp2/file1.txt
0263829989b6fd954f72baaf2fc64bc2e2f01d692d4de72986ea808f6e99813f /home/gabriel/temp2/file2.txt
a3a5e715f0cc574a73c3f9bebb6bc24f32ffd5b67b387244c2c909da779a1478 /home/gabriel/temp2/file3.txt
If you then pipe that output string above to `sha256sum` again, it hashes the file hashes together with their full file paths, which is not what we want! The file hashes may match exactly between a folder and a copy of that folder, but the absolute paths do NOT match, so the final hashes will differ, since we are hashing over the full file paths as part of our single, final hash!
Instead, what we want is the relative file path next to each hash. To do that, you must first `cd` into the folder of interest, and then run the hash command over all files therein, like this:
cd "/home/gabriel/temp2" && find . -type f -print0 | sort -z | xargs -0 sha256sum
Now, I get this. Notice the file paths are all relative now, which is what I want:
$ cd "/home/gabriel/temp2" && find . -type f -print0 | sort -z | xargs -0 sha256sum
87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 ./file1.txt
0263829989b6fd954f72baaf2fc64bc2e2f01d692d4de72986ea808f6e99813f ./file2.txt
a3a5e715f0cc574a73c3f9bebb6bc24f32ffd5b67b387244c2c909da779a1478 ./file3.txt
Good. Now, if I hash that entire output string, since the file paths are all relative in it, the final hash will match exactly for a folder and its copy! In this way, we are hashing over the file contents and the file names within the directory of interest, to get a different hash for a given folder if either the file contents are different or the filenames are different, or both.
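To see the whole effect in one place, here is a small self-contained sketch that hashes a folder and an identical copy both ways. The two helper names and the throwaway dirs are assumptions made up for this demo:

```bash
# Hypothetical helper names; both are assumptions for illustration only.

# Hash over absolute paths, in the pattern of the main answer
hash_with_absolute_paths() {
    find "$1" -type f -print0 | sort -z | xargs -0 sha256sum \
        | sha256sum | awk '{ print $1 }'
}

# Hash over relative paths, by running `find .` from inside the dir. The
# subshell keeps the `cd` from changing the caller's working directory.
hash_with_relative_paths() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum) \
        | sha256sum | awk '{ print $1 }'
}

# Build a small dir and an identical copy in a different location
demo_root="$(mktemp -d)"
mkdir "$demo_root/original"
printf 'a\n' > "$demo_root/original/file1.txt"
printf 'b\n' > "$demo_root/original/file2.txt"
cp -r "$demo_root/original" "$demo_root/copy"

echo "Absolute paths (these two hashes differ):"
hash_with_absolute_paths "$demo_root/original"
hash_with_absolute_paths "$demo_root/copy"

echo "Relative paths (these two hashes match):"
hash_with_relative_paths "$demo_root/original"
hash_with_relative_paths "$demo_root/copy"

rm -rf "$demo_root"
```

Note that these helpers pipe straight into the final `sha256sum`, so their output includes the trailing newline and will not equal the hash printed by `sha256sum_dir` for the same folder.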