UPDATE 1: quick note on short-circuiting
Before you even start to consider which comparison tool to use, check the file sizes with the stat command.
If the sizes differ, the files cannot be identical, and a matching hash could only come from a collision, so there is no point in hashing them at all.
One method I use myself:
- check file sizes
- for files with matching sizes, do a high-speed batch run over xxhash
- finally, only for the suggested duplicates from step 2, run them once more through a cryptographically acceptable hash, such as Keccak or SHAKE from the SHA-3 family, to confirm they're truly duplicates beyond all reasonable doubt
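A minimal sketch of that three-step check for a single pair of files. The helper names (size_of, fast_hash, strong_hash, same_file) are made up for illustration. It uses wc -c instead of stat because stat's flags differ between GNU and BSD, and it assumes xxhsum and an openssl build with SHA-3 are installed; where they are not, it falls back to cksum and sha256sum, which are stand-ins and not part of the method described above.

```shell
#!/bin/sh
# Staged duplicate check: size, then fast hash, then strong hash.
# All function names here are hypothetical helpers.

# Step 1: byte count (portable stand-in for stat).
size_of() { wc -c < "$1" | tr -d ' '; }

# Step 2: fast non-cryptographic hash.
# Uses xxhsum when available, otherwise POSIX cksum as a stand-in.
fast_hash() {
    if command -v xxhsum >/dev/null 2>&1; then
        xxhsum < "$1" | awk '{ print $1 }'
    else
        cksum < "$1" | awk '{ print $1 }'
    fi
}

# Step 3: strong hash.
# Assumes openssl supports SHA-3; otherwise falls back to sha256sum.
strong_hash() {
    if openssl dgst -sha3-256 /dev/null >/dev/null 2>&1; then
        openssl dgst -sha3-256 -r < "$1" | awk '{ print $1 }'
    else
        sha256sum < "$1" | awk '{ print $1 }'
    fi
}

# Returns 0 only if every stage agrees; stops at the first mismatch.
same_file() {
    [ "$(size_of "$1")" = "$(size_of "$2")" ] || return 1
    [ "$(fast_hash "$1")" = "$(fast_hash "$2")" ] || return 1
    h1=$(strong_hash "$1")
    h2=$(strong_hash "$2")
    # An empty hash means no hashing tool ran, so refuse to call it a match.
    [ -n "$h1" ] && [ "$h1" = "$h2" ]
}
```

Usage would look like: same_file a.txt b.txt && echo duplicate. The cheap stages run first, so most non-duplicates never reach the expensive hash.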
==============================
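awk has a useful feature: NR is still 0 in the END block if nothing got read in, either from files or from a pipe. So a program with only an END block can tell whether there was any input at all.
One can also reuse the same string for both scenarios without a ternary operator ( … ? … : … ) by leveraging the fact that the 0th power of any base is 1: substr(str, n^!NR) starts at position n when NR is 0, and at position 1 (the whole string) otherwise.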
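To see the trick in isolation, here is a small sketch. The function name verdict is hypothetical; the string and the offset 9 are chosen so that position 9 is where the word "match" begins.

```shell
#!/bin/sh
# Prints "match" when stdin is empty, " DO NOT match" otherwise.
# With empty input NR is 0 in END, so !NR is 1 and the offset is 9^1 = 9.
# With any input !NR is 0, so the offset is 9^0 = 1 (the whole string).
verdict() {
    awk 'END { print substr(" DO NOT match", 9^(!NR)) }'
}
```

For example, verdict < /dev/null prints match, while printf 'x\n' | verdict prints the full string with its leading space.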
If you don't have too many files, here's a pretty brute-force but high-speed way to do it all at once: diff every pair and let awk report whether diff printed anything.
Just remove the backgrounding bit, the "&" in "... & )", if you want to run it sequentially.
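for f1 in "${m2p}" "${m3l}" "${m3m}"; do
  for f2 in "${m3m}" "${m2a}" "${m2p}" "${m3supp}" ; do
    # diff prints nothing for identical files, so awk reaches END with NR == 0
    # (mawk or gawk can be used in place of awk)
    ( diff "${f1}" "${f2}" |
      awk -F'^$' -v f1="${f1}" -v f2="${f2}" '
        END {
          home = ENVIRON["HOME"]
          sub(home, "~", f1)    # abbreviate $HOME to ~ in both paths
          sub(home, "~", f2)
          # 9^!NR is 9 (start at "match") when NR == 0, else 9^0 == 1
          print "checksums for \n\t\42"(f1)"\42\n\t\b\b\b\band \42"\
                (f2)"\42 ===> " \
                substr(" DO NOT match", 9^!NR)"\n" }' &)
  done ; done | gcat -b    # gcat is GNU cat under a g prefix; cat -b also numbers non-blank lines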
1 checksums for
2 "~/m2map_main.txt"
3 and "~/m2map_main.txt" ===> match
4 checksums for
5 "~/m3vid_genie26.txt"
6 and "~/m3vid_genie26.txt" ===> match
7 checksums for
8 "~/m3vid_genie26.txt"
9 and "~/m2art_main_03.txt" ===> DO NOT match
10 checksums for
11 "~/m3vid_genie26.txt"
12 and "~/m3vid_genie25_supp.txt" ===> DO NOT match
13 checksums for
14 "~/m23lyricsFLT_05.txt"
15 and "~/m3vid_genie26.txt" ===> DO NOT match
16 checksums for
17 "~/m23lyricsFLT_05.txt"
18 and "~/m2art_main_03.txt" ===> DO NOT match
19 checksums for
20 "~/m23lyricsFLT_05.txt"
21 and "~/m3vid_genie25_supp.txt" ===> DO NOT match
22 checksums for
23 "~/m3vid_genie26.txt"
24 and "~/m2map_main.txt" ===> DO NOT match
25 checksums for
26 "~/m2map_main.txt"
27 and "~/m3vid_genie26.txt" ===> DO NOT match
28 checksums for
29 "~/m2map_main.txt"
30 and "~/m2art_main_03.txt" ===> DO NOT match
31 checksums for
32 "~/m2map_main.txt"
33 and "~/m3vid_genie25_supp.txt" ===> DO NOT match
34 checksums for
35 "~/m23lyricsFLT_05.txt"
36 and "~/m2map_main.txt" ===> DO NOT match