This is an old question, but a common problem lacking an evidence-based solution.
awk '{print "[ -e "$1" ] && echo "$1}' | parallel            #    400 files/s
awk '{print "[ -e "$1" ] && echo "$1}' | bash                #   6000 files/s
while read -r file; do [ -e "$file" ] && echo "$file"; done  #  12000 files/s
xargs find                                                   # 200000 files/s
parallel --xargs find                                        # 250000 files/s
xargs -P2 find                                               # 400000 files/s
xargs -P96 find                                              # 800000 files/s
xargs -P96 stat --format "%n"                                # ~5x as fast! (Added 2023-06-06)
I tried this on a few different systems and the exact numbers varied, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P beat GNU parallel (that comparison isn't fully shown above, but the gap was sometimes large), and I was surprised that parallelism helped so much at all: I had assumed file I/O would be the bottleneck, in which case running extra processes wouldn't buy much.
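If you want to sanity-check the relative speeds on your own hardware, a rough harness along these lines works; the directory, list size, and process count here are placeholders I'm making up for illustration, not the setup benchmarked above:

# build a list of 200000 candidate paths, half of which actually exist
mkdir -p /tmp/exists-test
seq -f '/tmp/exists-test/%.0f' 200000 > files.list
head -n 100000 files.list | xargs touch
# time one method; errors for missing files go to stderr, hits to stdout
time (tr '\n' '\0' < files.list | xargs -0 -P8 find 2>/dev/null | wc -l)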
Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of the OP's script:
#!/bin/bash
total=$(wc -l < ~/Scripts/ourphotos.txt)
# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P4 find | wc -l)
if [ "$total" -ne "$found" ]; then
    ii=$((total - found))
    notify-send -u critical "$ii files have been deleted or are unreadable"
fi
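The script only reports a count. If you also want to see which files are missing, one approach (a sketch, assuming the list contains no duplicate lines) is to compare the sorted list against the sorted output of find:

sort ~/Scripts/ourphotos.txt > /tmp/all.txt
tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P4 find 2>/dev/null | sort > /tmp/found.txt
comm -23 /tmp/all.txt /tmp/found.txt    # lines in the list that find did not report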
UPDATE 2023-06-06: as @RARE-Kpop-Manifesto suggests, stat is faster than find. For example, with 8 cores, I found
tr '\n' '\0' < files.list | xargs -0 -P8 stat --format "%n"
to be about 5 times faster than
tr '\n' '\0' < files.list | xargs -0 -P8 find
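The same substitution drops straight into the script above. Like find, stat complains about missing files on stderr, so counting stdout lines still gives the number of files that exist, for example:

found=$(tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P8 stat --format "%n" 2>/dev/null | wc -l)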