While cleaning old backups, I found the results of something that had gone wrong. One user's backup contains files with strange names like "37&@4ez98d". To automate the cleanup, I tried to find all such files with this regexp:
find -regextype sed -regex '.*\/[[:digit:]a-z[:punct:]]\{10\}'
All these names are 10 characters long and contain digits, lowercase Latin letters and some punctuation. The find worked almost perfectly, but it also found some files with "legal" names like 07-709.pdf. And I cannot construct a regexp that says "anywhere inside the given subtree, 10 characters made of digits, lowercase Latin letters and SOME punctuation, except for the dot and the minus sign".
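Part of the problem is that [:punct:] is wider than it looks. In the C locale it is exactly the 32 ASCII punctuation characters, and that set includes the dot, the minus sign and also the slash. A minimal sketch, assuming GNU find (-regextype is a GNU extension) and a throwaway tree with made-up names, runs the original expression; the helper name original_match is hypothetical:

```shell
#!/bin/bash
# Sketch: run the question's expression against a throwaway tree.
# The file names are illustrative; x/12345678 has an 8-character basename.
set -eu
export LC_ALL=C   # pins [:punct:] to the 32 ASCII punctuation characters

original_match() {
    local tree
    tree=$(mktemp -d)
    mkdir "$tree/x"
    touch "$tree/07-709.pdf" "$tree/y6mxn2wnkb" "$tree/x/12345678"
    # Subshell, so the caller's working directory is left unchanged.
    (cd "$tree" && find . -regextype sed -regex '.*\/[[:digit:]a-z[:punct:]]\{10\}' | sort)
    rm -r "$tree"
}

original_match
```

It should print ./07-709.pdf, ./x/12345678 and ./y6mxn2wnkb. The middle line has an 8-character basename: ".*\/" consumes "./" and the class swallows the "x/" in front of it, so the original expression can even reach across a directory separator.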
I tried everything I could, but I could not make find ignore the minuses and dots. These symbols may appear anywhere in the file name, so I can't rely on their position. Adding something like [^.] (in any variation) produced no usable results. Grepping find's output for dots and minuses is also useless, because these symbols may occur in directory names, and filtering those out may also filter out the "bad" file names. I cannot enumerate all possible punctuation characters because I might miss some: I have no idea what "alphabet" was used to scramble these names, although I'm pretty sure it contains no dots or minuses.
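Because find's -name test matches only the basename, dots and minuses in directory names cannot interfere with it, which sidesteps the grep problem. This is not a single regexp, but it stays within one find invocation. A sketch assuming GNU find; find_by_name_tests and demo_dotted_dir are hypothetical helper names, and the tree is made up:

```shell
#!/bin/bash
# Sketch (GNU find assumed): keep the character-set regex and let -name
# tests check the basename. -name never looks at directory components.
set -eu
export LC_ALL=C

find_by_name_tests() {
    # $1 is the root of the subtree to search.
    # -name '??????????' : basename is exactly 10 characters
    # ! -name '*[.-]*'   : basename contains no dot and no minus sign
    find "$1" -regextype sed -regex '.*/[[:digit:]a-z[:punct:]]\{10\}' \
        -name '??????????' ! -name '*[.-]*'
}

demo_dotted_dir() {
    local tree
    tree=$(mktemp -d)
    # Made-up tree: a dotted directory plus an 8-character name one level down.
    mkdir "$tree/old.v1" "$tree/x"
    touch "$tree/old.v1/rxoxywiy7l" "$tree/old.v1/07-709.pdf" "$tree/x/12345678"
    (cd "$tree" && find_by_name_tests . | sort)
    rm -r "$tree"
}

demo_dotted_dir
```

Only ./old.v1/rxoxywiy7l should be printed. The extra -name '??????????' test pins the basename to exactly 10 characters; without it, the slash that [:punct:] accepts would let ./x/12345678 through.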
I worked around the problem by piping find's output into an additional check (it was a one-liner; the extra newlines are only for readability):
find -regextype sed -regex '.*\/[[:digit:]a-z[:punct:]]\{10\}' | \
while IFS= read -r a; do \
b=${a: -10}; [[ ! "$b" =~ [.-] ]] && echo "$b"; \
done
But what I really need is a single regexp.
Any suggestions, please?
Some real data for testing (the first four should be found, the last three ignored):
rxoxywiy7l
u29t@5%0qd
im^ua&saeo
y6mxn2wnkb
07-709.pdf
3023-7.pdf
18099.docx
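For a single regexp, one option is to drop [:punct:] and spell out ASCII punctuation as byte ranges that skip '-' (0x2D), '.' (0x2E) and '/' (0x2F). In the C locale [:punct:] is a fixed set of 32 characters, so this enumeration cannot miss an ASCII punctuation character; it does assume the scrambled names are pure ASCII, and LC_ALL=C makes the ranges plain byte ranges. A sketch assuming GNU find, run against the sample names above plus one file in a dotted directory; find_scrambled and demo_sample are hypothetical names:

```shell
#!/bin/bash
# Sketch: one -regex whose bracket expression lists ASCII punctuation as
# byte ranges that skip '-' (0x2D), '.' (0x2E) and '/' (0x2F).
# Assumes GNU find and ASCII-only names; LC_ALL=C makes ranges byte ranges.
set -eu
export LC_ALL=C

# 0-9 a-z   digits and lowercase Latin letters
# !-,       0x21-0x2C  ! " # $ % & ' ( ) * + ,
# :-@       0x3A-0x40  : ; < = > ? @
# [-`       0x5B-0x60  [ \ ] ^ _ `
# {-~       0x7B-0x7E  { | } ~
re='.*/[0-9a-z!-,:-@[-`{-~]\{10\}'

find_scrambled() {
    # $1 is the root of the subtree to search.
    find "$1" -regextype sed -regex "$re"
}

demo_sample() {
    local tree
    tree=$(mktemp -d)
    mkdir "$tree/old.v1"
    # The seven names from the question, plus one file in a dotted directory.
    touch "$tree/rxoxywiy7l" "$tree/u29t@5%0qd" "$tree/im^ua&saeo" \
          "$tree/y6mxn2wnkb" "$tree/07-709.pdf" "$tree/3023-7.pdf" \
          "$tree/18099.docx" "$tree/old.v1/rxoxywiy7l"
    (cd "$tree" && find_scrambled . | sort)
    rm -r "$tree"
}

demo_sample
```

It should print the four scrambled names and ./old.v1/rxoxywiy7l, and skip the three document names. Because '/' is not in the class, the 10 matched characters are always the whole basename.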
Thank you.