I deal with large collections of unknown files, and have been learning Python to help me filter, sort and otherwise wrangle them.
A collection I am looking at contains a large number of resource forks, so I wrote a little script to find and delete them (the next step is to find them and move them instead, but that's for another day).
I found that a number of files in this collection have non-ASCII characters in their names, and these seem to trip up `os.remove`.
Example file name: `._spec com report 395` (N.B. the 3 has a small dot underneath it; I can't reproduce the character here, or figure out how to show the hex of the filename...)
I log all the filenames, and this is what the log records for that file: `._spec com report 3?95`
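One possible way to see exactly which character is involved, sketched here assuming Python 2.7 or 3 and a unicode name (the helper `describe_name` is my own, not a library function): escape the name to ASCII and list the hex code point of every character, so the odd one stands out.

```python
def describe_name(name):
    """Return an ASCII-only view of a (unicode) filename.

    The first item escapes non-ASCII characters; the second lists the hex
    code point of every character, so an unusual character stands out.
    """
    escaped = name.encode('unicode_escape').decode('ascii')
    hex_points = ' '.join('%04x' % ord(ch) for ch in name)
    return escaped, hex_points
```

Usage might look like `for name in os.listdir(u'.'): print(describe_name(name))`; passing `u'.'` rather than `'.'` is what makes `os.listdir` return unicode names instead of code-page bytes on Python 2.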
The error I get is a `WindowsError`, because it can't find the file (the string being passed is not the name Windows knows the file by). I added a try/except clause to work around it, but I would really like to deal with it properly.
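The `?` in the log looks like what happens when a name is squeezed into a single-byte Windows code page: characters the code page can't represent are replaced, so the resulting byte string no longer names any real file and `os.remove` can't find it. The sketch below simulates that, using the `cp1252` codec as a stand-in for the machine's actual ANSI code page (an assumption); the helper `ansi_view` is hypothetical.

```python
def ansi_view(name, codepage='cp1252'):
    """Approximate the byte-string name a code-page (ANSI) API would hand back.

    Characters the code page cannot represent become '?', so the result
    may no longer match the real filename on disk.
    """
    return name.encode(codepage, 'replace')


# U+F022 is in the Private Use Area, so cp1252 has no byte for it.
print(ansi_view(u'._spec com report 3\uf02295'))  # Python 3: b'._spec com report 3?95'
```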
I also tried passing a unicode path to `os.walk` (`os.walk(u'.')`), as per the top answer to this post: Handling ascii char in python string, and I see the following error:

    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "c:\python27\lib\encodings\cp850.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\uf022' in position 20: character maps to <undefined>

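Judging by the traceback, this error is raised inside the `encode` function of `cp850.py`, i.e. while Python converts the unicode name to the console's code page (cp850) in order to print it, not while touching the file; U+F022 sits in the Unicode Private Use Area and has no cp850 equivalent. One possible workaround, sketched below for Python 2.7 or 3 (the helper `console_safe` is hypothetical), is to escape characters the console can't display before printing.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console can't display turned into escapes.

    Defaults to the encoding of sys.stdout, falling back to ASCII when it is
    unknown (for example when output is redirected on Python 2).
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # 'backslashreplace' writes each unencodable character as a \uXXXX escape.
    return text.encode(encoding, 'backslashreplace').decode(encoding)


print(console_safe(u'._spec com report 3\uf02295', 'cp850'))  # prints ._spec com report 3\uf02295
```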
So I am guessing the answer lies in how the filename is parsed, and I'm wondering if anyone can point me in the right direction.
Code:

    import os

    # Raw string: in a normal string "\t" would be read as a tab character.
    rootdir = r"c:\target Dir to walk"
    destKeep = "Keepers.txt"
    destDelete = "Deleted.txt"
    matchingText = "._"  # resource-fork (AppleDouble) files start with "._"
    files_removed = 0

    # Open the logs once, not once per folder.
    outfileKeep = open(destKeep, "a")
    outfileDelete = open(destDelete, "a")
    for folder, subs, files in os.walk(rootdir):
        for filename in files:
            src = os.path.join(folder, filename)
            srcNewline = src + ", " + filename + "\n"
            if not filename.startswith(matchingText):
                outfileKeep.write(srcNewline)
            else:
                outfileDelete.write(srcNewline)
                try:
                    os.remove(src)
                    files_removed += 1  # count only successful deletions
                except WindowsError:
                    print "I was unable to delete this file:", src
                    outfileKeep.write(srcNewline)

    if files_removed:
        print '%d files removed' % files_removed
    else:
        print 'No files removed'
    outfileKeep.close()
    outfileDelete.close()
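For comparison, here is a rough sketch of the same job done entirely with unicode text, assuming Python 2.7 or 3 and that the root is passed as a unicode string (e.g. `u'.'`); the function name `remove_resource_forks` and its parameters are hypothetical. It writes the logs as UTF-8 via `io.open`, so any filename can be recorded, only matches names that start with `._` (the AppleDouble prefix macOS uses for resource-fork files), and never prints the names, which sidesteps the console encoding problem.

```python
import io
import os


def remove_resource_forks(rootdir, keep_log, delete_log, prefix=u'._'):
    """Walk rootdir (a unicode path) and delete files whose names start with prefix.

    Returns (removed, failed): lists of full paths that were deleted or could not be.
    """
    removed, failed = [], []
    # UTF-8 log files can hold any filename, unlike the console code page.
    with io.open(keep_log, 'a', encoding='utf-8') as keep, \
            io.open(delete_log, 'a', encoding='utf-8') as delete:
        # A unicode root makes os.walk yield unicode names on Python 2 as well.
        for folder, _subdirs, files in os.walk(rootdir):
            for filename in files:
                src = os.path.join(folder, filename)
                line = u'%s, %s\n' % (src, filename)
                if not filename.startswith(prefix):
                    keep.write(line)
                    continue
                delete.write(line)
                try:
                    os.remove(src)
                    removed.append(src)
                except OSError:  # WindowsError is a subclass of OSError
                    keep.write(line)
                    failed.append(src)
    return removed, failed
```

Keeping the log files outside the tree being walked avoids the script listing its own logs.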