Firstly, are you sure you want to do this using Python? There are several fully-featured file renaming utilities available which may suit your needs. (My personal favourite is KRename under Linux.)
Assuming you really do want to do this in Python...
Q1) Separate file name from its extension
Use os.path.splitext() to separate a filename into its 'name' and 'extension' parts. You can then manipulate the name without changing the extension, and recombine them when you're done. For example:
import os, pprint
filenames = os.listdir('D:\\Freeware')
name_and_ext_list = [os.path.splitext(f) for f in filenames]
pprint.pprint(filenames)
pprint.pprint(name_and_ext_list)
gives output something like
['a43.zip',
'Amphetype-0.16-win32.exe',
'aMSN-0.98.4-tcl85-windows-installer.exe',
'andlinux-beta2-minimal.exe',
'ATF-Cleaner.exe',
'aurora-setup.exe',
[('a43', '.zip'),
('Amphetype-0.16-win32', '.exe'),
('aMSN-0.98.4-tcl85-windows-installer', '.exe'),
('andlinux-beta2-minimal', '.exe'),
('ATF-Cleaner', '.exe'),
('aurora-setup', '.exe'),
Note that os.path.splitext() is more robust than anything you are likely to roll yourself. It won't get confused by extra dots in the filename. For example:
>>> os.path.splitext('Zipped Party Food Invoice 22.09.2011.xlsx.zip')
('Zipped Party Food Invoice 22.09.2011.xlsx', '.zip')
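As a minimal sketch of the split, modify, recombine pattern described above (the helper name tidy_name and the underscore-to-space rule are purely illustrative assumptions, not part of the question):

```python
import os

def tidy_name(filename):
    """Replace underscores with spaces in the name part only."""
    # splitext() splits at the last dot, so only the final extension is protected.
    name, extension = os.path.splitext(filename)
    return name.replace("_", " ") + extension

print(tidy_name("party_food_invoice.xlsx"))  # party food invoice.xlsx
print(tidy_name("archive_2011.tar.gz"))      # archive 2011.tar.gz
```

Note that for a double extension such as .tar.gz only the final .gz is treated as the extension.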
Q2) "fuzzy" search and replace
Your example, replacing each of the characters B, C, D, E and F with A, can be done with a regular expression, as suggested by @kev.
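For reference, one way that single-character replacement might look. This is only a sketch using a character class; it is not necessarily the code @kev posted, and the function name collapse_letters is made up for illustration:

```python
import re

def collapse_letters(text):
    """Replace every occurrence of B, C, D, E or F with A."""
    # [BCDEF] is a character class: it matches any one of the listed characters.
    return re.sub(r"[BCDEF]", "A", text)

print(collapse_letters("ABCDEFG.txt"))  # AAAAAAG.txt
```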
Edit 2: Since you want to replace entire words, not the single characters B, C, D, E, F, you could try something like the code below. It is not particularly efficient (it scans each filename once for every word you want to search and replace), but it works; improvements welcome. A good solution would need only one pass through each string.
def replace_words(input_string):
    replacement_lists = {
        "Electronics": ["Computer", "CD Player", "Camera", "Coffee Grinder"],
        "Baked Goods": ["Cheesecake", "Muffin", "Cookie"],
    }
    output_string = input_string
    # .items() works in Python 2 and 3; iteritems() exists only in Python 2.
    for type_of_thing, list_of_things in replacement_lists.items():
        for thing in list_of_things:
            output_string = output_string.replace(thing, type_of_thing)
    return output_string

input_names = [
    "Coffee Cup.jpg",
    "Computer Disks.docx",
    "Muffins.jar",
    "CD Player Maintenance.lzma",
    "Cookie Monster's 101 Types of Cookie.pdf",
]
output_names = [replace_words(x) for x in input_names]
which gives output like:
>>> pprint.pprint(input_names)
['Coffee Cup.jpg',
'Computer Disks.docx',
'Muffins.jar',
'CD Player Maintenance.lzma',
"Cookie Monster's 101 Types of Cookie.pdf"]
>>> pprint.pprint(output_names)
['Coffee Cup.jpg',
'Electronics Disks.docx',
'Baked Goodss.jar',
'Electronics Maintenance.lzma',
"Baked Goods Monster's 101 Types of Baked Goods.pdf"]
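The one-pass approach mentioned above could be sketched roughly as follows. This is an assumption about what such a solution might look like, not a tested drop-in: it inverts the mapping and builds a single alternation pattern, so each string is scanned once. The names word_to_category and replace_words_single_pass are invented for this example.

```python
import re

# Same category -> words mapping as in replace_words() above.
replacement_lists = {
    "Electronics": ["Computer", "CD Player", "Camera", "Coffee Grinder"],
    "Baked Goods": ["Cheesecake", "Muffin", "Cookie"],
}

# Invert the mapping so each word points at its replacement.
word_to_category = {
    word: category
    for category, words in replacement_lists.items()
    for word in words
}

# Try longer words first so a long phrase wins over any shorter one it contains.
alternatives = sorted(word_to_category, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(word) for word in alternatives))

def replace_words_single_pass(input_string):
    """Replace every known word in one scan of the string."""
    return pattern.sub(lambda match: word_to_category[match.group(0)], input_string)

print(replace_words_single_pass("Cookie Monster's 101 Types of Cookie.pdf"))
# Baked Goods Monster's 101 Types of Baked Goods.pdf
```

Unlike the chained str.replace() calls, text produced by one replacement is never rescanned, which only matters if a category name happens to contain one of the search words.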
Q3) remove all dashes unless surrounded by word-characters
Again a job for regular expressions. Try:
>>> import re
>>> teststring = "Coca-Cola - A History.pdf"
>>> re.sub(r'(\W)-(\W)',r'\1\2',teststring)
'Coca-Cola  A History.pdf'
This removes any dash that has a non-word character (\W) on each side, so dashes between 'word characters' are left alone. \w matches word characters, loosely defined as any letter, digit, or underscore; \W matches anything not matched by \w.
Notes:
1) What \w matches depends on the Python version and flags. In Python 3, \w is Unicode-aware for str patterns by default, so if your filenames are in Russian, Cyrillic characters also count as 'word characters'. In Python 2 you need the re.UNICODE (or re.LOCALE) flag for that.
2) The regular expression actually matches three characters. The \1 and \2 in the replacement string put back the two characters on either side of the dash. (See: "backreferences".)
3) Note the use of raw strings r"..." instead of normal strings "...". This prevents Python from mangling the backslashes in your regular expressions.
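Because the pattern above needs an actual character on both sides of the dash, it will not touch a dash at the very start or end of a name, or one with a word character on just one side. If you want those removed too, a variant using lookarounds is one option. This is a sketch under that assumption, and strip_loose_dashes is a hypothetical name:

```python
import re

def strip_loose_dashes(text):
    """Remove each dash that lacks a word character on at least one side."""
    # (?<!\w)-  : dash not preceded by a word character (includes start of string)
    # -(?!\w)   : dash not followed by a word character (includes end of string)
    return re.sub(r"(?<!\w)-|-(?!\w)", "", text)

print(strip_loose_dashes("Coca-Cola - A History.pdf"))  # Coca-Cola  A History.pdf
print(strip_loose_dashes("-Draft- notes.txt"))          # Draft notes.txt
```

Lookarounds match without consuming the neighbouring characters, so no backreferences are needed in the replacement string.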
Edit: Here's an (un-tested) example of how to treat the filename separately to the extension. Note that I've moved all the heavy lifting into a separate function instead of using list comprehensions.
List comprehensions are elegant for replacing loops that do one or two things, but I personally find nested list comprehensions quite hard to read. Remember that lines longer than 80 characters are a code smell.
import os, shutil, re

def rename_file(directory, original_filename):
    name, extension = os.path.splitext(original_filename)
    # Remove one or more dashes surrounded by non-word characters.
    modified_name = re.sub(r"(\W)-+(\W)", r"\1\2", name)
    new_filename = modified_name + extension
    if new_filename == original_filename:
        return  # nothing to rename
    try:
        # os.listdir() returns bare names, so join them to the directory.
        # shutil.move() moves files or directories (recursively).
        shutil.move(os.path.join(directory, original_filename),
                    os.path.join(directory, new_filename))
    except (shutil.Error, OSError):
        print("Couldn't rename file %s!" % original_filename)

target_dir = r"/home/trinity/nmap"
for f in os.listdir(target_dir):
    rename_file(target_dir, f)
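Since the example above is untested, it may be worth previewing the renames before touching any files. The following is a minimal dry-run sketch using the same regular expression; plan_renames is a made-up helper name, and it only builds a list of (old path, new path) pairs:

```python
import os
import re

def plan_renames(directory):
    """Return (old_path, new_path) pairs without touching the disk."""
    plan = []
    for filename in sorted(os.listdir(directory)):
        name, extension = os.path.splitext(filename)
        new_name = re.sub(r"(\W)-+(\W)", r"\1\2", name) + extension
        if new_name != filename:
            plan.append((os.path.join(directory, filename),
                         os.path.join(directory, new_name)))
    return plan

# Example usage (prints the planned renames for a directory of your choice):
# for old_path, new_path in plan_renames("/home/trinity/nmap"):
#     print("%s -> %s" % (old_path, new_path))
```

Once the plan looks right, each pair can be passed to shutil.move().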