Renaming files according to a set of rules

Question

I'm using (or attempting to use) a Python script to remove and alter parts of file names.

I'm looking to have a bunch of renaming functions all in one script.

Q1

I'd like to know how to do renames on the file name only (like removing every '.') without having to re-add the '.' for the file extension at the end.

Q2

I'd also like to be able to make a list so I can rename multiple things to the same name without needing a line for each. Right now it would be:

[os.rename(f, f.replace('B', 'A')) for f in os.listdir('.') if not f.startswith('.')]
[os.rename(f, f.replace('C', 'A')) for f in os.listdir('.') if not f.startswith('.')]
[os.rename(f, f.replace('D', 'A')) for f in os.listdir('.') if not f.startswith('.')]
[os.rename(f, f.replace('E', 'A')) for f in os.listdir('.') if not f.startswith('.')]
[os.rename(f, f.replace('F', 'A')) for f in os.listdir('.') if not f.startswith('.')]

I'd like to be able to put that into a single "replace B, C, D, E, and/or F with A" rule.

Q3

My third question: how can I make it remove all "-" from the filename unless the dash is part of a name such as "Coca-Cola" (or something)?

At the moment I have a couple of lame workarounds, but they aren't efficient.

Li-aung Yip · Accepted Answer · 2012-03-07T02:00:42.510

Firstly, are you sure you want to do this using Python? There are several fully-featured file renaming utilities available which may suit your needs. (My personal favourite is KRename under Linux.)

Assuming you really do want to do this in Python...

Q1) Separate file name from its extension

Use os.path.splitext() to separate a filename into its 'name' and 'extension' parts. You can then manipulate the name without changing the extension, and recombine the two when you're done. For example:

import os, pprint
filenames = [f for f in os.listdir('D:\\Freeware')]
name_and_ext_list = [os.path.splitext(f) for f in filenames]
pprint.pprint(filenames)
pprint.pprint(name_and_ext_list)

gives output something like

['a43.zip',
 'Amphetype-0.16-win32.exe',
 'aMSN-0.98.4-tcl85-windows-installer.exe',
 'andlinux-beta2-minimal.exe',
 'ATF-Cleaner.exe',
 'aurora-setup.exe',

[('a43', '.zip'),
 ('Amphetype-0.16-win32', '.exe'),
 ('aMSN-0.98.4-tcl85-windows-installer', '.exe'),
 ('andlinux-beta2-minimal', '.exe'),
 ('ATF-Cleaner', '.exe'),
 ('aurora-setup', '.exe'),

Note that os.path.splitext() is more robust than anything you are likely to roll yourself. It won't get confused by extra dots in the filename - for example:

>>> os.path.splitext('Zipped Party Food Invoice 22.09.2011.xlsx.zip')
('Zipped Party Food Invoice 22.09.2011.xlsx', '.zip')
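To tie this back to Q1, here is a minimal sketch of applying a change to the name part only and re-attaching the extension untouched. The helper `apply_to_stem` and the rule `dots_to_spaces` are hypothetical names made up for illustration; only `os.path.splitext()` comes from the standard library.

```python
import os

def apply_to_stem(filename, transform):
    """Apply transform to the name part only; the extension is left as-is.

    apply_to_stem is a hypothetical helper, not part of os.path.
    """
    stem, extension = os.path.splitext(filename)
    return transform(stem) + extension

def dots_to_spaces(stem):
    # Example rule: replace every dot in the name part with a space.
    return stem.replace('.', ' ')

print(apply_to_stem('Party.Food.Invoice.xlsx', dots_to_spaces))
# prints: Party Food Invoice.xlsx
```

Any function that takes a string and returns a string can be passed as `transform`, so several renaming rules can share the same split-and-rejoin step.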

Q2) "fuzzy" search and replace

Your example code, replacing all of the characters BCDEF with A, can be done using a regular expression as suggested by @kev.

Edit 2: Since you want to replace entire words, not the specific single characters B, C, D, E, F, you could try something like the code below. It is not particularly efficient (every filename is scanned once for each word you want to search and replace), but it works; improvements welcome. A good solution would only need to make one pass through each string.

def replace_words(input_string):
    # Maps each category to the words that should be replaced by it.
    replacement_lists = {
        "Electronics": ["Computer", "CD Player", "Camera", "Coffee Grinder"],
        "Baked Goods": ["Cheesecake", "Muffin", "Cookie"],
    }

    output_string = input_string
    # dict.items() works in Python 2 and 3; iteritems() is Python 2 only.
    for type_of_thing, list_of_things in replacement_lists.items():
        for thing in list_of_things:
            output_string = output_string.replace(thing, type_of_thing)

    return output_string

input_names = [
    "Coffee Cup.jpg",
    "Computer Disks.docx",
    "Muffins.jar",
    "CD Player Maintenance.lzma",
    "Cookie Monster's 101 Types of Cookie.pdf",
]

output_names = [replace_words(x) for x in input_names]

which gives output like:

>>> pprint.pprint(input_names)
['Coffee Cup.jpg',
 'Computer Disks.docx',
 'Muffins.jar',
 'CD Player Maintenance.lzma',
 "Cookie Monster's 101 Types of Cookie.pdf"]
>>> pprint.pprint(output_names)
['Coffee Cup.jpg',
 'Electronics Disks.docx',
 'Baked Goodss.jar',
 'Electronics Maintenance.lzma',
 "Baked Goods Monster's 101 Types of Baked Goods.pdf"]

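The one-pass approach mentioned above can be sketched with a single compiled regular expression and a lookup table. This is an illustration under assumptions, not a tuned implementation: `word_to_category` and `replace_words_single_pass` are hypothetical names, and the word lists are the same sample data as in the example above.

```python
import re

# Same sample mapping as above: category -> words replaced by that category.
replacement_lists = {
    "Electronics": ["Computer", "CD Player", "Camera", "Coffee Grinder"],
    "Baked Goods": ["Cheesecake", "Muffin", "Cookie"],
}

# Invert to word -> category so the callback can look up each match directly.
word_to_category = {
    word: category
    for category, words in replacement_lists.items()
    for word in words
}

# Longest words first, so a longer phrase wins over a shorter one that
# shares its prefix. re.escape guards against regex metacharacters.
pattern = re.compile("|".join(
    re.escape(word)
    for word in sorted(word_to_category, key=len, reverse=True)
))

def replace_words_single_pass(input_string):
    # Each match is replaced by the category of the word that matched.
    return pattern.sub(lambda m: word_to_category[m.group(0)], input_string)

print(replace_words_single_pass("Cookie Monster's 101 Types of Cookie.pdf"))
# prints: Baked Goods Monster's 101 Types of Baked Goods.pdf
```

Unlike chained `str.replace()` calls, text that has already been substituted is never scanned again, so a replacement can't accidentally be rewritten by a later rule.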
Q3) remove all dashes unless surrounded by word-characters

Again a job for regular expressions. Try:

>>> teststring = "Coca-Cola - A History.pdf"
>>> re.sub(r'(\W)-(\W)',r'\1\2',teststring)
'Coca-Cola  A History.pdf'

This removes a dash only when it has a non-word character on both sides, so the dash inside "Coca-Cola" is left alone. \w matches 'word characters', loosely defined as any letter, any digit, or underscore. \W matches non-word characters, i.e. anything not matched by \w.

Notes:

1) What \w matches depends on the pattern type and flags. In Python 3, str patterns are Unicode-aware by default, so if your filenames are in Russian, Cyrillic characters also count as 'word characters'.

2) The regular expression actually matches three characters - the \1 and the \2 in the replace string are used to put back the two characters on either side of the dash. (See: "backreferences".)

3) Note the use of raw strings r"..." instead of normal strings "...". This is to prevent Python mangling the backslashes in your regular expressions.
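The original question asked for dashes to be kept when they are part of a specific name such as "Coca-Cola". One possible way to do that, sketched here under assumptions, is to match the protected names first in an alternation and put them back unchanged, deleting every other dash. The `PROTECTED` list and the function `remove_dashes` are hypothetical names for illustration.

```python
import re

# Hypothetical list of names whose dashes must survive.
PROTECTED = ["Coca-Cola", "blu-ray"]

# Alternation: try a protected name first (captured in group 1),
# otherwise match a lone dash. IGNORECASE also protects "Blu-Ray".
_pattern = re.compile(
    "(" + "|".join(re.escape(word) for word in PROTECTED) + ")|-",
    re.IGNORECASE,
)

def remove_dashes(name):
    # Group 1 is set when a protected name matched: return it unchanged.
    # Otherwise the match is a bare dash, which is replaced by nothing.
    return _pattern.sub(lambda m: m.group(1) or "", name)

print(remove_dashes("Coca-Cola - A History"))
# prints: Coca-Cola  A History
```

As with the regular expression above, the spaces on either side of a removed dash are kept, so the result contains a double space that you may want to collapse in a separate step.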

Edit: Here's an (un-tested) example of how to treat the filename separately to the extension. Note that I've moved all the heavy lifting into a separate function instead of using list comprehensions.

List comprehensions are elegant for replacing loops that do one or two things, but I personally find nested list comprehensions quite hard to read. Remember that lines longer than 80 characters are an indicator of code smell.

import os, shutil, re

def rename_file(original_filename, directory="."):
    name, extension = os.path.splitext(original_filename)
    # Remove one or more dashes that sit between two non-word characters.
    modified_name = re.sub(r"(\W)-+(\W)", r"\1\2", name)
    new_filename = modified_name + extension
    if new_filename == original_filename:
        return  # nothing to rename
    try:
        # os.listdir() returns bare names, so join them with the directory.
        # shutil.move() moves files or directories (recursively).
        shutil.move(os.path.join(directory, original_filename),
                    os.path.join(directory, new_filename))
    except OSError:
        print("Couldn't rename file %s!" % original_filename)

target_dir = r"/home/trinity/nmap"
for f in os.listdir(target_dir):
    rename_file(f, target_dir)
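The example above does very little sanity checking. As a hedged sketch of what that checking could look like, the code below first builds a list of planned renames without touching the disk (skipping hidden dot files, directories, and names that would collide), and only then applies it. `plan_renames` and `apply_renames` are hypothetical helpers, and the collision check is deliberately conservative.

```python
import os

def plan_renames(directory, transform):
    """Return (old_name, new_name) pairs without touching the disk.

    plan_renames is a hypothetical helper; transform is any function
    mapping the name part (without extension) to a new name part.
    """
    existing = set(os.listdir(directory))
    planned = []
    for old_name in sorted(existing):
        if old_name.startswith('.'):
            continue  # skip hidden 'dot files'
        if not os.path.isfile(os.path.join(directory, old_name)):
            continue  # skip directories and other non-regular files
        stem, extension = os.path.splitext(old_name)
        new_name = transform(stem) + extension
        if new_name == old_name:
            continue  # nothing to change
        if new_name in existing or any(new_name == n for _, n in planned):
            print("Skipping %s: %s already exists" % (old_name, new_name))
            continue
        planned.append((old_name, new_name))
    return planned

def apply_renames(directory, planned):
    # Perform the renames that plan_renames() proposed.
    for old_name, new_name in planned:
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
```

Printing the result of `plan_renames('.', some_rule)` before calling `apply_renames()` gives a dry run, which is a cheap safeguard when experimenting with new rules.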

Wow, a fantastic response!! I'm sorry, but I'm having trouble understanding how to apply renaming to only the names, not the file extensions, and then put it back together. I assume I insert a line at the beginning telling it to: "[os.path.splitext(f) for f in filenames]". But then what do I do for the renames to apply only to the file names and not the extensions? — user, Mar 06 '12 at 20:50
@RobinHood: I've added a more specific example. Note that I haven't done the full range of error/sanity checking required - you'll probably want to check whether `filename` is a file or directory, that it actually exists on disk, etc. — Li-aung Yip, Mar 07 '12 at 01:26
Oh, and that it's not one of the special directories `.` or `..` or a hidden 'dot file' `.ssh`, `.minecraft`, etc. (@kev had this in his answer but I always forget.) — Li-aung Yip, Mar 07 '12 at 01:33
For some reason I'm not able to adapt this last code example to let me do various things with the "root" part of whatever is split by the os.path.split(). I'm getting all sorts of syntax errors with (original_filename) not being defined, for one thing. I'm also looking to have it apply to whatever directory it's being called from (whatever directory I or the user am in). How can I split the two, then have some category I can make a list of replacements for (e.g. replace dots with spaces, capitalize first letter, replace "-" with a space unless it's part of "blu-ray", etc.)? — user, Mar 08 '12 at 17:36
`import os, re, pprint; teststring = "Coca-Cola - A History.pdf"; re.sub(r'(\W)-(\W)',r'\1\2',teststring); pprint.pprint(teststring)` also returns 'Coca-Cola - A History.pdf'... I feel I'm doing something incorrectly? — user, Mar 08 '12 at 17:59
Your regular expression is working, but you're not saving the result. `re.sub(mystring)` does not modify `mystring` - instead it returns a *new* string. Try `new_string = re.sub(...); print(new_string);`. — Li-aung Yip, Mar 09 '12 at 04:25
Regarding your problems with the last code example: my example is tested and working. It implements a function called `rename_file` with the argument `original_filename`. The only reason `original_filename` would be undefined is if the code that uses it is no longer inside the `rename_file` function (did you copy-paste it somewhere else?). Can't say much more without actually seeing your code. — Li-aung Yip, Mar 09 '12 at 04:27
This discussion is getting too long for the comments. Please post a new question and post a link to it here. — Li-aung Yip, Mar 09 '12 at 04:29

score 2 · Answer 2 · answered Mar 06 '12 at 02:40

import os, re
[os.rename(f, re.sub(r'[B-F]', 'A', f)) for f in os.listdir('.') if not f.startswith('.')]

kev

`r'[B-F]'` is a regexp which matches `B,C,D,E,F` – kev Mar 07 '12 at 00:44
You could do a regexp like `r'(Camera|Computer|Coffee Grinder|Alarm Clock)'` which will match any one of those items. The pipe symbol `|` is the regex *alternation* operator - think of it as a logical OR. – Li-aung Yip Mar 07 '12 at 01:29
Edited `Q2` of my answer to better reflect what you actually wanted - to replace entire words, not the literal characters `A, B, C, D, E, F`. – Li-aung Yip Mar 07 '12 at 02:02
Actually, I just remembered [this substantially similar question](http://stackoverflow.com/questions/9295896/match-and-replace-emoticons-in-string-what-is-the-most-efficient-way/9296015#9296015). See @tchrist's answer if you want the optimally efficient solution. – Li-aung Yip Mar 07 '12 at 02:29
