9

I have a folder with over 100,000 files, all numbered with the same stub but without leading zeros. The numbers aren't always contiguous (usually they are, but there are gaps), e.g.:

file-21.png, 
file-22.png,  
file-640.png, 
file-641.png, 
file-642.png, 
file-645.png, 
file-2130.png, 
file-2131.png, 
file-3012.png, 

etc.

I would like to batch process this folder to create padded, contiguous filenames, e.g.:

file-000000.png, 
file-000001.png, 
file-000002.png, 
file-000003.png, 

When I iterate over the folder with `for filename in os.listdir('.'):`, the files don't come up in the order I'd like them to. Understandably, they come up as

 file-1, 
 file-1x, 
 file-1xx, 
 file-1xxx,

etc. then

 file-2, 
 file-2x, 
 file-2xx, 

etc. How can I get it to go through the files in order of their numeric value? I am a complete Python noob, but looking at the docs, I'm guessing I could use map to create a new list containing only the numerical part, sort that list, and then iterate over it. With over 100K files this could be heavy. Any tips welcome!

Blorgbeard
memo
  • You can run a linux "ls" command with any number of parameters to sort them how you want... and then use this list to get to the files. – bwawok Jun 20 '10 at 00:51
  • Yea, if I were doing this, I'd just use `sort -n`. – David Wolever Jun 20 '10 at 00:58
  • Instead of editing your question with the answer, it's better to just post your solution as its own answer at the bottom and mark it as accepted. – Sophie Alpert Jun 20 '10 at 05:04

7 Answers

8
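import os
import re
thenum = re.compile(r'^file-(\d+)\.png$')

def bynumber(fn):
  # Sort key: the integer embedded in the filename.
  return int(thenum.match(fn).group(1))

# Keep only names that match the pattern, so every sort key is an int
# (a None key for a stray file would make the sort fail on Python 3).
allnames = [fn for fn in os.listdir('.') if thenum.match(fn)]
allnames.sort(key=bynumber)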

Now you have the files in the order you want them and can loop

for i, fn in enumerate(allnames):
  ...

using the progressive number i (which will be 0, 1, 2, ...) padded as you wish in the destination-name.
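
A minimal sketch of what that loop might look like, assuming the `file-<number>.png` naming from the question and six-digit padding. The function name `renumber` and its `folder` and `width` parameters are hypothetical and not part of the answer above. It also assumes no two files carry the same number, so that processing in ascending order never overwrites a file that has not been renamed yet.

```python
import os
import re

# Assumed naming scheme from the question: file-<number>.png
PATTERN = re.compile(r'^file-(\d+)\.png$')

def renumber(folder='.', width=6):
    """Rename matching files to file-000000.png, file-000001.png, ...

    Returns the number of files that matched the pattern.
    """
    # Keep only names that match, so the sort key is always an int.
    names = [fn for fn in os.listdir(folder) if PATTERN.match(fn)]
    names.sort(key=lambda fn: int(PATTERN.match(fn).group(1)))
    for i, fn in enumerate(names):
        new = 'file-' + str(i).zfill(width) + '.png'
        if new != fn:  # skip files that already have their final name
            os.rename(os.path.join(folder, fn), os.path.join(folder, new))
    return len(names)
```

Calling `renumber('.')` would issue at most one rename per file, which matters here because the rename system call is likely the slow part.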

Alex Martelli
  • Maybe quicker sort function is def bynumber(fn): return int(filter(str.isdigit, fn)) – twneale Jun 20 '10 at 20:53
  • Yep, if you're sure there are no "stray" digits anywhere it's faster (my RE-based solution also checks, and the check's pure overhead if one "knows" it always succeeds every time;-). – Alex Martelli Jun 20 '10 at 21:16
4

There are three steps. The first is getting all the filenames. The second is converting the filenames. The third is renaming them.

If all the files are in the same folder, then glob should work.

import glob
filenames = glob.glob("/path/to/folder/*.txt")

Next, you want to change the name of the file. You can print with padding to do this.

>>> filename = "file-338.txt"
>>> import os
>>> fnpart = os.path.splitext(filename)[0]
>>> fnpart
'file-338'
>>> _, num = fnpart.split("-")
>>> num.rjust(5, "0")
'00338'
>>> newname = "file-%s.txt" % num.rjust(5, "0")
>>> newname
'file-00338.txt'

Now, you need to rename them all. os.rename does just that.

os.rename(filename, newname)

To put it together:

for filename in glob.glob("/path/to/folder/*.txt"): # loop through each file
    newname = make_new_filename(filename) # create a function that does step 2, above
    os.rename(filename, newname)
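
The `make_new_filename` helper is left undefined above. One possible sketch follows, assuming every filename has the form `<stub>-<number>.<ext>`; the `width` parameter is an illustrative addition. Because glob returns full paths, the sketch splits off the folder first so a `-` in the directory name does not confuse the split.

```python
import os

def make_new_filename(path, width=5):
    """Return path with the number after the last '-' left-padded with zeros."""
    folder, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    # rpartition splits on the last '-' only, so stubs containing '-' survive.
    stub, _, num = stem.rpartition('-')
    return os.path.join(folder, '%s-%s%s' % (stub, num.rjust(width, '0'), ext))
```

Note that this only pads the existing numbers; it does not close gaps in the sequence.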
Ryan Ginstrom
4

Thank you all for your suggestions. I will try them all to learn the different approaches. The solution I went for is based on applying a natural sort to my file list and then iterating over it to rename. This was one of the suggested answers, but for some reason it has since disappeared, so I cannot mark it as accepted!

import os
files = os.listdir('.')
natsort(files)
index = 0
for filename in files:
    os.rename(filename, str(index).zfill(7)+'.png')
    index += 1

where natsort is defined in http://code.activestate.com/recipes/285264-natural-string-sorting/
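
For readers who can't reach the recipe, here is a small stand-in that gives a similar natural ordering through a sort key. This is not the ActiveState `natsort` itself; `natural_key` is a hypothetical helper written for illustration.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks for natural ordering."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd-indexed parts are always the digit runs.
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ['file-640.png', 'file-22.png', 'file-2130.png', 'file-21.png']
names.sort(key=natural_key)
print(names)
```

With this key, `files.sort(key=natural_key)` could take the place of `natsort(files)` in the snippet above.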

memo
1

Why not do it as a two-step process? First parse all the files and rename them with padded numbers, then run another script that takes those files, which now sort correctly, and renames them so they're contiguous.

  • The renaming operation, a system call, will be the bottleneck: doing twice as many of those will take twice as long. See my answer for a fast way of doing it (with a single rename per file). – Alex Martelli Jun 20 '10 at 01:05
  • You would rename them in memory, you wouldn't write them back to disk that way. So only one write. – Ed S. Jun 20 '10 at 01:28
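
One way to read the "rename in memory" idea is to compute the whole old-to-new mapping first and touch the disk only once per file afterwards. A rough sketch, assuming each name contains exactly one run of digits; `plan_renames` and its `width` parameter are hypothetical names:

```python
import re

def plan_renames(names, width=6):
    """Return (old, new) pairs in numeric order; nothing on disk is touched."""
    def number(fn):
        # First run of digits in the name, as an int.
        return int(re.search(r'\d+', fn).group())
    ordered = sorted(names, key=number)
    return [(old, 'file-%s.png' % str(i).zfill(width))
            for i, old in enumerate(ordered)]
```

The resulting pairs can be inspected or printed as a dry run, then applied with a single `os.rename(old, new)` per pair.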
0

1) Take the number in the filename. 2) Left-pad it with zeros 3) Save name.

thomasfedb
0
import os

def renamer():
    for iname in os.listdir('.'):
        # Assumes every name looks like "<stub>-<number>.<ext>"
        first, second = iname.replace(" ", "").split("-")
        number, ext = second.split('.')
        first, number, ext = first.strip(), number.strip(), ext.strip()
        number = number.zfill(6)  # pad the number to be 6 digits long
        oname = first + "-" + number + '.' + ext
        os.rename(iname, oname)
    print("Done")

Hope this helps

inspectorG4dget
  • thanks, from what I can understand, this will only pad the existing numbers and not make the sequence contiguous without gaps? – memo Jun 21 '10 at 12:04
0

The simplest method is given below. You can also modify this script to search recursively.

  1. use os module.
  2. get filenames
  3. os.rename

import os


class Renamer:
    def __init__(self, pattern, extension):
        self.ext = extension
        self.pat = pattern
        return

    def rename(self):
        p, e = (self.pat, self.ext)
        number = 0
        for x in os.listdir(os.getcwd()):
            if x.endswith(f".{e}"):
                os.rename(x, f'{p}_{number}.{e}')
                number+=1
        return


if __name__ == "__main__":
    pattern = "myfile"
    extension = "txt"
    r = Renamer(pattern=pattern, extension=extension)
    r.rename()