enumerating numbers in strings

Question

This seems like a problem that should have a fairly straightforward answer. Sadly, I'm not that fluent in Python yet, as I'm still learning, and I haven't been able to find anything helpful on Google.

My goal is to enumerate the numbers in a string based on how much padding that number already has. I think the best way to describe it is with an example:

0-file will be enumerated from 0-file to 9-file
but 000-file will be enumerated from 000-file to 999-file.

Ultimately I want to be able to do this for [number][a-z], [a-z][number], and [a-z][number].* (so something like file10name.so wouldn't match); however, I think I can figure that part out myself with regex later on.

So, the question boils down to this:

How do I get the length of the 'padding' in the file name?
How do I identify where in the string this number is, so I can replace it?
How do I add the padding when I'm iterating? (I'm assuming zfill, but I'm interested if there's a better method.)

Quick edit: yes, the 'pseudo-regex' is just that. It was only meant to convey the concept, which is why it doesn't match things like "-". The padding would always be a number, not necessarily 0, but that's all right. Both answers this has received so far are perfect; I can adapt them to my needs. I'm already handling full paths, but it's great to have that there for other people who see this in the future. Thanks everyone :)

Would the "padding" letter always be `'a'`? Does the "padding" always have to be at the start of the filename? What if there are other `'a'`s or `'0'`s in the filename "template"? Does your "padding" have to be e.g. `'0'`, or could you use special characters like the `'{}'` used in string formatting? — jonrsharpe, Jun 18 '14 at 15:33
Other relevant questions: http://stackoverflow.com/q/20926491/3001761, http://stackoverflow.com/q/23709247/3001761 — jonrsharpe, Jun 18 '14 at 15:34
@jonrsharpe Why did you remove the `.*` in one of the OP regex examples? — Bakuriu, Jun 18 '14 at 15:34
@Bakuriu none of the other examples included the file extension, so I assumed it could be ignored for most purposes. — jonrsharpe, Jun 18 '14 at 15:35

Dan Lenski · Accepted Answer · 2014-06-18T18:26:30.243

You should figure out a correct specification for the files you're trying to match before coding it up. The pseudo-regexps you gave for the filenames you are trying to match ("[number][a-z] or [a-z][number]") don't even include the examples you gave, such as 0-file.

Simple version

However, taking your stated specification at face value, assuming you wish to include uppercase Latin letters as well, here's a simple function that will match [number][a-z] or [a-z][number], and return the appropriate prefix, suffix, and number of numeric digits.

import re

def find_number_in_filename(fn):
    m = re.match(r"(\d+)([A-Za-z]+)$", fn)
    if m:
        prefix, suffix, num_length = "", m.group(2), len(m.group(1))
        return prefix, suffix, num_length

    m = re.match(r"([A-Za-z]+)(\d+)$", fn)
    if m:
        prefix, suffix, num_length = m.group(1), "", len(m.group(2))
        return prefix, suffix, num_length

    return fn, "", 0

example_fn = ("000foo", "bar14", "baz0", "file10name")
for fn in example_fn:
    prefix, suffix, num_length = find_number_in_filename(fn)
    if num_length == 0:
        print("%s: does not match" % fn)
    else:
        print("%s -> %s[%d-digits]%s" % (fn, prefix, num_length, suffix))

        # build a format such as "%s%03d%s" for a 3-digit number
        all_numbered_versions = [("%s%0"+str(num_length)+"d%s") % (prefix, ii, suffix) for ii in range(0,10**num_length)]
        print("\t%s through %s" % (all_numbered_versions[0], all_numbered_versions[-1]))

The output will be:

000foo -> [3-digits]foo
    000foo through 999foo
bar14 -> bar[2-digits]
    bar00 through bar99
baz0 -> baz[1-digits]
    baz0 through baz9
file10name: does not match

Notice that I'm using a standard printf-style string format to convert numbers to 0-padded strings, e.g. %03d for 3-digit numbers with 0-padding. Using the newer str.format may be preferable for future-proofing.
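For comparison, here is a minimal sketch of the same zero-padding done three ways: printf-style, str.format, and the zfill method mentioned in the question. The helper name pad_three_ways is made up for illustration and is not part of the code above.

```python
def pad_three_ways(number, width):
    """Return the same zero-padded string built three different ways."""
    # printf-style: '*' takes the field width from the argument tuple
    printf_style = "%0*d" % (width, number)
    # str.format: the nested {} supplies the field width
    format_style = "{:0{}d}".format(number, width)
    # zfill: left-pads the string form of the number with zeros
    zfill_style = str(number).zfill(width)
    return printf_style, format_style, zfill_style

print(pad_three_ways(7, 3))  # ('007', '007', '007')
```

All three leave the number untouched when it is already wider than the requested width, so which one to use is mostly a matter of taste.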

Handle full paths and extensions gracefully

If your input includes full paths and filenames with extensions (e.g. /home/someone/project/foo000.txt) and you want to match based on the last piece of the path only, then use os.path.split and os.path.splitext to separate the directory and the extension from the name.
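As a rough illustration of what those calls return, the sketch below splits a path into its directory, base name, and extension. It uses posixpath (the POSIX flavour of os.path) only so that the results are the same on every platform; the helper name split_path is hypothetical.

```python
import posixpath  # POSIX flavour of os.path, so results match on any OS

def split_path(path):
    """Split a path into (directory with trailing slash, base name, extension)."""
    head, tail = posixpath.split(path)
    # joining with "" appends a separator only when head is non-empty
    head = posixpath.join(head, "")
    name, ext = posixpath.splitext(tail)
    return head, name, ext

print(split_path("/home/someone/project/foo000.txt"))
# ('/home/someone/project/', 'foo000', '.txt')
print(split_path("foo000.txt"))
# ('', 'foo000', '.txt')
```

Joining the directory with an empty string is what keeps the separator in place when the pieces are glued back together, without adding a stray slash to a bare filename.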

UPDATE: fixed missing path separator

import re
import os.path

def find_number_in_filename(path):
    # remove the path and the extension
    head, tail = os.path.split(path)
    head = os.path.join(head, "") # include / or \ on the end of head if it's missing
    fn, ext = os.path.splitext(tail)

    m = re.match(r"(\d+)([A-Za-z]+)$", fn)
    if m:
        prefix, suffix, num_length = head, m.group(2)+ext, len(m.group(1))
        return prefix, suffix, num_length

    m = re.match(r"([A-Za-z]+)(\d+)$", fn)
    if m:
        prefix, suffix, num_length = head+m.group(1), ext, len(m.group(2))
        return prefix, suffix, num_length

    return path, "", 0

example_paths = ("/tmp/bar14.so", "/home/someone/0000baz.txt", "/home/someone/baz00bar.zip")
for path in example_paths:
    prefix, suffix, num_length = find_number_in_filename(path)
    if num_length == 0:
        print("%s: does not match" % path)
    else:
        print("%s -> %s[%d-digits]%s" % (path, prefix, num_length, suffix))

        # build a format such as "%s%03d%s" for a 3-digit number
        all_numbered_versions = [("%s%0"+str(num_length)+"d%s") % (prefix, ii, suffix) for ii in range(0,10**num_length)]
        print("\t%s through %s" % (all_numbered_versions[0], all_numbered_versions[-1]))

@user3669443, good catch on the missing path separator (`/` or `\\`) in the second version. My bad. Your suggested edit wasn't a correct solution for this, since it added the path separator in the wrong location in some cases. I made a subsequent edit that should get it right. — Dan Lenski, Jun 18 '14 at 18:46

score 0 · Answer 2 · answered Jun 18 '14 at 15:53

Here is a generator implementation, based on str.lstrip and str.format. It parses the input into a standard string template (e.g. '{0:02d}-file'), then loops over the appropriate values and uses that template to create the output:

def process(s):
    zeros = len(s) - len(s.lstrip('0'))
    template = "{{0:0{0}d}}{1}".format(zeros, s.lstrip('0')) 
    for i in range(10**zeros):
        yield template.format(i)

Example usage:

>>> list(process('00-file'))
['00-file', '01-file', '02-file', ..., '98-file', '99-file']

It has the following limitations:

Only supports '0' padding; and
Only supports leading padding;

but you can take this and adapt it to your own purposes.
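One possible adaptation, offered only as a sketch: since the question says the padding is always a number but not necessarily 0, the width can be taken from the first run of digits wherever it appears, and zfill can rebuild each name. The function name enumerate_padded and the choice to use only the first digit run are assumptions, not part of the answer above.

```python
import re

def enumerate_padded(s):
    """Yield every variant of s with its first run of digits replaced.

    The length of that digit run is taken as the padding width.
    Hypothetical helper; strings without digits yield nothing.
    """
    m = re.search(r"[0-9]+", s)
    if m is None:
        return  # no digits, so there is nothing to enumerate
    width = len(m.group())
    before, after = s[:m.start()], s[m.end():]
    for i in range(10 ** width):
        # zfill pads the counter back to the original width
        yield before + str(i).zfill(width) + after

names = list(enumerate_padded("file14.so"))
print(names[0], names[-1], len(names))  # file00.so file99.so 100
```

This does not enforce the [number][a-z] / [a-z][number] shapes from the question; a stricter pattern would be needed to reject names such as file10name.so.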
