Using a regular expression substitution command to insert leading zeros in front of numbers less than 10 in a string of filenames

Question

I am having trouble figuring out how to make this work with the substitution command, which is what I have been instructed to use. I am using this text as a variable:

text = 'file1, file2, file10, file20'

I want to search the text and insert a zero in front of any number less than 10. I thought I could use an if statement depending on whether re.match or re.findall finds only one digit after the letters, but I can't get it to work. Here is my starting code, where I am trying to extract the letters and digits into groups and keep only those file names with exactly one digit:

import re
text = 'file1, file2, file10, file20'
mtch = re.findall('^([a-z]+)(\d{1})$',text)

but it doesn't work

Ashwini Chaudhary · Answer 1 · 2013-08-12T16:19:53.233

3

You can use re.sub with str.zfill:

>>> import re
>>> text = 'file1, file2, file10, file20'
>>> re.sub(r'(\d+)', lambda m: m.group(1).zfill(2), text)
'file01, file02, file10, file20'
# or
>>> re.sub(r'([a-z]+)(\d+)', lambda m: m.group(1) + m.group(2).zfill(2), text)
'file01, file02, file10, file20'
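
For inputs that also contain longer numbers, it may help to note that `str.zfill(2)` only pads up to a width of two and never truncates, so three-digit numbers pass through unchanged. A minimal sketch of that behaviour, where `pad_numbers` and its `width` parameter are hypothetical names chosen for illustration:

```python
import re

def pad_numbers(names, width=2):
    # Left-pad every run of digits with zeros up to `width` characters.
    # str.zfill never shortens a string, so longer numbers are left alone.
    return re.sub(r'\d+', lambda m: m.group(0).zfill(width), names)

print(pad_numbers('file1, file2, file10, file100'))
# file01, file02, file10, file100
```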

edited Aug 12 '13 at 16:19

answered Aug 12 '13 at 16:13

Ashwini Chaudhary


thanks! but what if i have other filenames in my string, like file100? i only want one leading zero – kflaw Aug 12 '13 at 17:06

score 2 · Accepted Answer · answered Aug 12 '13 at 16:16


You can use:

re.sub(r'[a-zA-Z]\d,', lambda x: x.group(0)[0] + '0' + x.group(0)[1:], text)

answered Aug 12 '13 at 16:16

Saullo G. P. Castro


The search pattern that I used, `'[a-zA-Z]\d,'`, matches a substring of length 3, and `re.sub()` lets you pass a function as the second argument that receives each match, which makes it pretty easy to build complex replacements using the values from the matched string. You should [refer here](http://docs.python.org/2/library/re.html) for more details and examples... – Saullo G. P. Castro Aug 12 '13 at 17:22
its this part i'm not totally clear on: x.group(0)[0] + '0' + x.group(0)[1:] – kflaw Aug 12 '13 at 17:32
for each match it finds one group that can be accessed using `group(0)`, containing three characters `'e1,'`, `'e2,'`, then I use these characters by slicing `[0]-->'e'` and `[1:]-->'1,' or '2,'` to rebuild the string to replace the original... – Saullo G. P. Castro Aug 12 '13 at 17:40
got it! great solution – kflaw Aug 12 '13 at 17:43
so wait, actually, how is it that zeros only get added to the single digit numbers? sorry i am new with python – kflaw Aug 12 '13 at 18:11
The trick is that `[a-zA-Z]\d,` matches one letter, one digit and the comma. Where there are two digits, the first digit is not followed directly by the comma, so no match occurs... – Saullo G. P. Castro Aug 12 '13 at 18:23
last question - what does the comma do in the expression? – kflaw Aug 12 '13 at 18:28
It is just one character... in order to get `e0,` and not `e0 ` (with space), for example... it is ok to ask... :) – Saullo G. P. Castro Aug 12 '13 at 18:30
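
The slicing discussed in the comments above can be spelled out with a named function in place of the lambda. This is only a sketch: `pad_single_digit` is a hypothetical name, and the input string is the one from the question. Because the pattern requires a trailing comma, a single-digit name at the very end of the string (with no comma after it) is not padded.

```python
import re

def pad_single_digit(match):
    # match.group(0) is the whole three-character match, e.g. 'e1,'
    whole = match.group(0)
    # Keep the letter, insert '0', then keep the digit and the comma.
    return whole[0] + '0' + whole[1:]

text = 'file1, file2, file10, file20'
result = re.sub(r'[a-zA-Z]\d,', pad_single_digit, text)
print(result)
# file01, file02, file10, file20

# Limitation: no comma follows 'file2' here, so it is left unchanged.
unpadded_tail = re.sub(r'[a-zA-Z]\d,', pad_single_digit, 'file10, file2')
print(unpadded_tail)
# file10, file2
```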

score 1 · Answer 3 · answered Aug 12 '13 at 16:11


Anchors anchor to the beginning and end of strings (or lines, in multi-line mode). What you're looking for are word boundaries. And of course, you don't need the {1} quantifier.

\b([a-z]+)(\d)\b

(Not sure how you plan to use your captures, so I'll leave those alone.)
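
One pitfall when trying this pattern in Python: in an ordinary (non-raw) string literal, `'\b'` is a backspace character rather than a word boundary, so the pattern should be written as a raw string. Whether this is what happened in any particular attempt is an assumption; the sketch below only shows the difference, using the text from the question.

```python
import re

text = 'file1, file2, file10, file20'

# Raw string: \b reaches the regex engine as a word-boundary assertion.
raw_matches = re.findall(r'\b([a-z]+)(\d)\b', text)
print(raw_matches)
# [('file', '1'), ('file', '2')]

# Non-raw string: '\b' is a literal backspace character, which never
# occurs in text, so nothing matches. ('\\d' is escaped here only to
# keep the digit class intact.)
plain_matches = re.findall('\b([a-z]+)(\\d)\b', text)
print(plain_matches)
# []
```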

answered Aug 12 '13 at 16:11

Andrew Cheong


when i tried this and tried to print mtch, it gave me a blank list: – kflaw Aug 12 '13 at 16:48

score 0 · Answer 4 · answered Aug 12 '13 at 17:33

You have the start (^) and end ($) anchors applied, so the pattern can only match when the entire string is a single file name; it never matches inside your comma-separated list.

Try something like this

import re
text = "file1, file2, file3, file4, file10, file20, file100"
print(re.sub(r"(?<=[a-z])\d(?!\d),?", r"0\g<0>", text))

will result in

file01, file02, file03, file04, file10, file20, file100

This should work if you have a list like above or a single element name.

Explanation

(?<=[a-z]) - Checks, using a lookbehind, that the preceding character is a lowercase letter

\d - matches a single digit

(?!\d) - Checks, using a negative lookahead, that the next character is not another digit

,? - allows for an optional comma in the list

0\g<0> - The pattern matches a single digit (plus the optional comma), so it is trivial to add a zero in front. The \g<0> refers to the entire matched text.
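
The lookaround pieces explained above can be wrapped in a small reusable helper. This is a sketch under stated assumptions: `SINGLE_DIGIT` and `pad_single_digits` are hypothetical names, and this variant drops the optional `,?` because the substitution works without consuming the comma.

```python
import re

# (?<=[a-z]) : the preceding character is a lowercase letter
# \d(?!\d)   : exactly one digit, not followed by another digit
SINGLE_DIGIT = re.compile(r'(?<=[a-z])\d(?!\d)')

def pad_single_digits(names):
    # \g<0> is the whole match (the lone digit); put a '0' in front of it.
    return SINGLE_DIGIT.sub(r'0\g<0>', names)

print(pad_single_digits('file1, file2, file10, file100'))
# file01, file02, file10, file100
print(pad_single_digits('file7'))
# file07
```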
