
I am having trouble figuring out how to make this work with the substitution command (`re.sub`), which is what I have been instructed to use. I am using this text as a variable:

text = 'file1, file2, file10, file20'

I want to search the text and insert a zero in front of any number less than 10. I thought I could use an if statement depending on whether `re.match` or `re.findall` found only one digit after the text, but I can't get it to work. Here is my starting code, where I am trying to capture the letters and the digits as separate groups and extract only those file names that have a single digit:

import re
text = 'file1, file2, file10, file20'
mtch = re.findall('^([a-z]+)(\d{1})$',text)

but it doesn't work: `mtch` comes back as an empty list.

ekad
kflaw

4 Answers


You can use re.sub with str.zfill:

>>> import re
>>> text = 'file1, file2, file10, file20'
>>> re.sub(r'(\d+)', lambda m: m.group(1).zfill(2), text)
'file01, file02, file10, file20'
>>> # or
>>> re.sub(r'([a-z]+)(\d+)', lambda m: m.group(1) + m.group(2).zfill(2), text)
'file01, file02, file10, file20'
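`str.zfill(2)` only pads strings shorter than two characters, so a number with three or more digits should pass through unchanged. A small check of the first pattern on a string that also contains `file100` (the extra name is just an illustrative input, not from the question):

```python
import re

text = 'file1, file2, file10, file20, file100'
# zfill(2) left-pads each digit run with zeros up to width 2; longer runs are left as-is
padded = re.sub(r'(\d+)', lambda m: m.group(1).zfill(2), text)
print(padded)  # file01, file02, file10, file20, file100
```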
Ashwini Chaudhary
  • thanks! but what if I have other filenames in my string, like file100? I only want one leading zero – kflaw Aug 12 '13 at 17:06

You can use:

re.sub(r'[a-zA-Z]\d,', lambda x: x.group(0)[0] + '0' + x.group(0)[1:], text)
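A quick check of this pattern, wrapped in a hypothetical helper `pad_with_comma` for convenience. Because the pattern requires a comma right after the digit, a single-digit name at the very end of the string (with no trailing comma) is not padded; the second input below is only an illustration of that edge case:

```python
import re

def pad_with_comma(s):
    # Hypothetical helper: match one letter, one digit, then a comma,
    # so only single-digit numbers followed by a comma are padded
    return re.sub(r'[a-zA-Z]\d,', lambda x: x.group(0)[0] + '0' + x.group(0)[1:], s)

print(pad_with_comma('file1, file2, file10, file20'))  # file01, file02, file10, file20
print(pad_with_comma('file10, file3'))                 # file10, file3 (last name has no comma)
```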
Saullo G. P. Castro
  • the search pattern that I used, `'[a-zA-Z]\d,'`, matches a string of *len()=3*, and the `re.sub()` method lets you use that matched string by passing a function as the second argument, which makes it pretty easy to build complex replacements from the values in the match. You should [refer here](http://docs.python.org/2/library/re.html) for more details and examples... – Saullo G. P. Castro Aug 12 '13 at 17:22
  • it's this part I'm not totally clear on: x.group(0)[0] + '0' + x.group(0)[1:] – kflaw Aug 12 '13 at 17:32
  • for each match it finds one group that can be accessed using `group(0)`, containing three characters `'e1,'`, `'e2,'`, then I use these characters by slicing `[0]-->'e'` and `[1:]-->'1,' or '2,'` to rebuild the string to replace the original... – Saullo G. P. Castro Aug 12 '13 at 17:40
  • got it! great solution – kflaw Aug 12 '13 at 17:43
  • so wait, actually, how is it that zeros only get added to the single-digit numbers? sorry, I am new to Python – kflaw Aug 12 '13 at 18:11
  • the trick is that `[a-zA-Z]\d,` recognizes one letter, one digit and the comma. Where you have two digits, the match does not occur... – Saullo G. P. Castro Aug 12 '13 at 18:23
  • last question - what does the comma do in the expression? – kflaw Aug 12 '13 at 18:28
  • It is just one character... in order to get `e0,` and not `e0 ` (with space), for example... it is ok to ask... :) – Saullo G. P. Castro Aug 12 '13 at 18:30

Anchors (`^` and `$`) match at the beginning and end of the string (or of each line, in multiline mode). What you're looking for are word boundaries (`\b`). And of course, you don't need the `{1}` quantifier.
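To see the anchor behaviour concretely, here is a small check (a sketch; splitting on `', '` assumes the names are separated by exactly a comma and a space) showing that the question's anchored pattern finds nothing in the combined string but does match once each name is tested on its own:

```python
import re

text = 'file1, file2, file10, file20'
pattern = r'^([a-z]+)(\d{1})$'

# Against the whole string, ^ and $ require the match to span all of it, so nothing is found
whole = re.findall(pattern, text)
print(whole)  # []

# Against each name separately, the same anchors behave as intended
per_name = [name for name in text.split(', ') if re.match(pattern, name)]
print(per_name)  # ['file1', 'file2']
```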

\b([a-z]+)(\d)\b

(Not sure how you plan to use your captures, so I'll leave those alone.)
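As one possible way to use those captures with `re.sub` (a sketch; the `file100` entry is only an illustrative input), the replacement can put a literal zero between the two groups. The `\g<1>` form is needed here because a template like `\10` would be read as a reference to group 10 rather than group 1 followed by `0`:

```python
import re

text = 'file1, file2, file10, file20, file100'
pattern = r'\b([a-z]+)(\d)\b'

# \b after the single digit rejects names with more digits (no boundary between two digits)
matches = re.findall(pattern, text)
print(matches)  # [('file', '1'), ('file', '2')]

# Insert a zero between the letters (group 1) and the digit (group 2)
result = re.sub(pattern, r'\g<1>0\g<2>', text)
print(result)  # file01, file02, file10, file20, file100
```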

Andrew Cheong

You have the start and end anchors applied, so the pattern can only match if the entire string is a single file name; it can never match one name inside a longer list.

Try something like this

import re
text = "file1, file2, file3, file4, file10, file20, file100"
print(re.sub(r"(?<=[a-z])\d(?!\d),?", r"0\g<0>", text))

will result in

file01, file02, file03, file04, file10, file20, file100

This should work whether you have a list like the one above or a single file name.

Explanation

(?<=[a-z]) - Checks that the previous character is a letter, using a lookbehind

\d - matches a single digit

(?!\d) - Checks that there are no more digits using lookahead

,? - allows for an optional comma in the list

0\g<0> - The pattern matches a single digit, so it is trivial to add a zero in front of it. \g<0> refers to the entire match (group 0).
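The same pattern can also be written with `re.VERBOSE`, so each piece carries the comment from the explanation above. This is just a restatement for readability; the constant name `PAD_SINGLE_DIGIT` is a hypothetical choice, and the single-name input `"file7"` is an illustrative example:

```python
import re

# Hypothetical name; re.VERBOSE ignores whitespace and allows # comments in the pattern
PAD_SINGLE_DIGIT = re.compile(r"""
    (?<=[a-z])   # preceded by a lowercase letter (lookbehind)
    \d           # a single digit...
    (?!\d)       # ...not followed by another digit (negative lookahead)
    ,?           # optionally include a trailing comma in the match
""", re.VERBOSE)

listing = PAD_SINGLE_DIGIT.sub(r"0\g<0>", "file1, file2, file3, file4, file10, file20, file100")
single = PAD_SINGLE_DIGIT.sub(r"0\g<0>", "file7")
print(listing)  # file01, file02, file03, file04, file10, file20, file100
print(single)   # file07
```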

Kami