96

I need to match two cases with a single regular expression and perform a replacement:

'long.file.name.jpg' -> 'long.file.name_suff.jpg'

'long.file.name_a.jpg' -> 'long.file.name_suff.jpg'

I'm trying to do the following

re.sub('(\_a)?\.[^\.]*$' , '_suff.',"long.file.name.jpg")

But this cuts off the extension '.jpg', and I get

long.file.name_suff. instead of long.file.name_suff.jpg. I understand that this is because of the [^.]*$ part, but I can't drop it, because I need to find either the last occurrence of '_a' or the last '.' to replace.

Is there a way to replace only part of the match?

Arty

6 Answers

133

Put a capture group around the part that you want to preserve, and then include a reference to that capture group within your replacement text.

re.sub(r'(\_a)?\.([^\.]*)$' , r'_suff.\2',"long.file.name.jpg")
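
As a minimal sketch of the same idea applied to both inputs from the question (the helper name add_suffix and the constant PATTERN are made up for illustration, and the unnecessary escapes are dropped from the pattern):

```python
import re

# Group 1: the optional "_a"; group 2: the extension (without the dot) to preserve.
PATTERN = re.compile(r'(_a)?\.([^.]*)$')

def add_suffix(filename):
    """Hypothetical helper: swap a trailing "_a" (if any) for "_suff"."""
    # \2 re-inserts whatever group 2 captured, so the extension survives.
    return PATTERN.sub(r'_suff.\2', filename)

for name in ("long.file.name.jpg", "long.file.name_a.jpg"):
    print(name, "->", add_suffix(name))
```

Both calls should print long.file.name_suff.jpg on the right-hand side; a name without any dot is left unchanged because the pattern never matches.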
Amber
  • @Amber: I infer from your answer that unlike str.replace(), we can't use variables a) in raw strings; or b) as an argument to re.sub; or c) both. a) makes sense (I think) but I'm not sure about b). It seems we can use a variable name for the string the regex is going through, though. Would you care to elucidate? Thanks. – Malik A. Rumi Jun 09 '17 at 01:39
  • what are the parts that are capturing and referencing it? – cryanbhu Nov 17 '20 at 16:54
  • 4
    @cryanbhu Anything in parentheses becomes a group. Groups are numbered in order of appearance and can subsequently be referenced by a backslash followed by the number. In the example, \2 references the second group. The single backslash is sufficient because the r prefix makes the string a raw string. Without the preceding r, \\2 would be needed to reference the group. In the "Regular expression syntax" documentation of Python's re module, the relevant sections are (...) and \number. Furthermore, the \\ business is explained right at the beginning (3rd paragraph, as of today). – ra0 Jun 27 '22 at 13:47
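
To make the raw-string point from the comments concrete, here is a small sketch comparing the spellings of the replacement string. The variable names are illustrative only; \g<2> is the explicit group-reference form that re.sub also accepts.

```python
import re

name = "long.file.name_a.jpg"
pattern = r'(_a)?\.([^.]*)$'

# Raw string: the backslash reaches the re module untouched.
raw = re.sub(pattern, r'_suff.\2', name)

# Ordinary string: the backslash has to be doubled to survive Python's own escaping.
escaped = re.sub(pattern, '_suff.\\2', name)

# Explicit form, handy when the reference is followed by a digit.
explicit = re.sub(pattern, r'_suff.\g<2>', name)

# Neither raw nor doubled: '\2' is an octal escape for the control character
# chr(2), so the extension is lost instead of being re-inserted.
broken = re.sub(pattern, '_suff.\2', name)

print(raw, escaped, explicit, repr(broken))
```

The first three results are identical; the last one ends in a stray control character rather than jpg. As for the earlier comment about variables: both the pattern and the replacement are ordinary strings, so they can be held in variables as they are here.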
51
 re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")

?: starts a non-capturing group (SO answer), so (?:_a) matches the _a without numbering it as a group; the question mark that follows makes it optional.

So in English, this says, match the ending .<anything> that follows (or doesn't) the pattern _a
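
A quick way to see what the non-capturing group changes is to compare the group numbering of the two patterns. This is only a sketch; the names capturing and non_capturing are made up for the comparison.

```python
import re

capturing = re.compile(r'(_a)?\.([^.]*)$')        # "_a" is group 1, extension is group 2
non_capturing = re.compile(r'(?:_a)?\.([^.]*)$')  # "_a" is not numbered, extension is group 1

name = "long.file.name_a.jpg"

# Pattern.groups is the number of capturing groups in the pattern.
print(capturing.groups, capturing.search(name).groups())
print(non_capturing.groups, non_capturing.search(name).groups())

# Same replacement result, but the backreference number differs.
print(capturing.sub(r'_suff.\2', name))
print(non_capturing.sub(r'_suff.\1', name))
```

With the non-capturing version the extension is the only numbered group, which is why the replacement uses \1 instead of \2.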

Another way to do this would be to use a lookbehind (see here). Mentioning this because they're super useful, but I didn't know of them for 15 years of doing REs

Amarghosh
11

Just put the expression for the extension into a group, capture it and reference the match in the replacement:

re.sub(r'(?:_a)?(\.[^\.]*)$' , r'_suff\1',"long.file.name.jpg")

Additionally, using the non-capturing group (?:…) prevents re from storing unneeded information.

Gumbo
10

You can do it by excluding parts of the match from the replacement. In other words, you can tell the regex module: "match this whole pattern, but replace only a piece of it".

re.sub(r'(?<=long.file.name)(\_a)?(?=\.([^\.]*)$)' , r'_suff',"long.file.name.jpg")
# 'long.file.name_suff.jpg'

The long.file.name and .jpg parts are used for matching (via the lookbehind and the lookahead), but they are excluded from the replacement.
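
If the file name is not known in advance, the lookbehind cannot simply be generalised, since Python's re only accepts fixed-width lookbehinds. One possible variant, sketched here under that assumption, keeps only the lookahead; the helper name add_suffix_lookahead is hypothetical.

```python
import re

# Match an optional "_a", but only at the spot where the final ".ext" follows.
# The lookahead is zero-width, so the extension is never part of the match.
LOOKAHEAD = re.compile(r'(?:_a)?(?=\.[^.]*$)')

def add_suffix_lookahead(filename):
    """Hypothetical helper: insert "_suff" before the extension, replacing "_a" if present."""
    # count=1 matters: the pattern can also match the empty string right before
    # the dot, which could otherwise lead to a second replacement after "_a".
    return LOOKAHEAD.sub('_suff', filename, count=1)

for name in ("long.file.name.jpg", "long.file.name_a.jpg"):
    print(name, "->", add_suffix_lookahead(name))
```

Because nothing but the optional _a is consumed, the replacement string needs no backreference at all.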

Ahmet DAL
  • A lookbehind `?<=` only allows for fixed-width patterns. If you have one, this is a good option. – Justin Jan 14 '22 at 14:02
0

I wanted to use capture groups to replace a specific part of a string to help me parse it later. Consider the example below:

s= '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'

re.sub(r'(<address>\s.*?)(<br>)(.*?\<\/address>)', r'\1 -- \3', s)
##'<td> <address> 110 SOLANA ROAD, SUITE 102 -- PONTE VEDRA BEACH, FL32082 </address> </td>'
grantr
-1
print(re.sub('name(_a)?','name_suff','long.file.name_a.jpg'))
# long.file.name_suff.jpg

print(re.sub('name(_a)?','name_suff','long.file.name.jpg'))
# long.file.name_suff.jpg
Tomerikoo