python regex pattern that matches pattern including a repeated subgroup

Question

I am looking for a pattern that matches the first string below and rewrites it as the second:

"whatever whatever 1. [document 1] This is a document dealing with"

"whatever whatever 1. This is a document dealing with"

but only when both numbers are the same.

In general:

"whatever whatever N. [document N] This is a document dealing with"

If it helps, N has to be between 1 and 1000 (i.e. max three characters).

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[Document )(*****)',r'\1\2\3\4',mystr)
                 ^^^^^^^^                            ^^^^^^

In place of ***** I have to refer back to the first group.

I could use:

mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[Document )([1-9]+)',r'\1\2\3\4',mystr)

but of course that will also match cases where the numbers differ, like: "whatever whatever 56. [document 877] This is a document dealing with"

I checked a number of existing answers with no success: "Regex: How to match a string that contains repeated pattern?", "Capture repeated groups in python regex", "Capturing repeating subpatterns in Python regex", "Regex with repeating groups", "python regular expression repeating group matches".

score 2 · Answer 1 · answered Nov 29 '21 at 16:25

You can capture the number in a group and use a backreference to require the same number again.

As I am not sure of the full condition you need to match, here is a minimal example. It assumes the only thing to match is a number of up to 3 digits followed by a reference of the form [document {number}]:

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'((\d{1,3})\.)\s*\[document \2\]', r'\1', mystr)

output: 'whatever whatever 1. This is a document dealing with'

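NB. In the example above the backreference is \2, because the number is the second capturing group (group 1 is the outer group holding the number and the dot). You will have to update this index carefully if you add more capturing groups.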
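To avoid counting groups, the same idea can be sketched with a named group and a named backreference. This is only an illustration under a few assumptions: the helper name strip_document_tag and the group names num and sep are made up here, the number is limited to 1-999, and the match is made case-insensitive so that both "[document 1]" and "[Document 1]" are handled.

```python
import re

# Hypothetical pattern and helper, not part of the original answer.
# (?P<num>...) captures the number; (?P=num) requires the same text again,
# so the tag is only removed when both numbers are identical.
PATTERN = re.compile(
    r'\b(?P<num>[1-9]\d{0,2})(?P<sep>\s?\.\s+)\[document (?P=num)\]\s*',
    re.IGNORECASE,
)

def strip_document_tag(text):
    # Keep the number and its separator, drop the "[document N]" tag.
    return PATTERN.sub(r'\g<num>\g<sep>', text)

same = "whatever whatever 1. [document 1] This is a document dealing with"
different = "whatever whatever 56. [document 877] This is a document dealing with"

print(strip_document_tag(same))       # tag removed
print(strip_document_tag(different))  # left unchanged
```

Because the backreference is by name, adding further capturing groups to the pattern does not shift it.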

answered Nov 29 '21 at 16:25

mozway

To match a number from 1 to 999: `\b(([1-9][0-9]{0,2})\.)\s+\[[Dd]ocument \2]` – The fourth bird Nov 29 '21 at 16:34

@Thefourthbird thanks, but I was only providing a minimal example, the important point being IMO the backreference ;) (actually I hesitated just to put `\d+`) – mozway Nov 29 '21 at 16:39
Yes `([1-9]\d{0,2})` would also work of course – The fourth bird Nov 29 '21 at 16:40
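A small comparison may help show what the word boundary in the comment's pattern changes. The sample string and the variable names below are made up for illustration. Without `\b`, the minimal pattern can start matching in the middle of a longer number.

```python
import re

# Pattern from the answer (minimal) and pattern from the comment (stricter).
loose = re.compile(r'((\d{1,3})\.)\s*\[document \2\]')
strict = re.compile(r'\b(([1-9][0-9]{0,2})\.)\s+\[[Dd]ocument \2]')

# Hypothetical input: the leading number is 1234, the tag says 234.
text = "item 1234. [document 234] text"

# The loose pattern matches "234. [document 234]" inside "1234." and
# removes the tag even though the full numbers differ.
loose_result = loose.sub(r'\1', text)

# The strict pattern needs a word boundary before the number, so it
# cannot start in the middle of "1234" and leaves the text untouched.
strict_result = strict.sub(r'\1', text)

print(loose_result)
print(strict_result)
```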
