I am looking for a pattern that matches:
"whatever whatever 1. [document 1] This is a document dealing with"
and replaces it with:
"whatever whatever 1. This is a document dealing with"
but only when both numbers are the same.
in general:
"whatever whatever N. [document N] This is a document dealing with"
If it helps, N is always between 1 and 1000 (i.e. at most four digits).
import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[Document )(*****)',r'\1\2\3\4',mystr)
                 ^^^^^^^^                           ^^^^^^^
In place of ***** I need to refer back to whatever the first group matched.
I could use:
mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[Document )([1-9]+)',r'\1\2\3\4',mystr)
but of course that will also include cases where the numbers differ, like: "whatever whatever 56. [document 877] This is a document dealing with"
I checked a number of existing answers with no success:
Regex: How to match a string that contains repeated pattern?
Capture repeated groups in python regex
Capturing repeating subpatterns in Python regex
Regex with repeating groups
python regular expression repeating group matches
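
For what it's worth, one possible approach is a backreference: inside the pattern itself, \1 matches exactly the text that group 1 captured, so the second number is forced to equal the first. Below is a minimal sketch, not a definitive solution. It assumes the label may be spelled "document" or "Document" (hence re.IGNORECASE), that N runs from 1 to 1000, and that the whitespace after the closing bracket should be dropped. The names LABEL_RE and strip_matching_label are made up for illustration.

```python
import re

# Group 1: the number N (1..1000). The leading \b stops the match from
# starting in the middle of a longer number.
# Group 2: the optional space, the dot, and the whitespace after it.
# \1 is a backreference: it only matches the same digits group 1 captured.
LABEL_RE = re.compile(
    r'\b(1000|[1-9]\d{0,2})(\s?\.\s+)\[document \1\]\s*',
    re.IGNORECASE,  # assumption: "document" and "Document" both occur
)

def strip_matching_label(text):
    # Keep "N. " and drop the "[document N] " part that follows it.
    return LABEL_RE.sub(r'\1\2', text)

same = "whatever whatever 1. [document 1] This is a document dealing with"
different = "whatever whatever 56. [document 877] This is a document dealing with"

print(strip_matching_label(same))       # label removed
print(strip_matching_label(different))  # left unchanged, numbers differ
```

Note that [1-9]+ in the attempts above would not match numbers containing a zero (10, 100, ...), which is why the sketch uses [1-9]\d{0,2} instead.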