I am looking for a regex pattern that turns the first string below into the second:

"whatever whatever 1. [document 1] This is a document dealing with"

"whatever whatever 1. This is a document dealing with"

but only when both numbers are the same.

In general:

"whatever whatever N. [document N] This is a document dealing with"

If it helps, N has to be between 1 and 1000 (i.e. max three characters).

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[document )(*****)',r'\1\2\3\4',mystr)
                 ^^^^^^^^                            ^^^^^^

In place of ***** I need to refer back to whatever the first group matched.

I could use:

mystr = re.sub(r'([1-9]+)(\s)?(\.)(\s+)(\[document )([1-9]+)',r'\1\2\3\4',mystr)

but of course that would also match cases where the numbers differ, like: "whatever whatever 56. [document 877] This is a document dealing with"

I checked a number of existing answers without success: "Regex: How to match a string that contains repeated pattern?"; "Capture repeated groups in python regex"; "Capturing repeating subpatterns in Python regex"; "Regex with repeating groups"; "python regular expression repeating group matches".

JFerro

1 Answer

You can capture the number in a group and then use a backreference to that group.

Since I am not sure of the full conditions for your match, here is a minimal example that assumes the only thing to match is a number of up to 3 digits followed by a reference of the form [document {number}]:

import re
mystr = "whatever whatever 1. [document 1] This is a document dealing with"
mystr = re.sub(r'((\d{1,3})\.)\s*\[document \2\]', r'\1', mystr)

output: 'whatever whatever 1. This is a document dealing with'
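
To check that the backreference really limits the substitution to identical numbers, here is a small sketch that runs the same pattern on the matching case and on the 56/877 case from the question. The helper name `strip_doc_ref` is only an illustrative choice, not something from the question.

```python
import re

# Same pattern as above: group 2 captures the number,
# and \2 requires exactly the same digits inside the brackets.
PATTERN = re.compile(r'((\d{1,3})\.)\s*\[document \2\]')

def strip_doc_ref(text):
    """Drop '[document N]' only when N repeats the number before the dot."""
    return PATTERN.sub(r'\1', text)

same = strip_doc_ref("whatever whatever 1. [document 1] This is a document dealing with")
different = strip_doc_ref("whatever whatever 56. [document 877] This is a document dealing with")

print(same)       # the bracketed reference is removed
print(different)  # the string is left unchanged
```

Because the literal `\]` follows the backreference, a string such as "7. [document 77]" is also left alone.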

NB. In the example above the backreference to use is \2; you will have to update this number carefully if you add more capturing groups.
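
One possible way to avoid renumbering is to use named groups, so the backreference does not depend on group position. The sketch below is a variation under assumptions, not part of the original answer: the group names `prefix` and `num` are made up for illustration, and `re.IGNORECASE` is added on the assumption that both "[document N]" and "[Document N]" should be matched.

```python
import re

# (?P<num>...) names the captured number; (?P=num) is a backreference to it.
# re.IGNORECASE is an assumption so that "[Document 12]" is matched as well.
pattern = re.compile(
    r'(?P<prefix>(?P<num>\d{1,3})\.)\s*\[document (?P=num)\]',
    re.IGNORECASE,
)

mystr = "whatever whatever 12. [Document 12] This is a document dealing with"
# \g<prefix> in the replacement refers to the named group "prefix".
result = pattern.sub(r'\g<prefix>', mystr)
print(result)
```

Adding further groups to the pattern then leaves `(?P=num)` and `\g<prefix>` valid without any renumbering.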

mozway