I have the text below, and I want to write a regular expression that matches the following pattern:

[ 1.1 ] 1. A method of providing a master
[ 12.1 ] 12. An apparatus for providing
[ 39.3 ] b. one or more control point applications
[ 39.8 ] iv. a server application programming interface
[ 30.2 ] a. a client application programming

I want to replace ] 1. with ] and likewise for ] 12., ] b., ] iv., and ] a.

Please also handle the case where the pattern above does not occur, for example:

[ 1.2 ] an RFID device provided

I tried the regular expression below, but it did not work.

>>> st = "[ 12.1 ] 12. An apparatus for providing a master content directory within a network of devices comprising:"
>>> import re
>>> st = re.sub(r"(?:\]\s*\d+\.\s*)?","]",st)
>>> st
'][] ]1]2].]1] ]]A]n] ]a]p]p]a]r]a]t]u]s] ]f]o]r] ]p]r]o]v]i]d]i]n]g] ]a] ]m]a]s]t]e]r] ]c]o]n]t]e]n]t] ]d]i]r]e]c]t]o]r]y] ]w]i]t]h]i]n] ]a] ]n]e]t]w]o]r]k] ]o]f] ]d]e]v]i]c]e]s] ]c]o]m]p]r]i]s]i]n]g]:]'
Nick jones
  • Maybe `re.sub(r"\]\s*\w+\.\s*","] ",st)` will do. See https://regex101.com/r/Lp9vBQ/1. The point is that your regex matches before each char in a string because it is optional, `(?:...)?` makes it match 1 or 0 times. – Wiktor Stribiżew Aug 30 '19 at 06:35
  • you can use `re.sub(r"\]\s\w+\.","]",st)`, if you are sure that the second numbering always ends with a `.` – Shijith Aug 30 '19 at 06:49
  • why did you write that `?` in the end in the regex string? – Paritosh Singh Aug 30 '19 at 06:55

2 Answers

import re

s = """
[ 1.1 ] 1. A method of providing a master
[ 12.1 ] 12. An apparatus for providing
[ 39.3 ] b. one or more control point applications
[ 39.8 ] iv. a server application programming interface
[ 30.2 ] a. a client application programming
"""
# Replace "]", one whitespace char, 1-2 word chars and a dot with "] "
print(re.sub(r'\]\s\w{1,2}\.', '] ', s))

Output

[ 1.1 ]  A method of providing a master
[ 12.1 ]  An apparatus for providing
[ 39.3 ]  one or more control point applications
[ 39.8 ]  a server application programming interface
[ 30.2 ]  a client application programming
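
One caveat worth checking: `\w{1,2}` only covers markers of one or two characters. The sketch below uses a hypothetical three-character marker (`iii.`, which is not in the question's sample) to compare it with an unbounded `\w+`:

```python
import re

# Hypothetical input: "iii." is a three-character marker, not from the question.
line = "[ 39.9 ] iii. a hypothetical third item"

# \w{1,2} allows at most two word characters before the dot, so nothing matches
# and the line comes back unchanged.
limited = re.sub(r'\]\s\w{1,2}\.', '] ', line)

# \w+ allows any number of word characters, so the marker is removed.
unbounded = re.sub(r'\]\s\w+\.', '] ', line)

print(limited)
print(unbounded)
```

If the numbering can go past two characters (for example `iii.` or `100.`), the `\w+` form is probably the safer choice.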
ComplicatedPhenomenon

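Your regex matches at every position in the string because the whole pattern is optional: (?:...)? is a non-capturing group with a ? quantifier, so it matches one or zero times. When it matches zero times, the empty match is still replaced with ], which is why a ] appears before every character.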
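
A minimal sketch of that behaviour, using a made-up two-character string and an optional group that can never match any text in it:

```python
import re

# "x" never occurs in "ab", so the optional group can only match the empty
# string, and it does so at every position (before "a", before "b", at the end).
result = re.sub(r"(?:x)?", "]", "ab")
print(result)  # ]a]b]

# Without the trailing "?", the pattern must match real text, so nothing changes.
unchanged = re.sub(r"(?:x)", "]", "ab")
print(unchanged)  # ab
```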

Also, \d only matches a digit, and you need to consider letters, too.

To quickly fix the issue you may use

st = re.sub(r"\]\s*\w+\.\s*", "] ", st)

See this regex demo. The \w+ construct matches 1+ word chars (letters, digits or underscores).
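
As a quick check, here is the same substitution applied to every sample line from the question, including the line without a marker, which should come back unchanged (the `lines` and `cleaned` names are just for this sketch):

```python
import re

lines = [
    "[ 1.1 ] 1. A method of providing a master",
    "[ 12.1 ] 12. An apparatus for providing",
    "[ 39.3 ] b. one or more control point applications",
    "[ 39.8 ] iv. a server application programming interface",
    "[ 30.2 ] a. a client application programming",
    "[ 1.2 ] an RFID device provided",  # no marker: must stay unchanged
]

# "]", optional whitespace, 1+ word chars, a dot, optional whitespace
pattern = re.compile(r"\]\s*\w+\.\s*")

cleaned = [pattern.sub("] ", line) for line in lines]
for line in cleaned:
    print(line)
```

Because the trailing `\s*` consumes the space after the dot, the result has a single space after `]`.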

You can make it a bit more precise by matching only 1+ digits or 1+ letters between the ] and the dot:

st = re.sub(r"\]\s*(?:\d+|[a-zA-Z]+)\.\s*", "] ", st)
                   ^^^^^^^^^^^^^^^^^

See another regex demo.
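
To illustrate the difference between the two patterns, the sketch below runs both on a hypothetical line whose first token mixes a digit and a letter (`3G.`, not from the question). The broad `\w+` pattern strips it, while the stricter alternation leaves it alone:

```python
import re

broad = re.compile(r"\]\s*\w+\.\s*")
precise = re.compile(r"\]\s*(?:\d+|[a-zA-Z]+)\.\s*")

# Hypothetical line: "3G." mixes a digit and a letter, so it is neither
# all digits nor all letters.
mixed = "[ 5.1 ] 3G. networks are supported"
# Line from the question with a purely alphabetic marker.
roman = "[ 39.8 ] iv. a server application programming interface"

broad_mixed = broad.sub("] ", mixed)      # "3G." is removed
precise_mixed = precise.sub("] ", mixed)  # no match, line unchanged
precise_roman = precise.sub("] ", roman)  # "iv." is removed

print(broad_mixed)
print(precise_mixed)
print(precise_roman)
```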

Wiktor Stribiżew