1

I am reading in a file and trying to replace every regex match with the same text, but with the whitespace stripped. For example, the regex that correctly matches what I want in my document is '([0-9]+\s(st|nd|rd|th))', so that anything in the document of the form...

1 st, 2 nd, 33 rd, 134 th etc. will be matched.

What I want is to simply write a new file with each of those occurrences in the original file replaced with the white space removed.

I have played with a few things like re.findall and re.sub, but I can't quite figure out how to write out the full document with just the matched substrings replaced by their whitespace-free versions.

Thanks for the help.

user2860682

3 Answers

2

replaced with the white space removed.

Try using a non-capturing group.

(?:\d+)\s+(?:(st|nd|rd|th))

The regex above matches one or more digits, then whitespace, then one of st, nd, rd, or th. Now simply remove the whitespace inside each match.
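
One way to do that last step, as a rough sketch: pass a function to re.sub so that each match is replaced by itself with the whitespace removed. The pattern below is a simplified form of the one above (the redundant groups are dropped), and the helper name strip_whitespace is only illustrative.

```python
import re

# Simplified form of the pattern above: digits, whitespace, ordinal suffix.
pattern = re.compile(r'\d+\s+(?:st|nd|rd|th)')

def strip_whitespace(match):
    # Return the matched text with every whitespace character removed.
    return re.sub(r'\s+', '', match.group(0))

text = '1 st, 2 nd, 33 rd, 134 th'
result = pattern.sub(strip_whitespace, text)
print(result)  # 1st, 2nd, 33rd, 134th
```

Text outside the matches is left untouched, so the spaces after the commas survive.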

Braj
2

If I understand correctly, you could use re.sub to achieve this.

Instead of placing a capturing group around your entire pattern, place one around the numbers and another around the selected text, omitting whitespace.

>>> import re
>>> text = 'foo bar 1 st, 2 nd, 33 rd, 134 th baz quz'
>>> re.sub(r'([0-9]+)\s+(st|nd|rd|th)\b', '\\1\\2', text)

Another way would be to use lookarounds.

>>> re.sub(r'(?<=[0-9])\s+(?=(?:st|nd|rd|th)\b)', '', text)

Output

foo bar 1st, 2nd, 33rd, 134th baz quz
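
To apply this to a whole file, as the question asks, something like the following sketch might work. The function name strip_ordinal_spaces and the UTF-8 encoding are assumptions, and it reads the entire file into memory, which is fine for modest file sizes.

```python
import re

# Lookaround pattern from above: whitespace between a digit and an ordinal suffix.
ORDINAL_GAP = re.compile(r'(?<=[0-9])\s+(?=(?:st|nd|rd|th)\b)')

def strip_ordinal_spaces(src_path, dst_path):
    # Read the whole input file as text (encoding is an assumption).
    with open(src_path, encoding='utf-8') as src:
        text = src.read()
    # Delete only the whitespace between the digits and the suffix.
    cleaned = ORDINAL_GAP.sub('', text)
    # Write the full document, with replacements applied, to the new file.
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(cleaned)
    return cleaned
```

It would be called as strip_ordinal_spaces('input.txt', 'output.txt'), where both file names are placeholders. Note that \s also matches newlines, so a number at the end of one line followed by a suffix at the start of the next would be joined.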
hwnd
1

Another trick without capturing groups. You need to add a word boundary to your regex so that it matches only the whitespace between the digits and a standalone st, nd, rd, or th. In the replacement, the matched whitespace is replaced with an empty string (i.e., re.sub removes it).

>>> import re
>>> text = 'foo 1 st, 2 nd, 33 rddfa,33 rd,bar 134 th'
>>> re.sub(r'(?<=\d)\s+(?=(?:st|nd|rd|th)\b)', r'', text)
'foo 1st, 2nd, 33 rddfa,33rd,bar 134th'
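
To see what the word boundary buys you, here is a small comparison of the same pattern with and without \b. The sample string is made up for illustration.

```python
import re

text = 'foo 1 st, 33 rddfa, 4 thing'

# With \b, the suffix must end at a word boundary, so "rddfa" and "thing" are skipped.
with_boundary = re.sub(r'(?<=\d)\s+(?=(?:st|nd|rd|th)\b)', '', text)

# Without \b, any word that merely starts with a suffix is also joined to the number.
without_boundary = re.sub(r'(?<=\d)\s+(?=(?:st|nd|rd|th))', '', text)

print(with_boundary)     # foo 1st, 33 rddfa, 4 thing
print(without_boundary)  # foo 1st, 33rddfa, 4thing
```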

Avinash Raj