
For example, I want to change the following string

strr = 'Hello, this is a test to remove whitespace.'

To

'Hello,this is a testto removewhitespace.'

So the whitespace directly after a comma, 't' or 'e' character should be removed. I tried something like:

re.sub(', |t |e ', ' ', strr)

However, this removes the comma, 't' and 'e' as well. Afterwards, I want to split the string on the remaining whitespace. My first approach was to split like this:

re.split(' is |a |test|remove', strr)

However, this removes the delimiters as well, which is not what I want. So basically, I want to provide a list of characters such that any whitespace directly following one of them is removed, while the characters themselves are kept.
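For reference, here is a quick check of what those two attempts actually return on the sample string. Both problems come from the same cause: whatever the pattern matches is consumed, so the matched characters are replaced (re.sub) or dropped (re.split).

```python
import re

strr = 'Hello, this is a test to remove whitespace.'

# The alternation matches the character *and* the space, so the whole
# two-character match is replaced by a single space: ',', 't', 'e' are lost.
attempt_sub = re.sub(', |t |e ', ' ', strr)

# re.split drops whatever the pattern matched, so the delimiter words
# disappear and empty strings appear between adjacent matches.
attempt_split = re.split(' is |a |test|remove', strr)

print(attempt_sub)    # Hello this is a tes to remov whitespace.
print(attempt_split)  # ['Hello, this', '', '', ' to ', ' whitespace.']
```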

Michael

2 Answers


Something like:

import re

str1 = 'Hello, this is a test to remove whitespace.'

str2 = re.sub(r'([te,])\s+', r'\1', str1)

print(str2)

This should work: it matches (and captures) one of the known characters, followed by one or more whitespace characters, and replaces that whole match with just the captured character.
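Since the question asks to provide a list of characters, the same capture-group idea can be wrapped in a small helper. This is only a sketch: `strip_space_after` and its `chars` parameter are hypothetical names, not part of `re`. `re.escape` is used so that characters such as `-`, `]` or `^` are taken literally inside the character class.

```python
import re


def strip_space_after(text: str, chars: str) -> str:
    """Remove any run of whitespace that directly follows one of `chars`.

    Hypothetical helper: `chars` is a plain string of characters to keep,
    e.g. ',te'. Each character is escaped so it is matched literally
    inside the [...] class.
    """
    if not chars:
        return text  # an empty class '[]' would not be a valid pattern
    pattern = '([' + re.escape(chars) + r'])\s+'
    # \1 puts back the captured character; the whitespace after it is dropped.
    return re.sub(pattern, r'\1', text)


result = strip_space_after('Hello, this is a test to remove whitespace.', ',te')
print(result)  # Hello,this is a testto removewhitespace.
```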

jedwards

You can use positive lookbehind [regex-tutorial] for this:

re.sub('(?<=[,te]) ', '', strr)

This positive lookbehind (?<=...) block checks that the preceding character matches, but that character is not part of the match, so you do not "eat" it when you replace the match.

Note that the second argument should be the empty string ('' rather than ' '), since otherwise you "reintroduce" the space.

This then yields:

>>> re.sub('(?<=[,te]) ', '', strr)
'Hello,this is a testto removewhitespace.'
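To see why the lookbehind version needs '' as the replacement, it helps to look at what the pattern actually matches. A quick check on the same strr shows that every match is a single space; the preceding ',', 't' or 'e' is only checked, never consumed:

```python
import re

strr = 'Hello, this is a test to remove whitespace.'
pattern = '(?<=[,te]) '

# Each match covers exactly one character (the space); the lookbehind
# character lies outside the match span.
spans = [m.span() for m in re.finditer(pattern, strr)]
print(spans)  # [(6, 7), (21, 22), (31, 32)]

# Replacing the matched space with ' ' just writes the same space back.
unchanged = re.sub(pattern, ' ', strr)
print(unchanged == strr)  # True

# Replacing it with '' removes it, as intended.
removed = re.sub(pattern, '', strr)
print(removed)  # Hello,this is a testto removewhitespace.
```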

If you want to remove an arbitrary number (one or more) of whitespace characters (spaces, newlines, tabs, etc.), you can use \s+ instead:

>>> re.sub(r'(?<=[,te])\s+', '', strr)
'Hello,this is a testto removewhitespace.'
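For the second part of the question (splitting on the whitespace that remains), a minimal sketch, assuming "remaining whitespace" means any run of spaces, tabs or newlines: after the lookbehind substitution, plain str.split() with no arguments already splits on such runs and ignores leading and trailing whitespace.

```python
import re

strr = 'Hello, this is a test to remove whitespace.'

# Step 1: drop whitespace that directly follows ',', 't' or 'e'.
cleaned = re.sub(r'(?<=[,te])\s+', '', strr)

# Step 2: split on the whitespace that is left; str.split() with no
# argument treats any run of whitespace as a single separator.
parts = cleaned.split()
print(parts)  # ['Hello,this', 'is', 'a', 'testto', 'removewhitespace.']

# \s+ also covers newlines and tabs after the target characters.
multiline = re.sub(r'(?<=[,te])\s+', '', 'a test\n\tto')
print(multiline)  # a testto
```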
Willem Van Onsem