3

I want to remove dots from a string when there is no space before or after the dot. I thought this could easily be done with a regular expression, but I haven't been able to get it working.

These are my input strings and the output I want for each:

  • h.e.ll.o w.o.r.l.d: hello world
  • hello. world: hello. world

I have tried the following patterns:

\w+(\.)+\w+
\w+(\.+\w+)
\w+\.+\w+

I always get something like: he.ll.o wo.rl.d

I am using Python's re module to match and replace with the following code:

>>> import re
>>> re.sub(r'\w+\.+\w+', lambda x: x.group(0).replace('.', ''), 'h.e.ll.o w.o.r.l.d')
'he.llo wo.rl.d'
Mazdak
  • A similar question for JavaScript: https://stackoverflow.com/questions/8584981/javascript-regex-find-a-word-not-followed-by-space-character – Evgeny Jun 19 '18 at 08:48

2 Answers

12

In all your patterns you consume a char after the dot, so there is no chance to match it in the next iteration with the first \w+ (as it must consume at least 1 word char).
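To see the consumption problem directly, it can help to list what the original pattern actually matches. This is only a small illustration using re.findall on the question's input; the variable names are mine.

```python
import re

text = 'h.e.ll.o w.o.r.l.d'

# Each match swallows the word char after the dot, so the following dot
# has no word char left on its left side to start a new match.
matches = re.findall(r'\w+\.+\w+', text)
print(matches)  # ['h.e', 'll.o', 'w.o', 'r.l']
```

The dots between those matches (after "h.e", after "w.o", and after "r.l") are never part of any match, which is why they survive the replacement.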

To fix your approach, you may match 1+ word chars followed by zero or more repetitions of a group that consists of 1+ dots followed by 1+ word chars. This way the whole dotted run is consumed in a single match:

re.sub(r'\w+(?:\.+\w+)*', lambda x: x.group(0).replace('.', ''), s)

Here is the Python demo.
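As a quick check of the fixed pattern against the inputs from the question, here is a minimal sketch. The helper name strip_inner_dots is hypothetical and only used for illustration.

```python
import re

def strip_inner_dots(s):
    # Hypothetical helper: match a whole run of word chars joined by dots,
    # then drop the dots inside that run.
    return re.sub(r'\w+(?:\.+\w+)*', lambda m: m.group(0).replace('.', ''), s)

print(strip_inner_dots('h.e.ll.o w.o.r.l.d'))  # hello world
print(strip_inner_dots('hello. world'))        # hello. world
print(strip_inner_dots('a..b'))                # ab
```

Note that because of \.+, consecutive dots between word chars are removed as well.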

Another approach to remove . between word chars is

re.sub(r'\b\.\b', '', s)

See this regex demo. Here, . is matched only when it has a word char immediately on both sides.
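A small sketch of how the word-boundary version behaves, including one case where it differs from the previous pattern. The helper name remove_word_internal_dots is mine, not part of re.

```python
import re

def remove_word_internal_dots(s):
    # Hypothetical helper: a dot is a non-word char, so \b on each side
    # holds only when the neighbouring char is a word char.
    return re.sub(r'\b\.\b', '', s)

print(remove_word_internal_dots('h.e.ll.o w.o.r.l.d'))  # hello world
print(remove_word_internal_dots('hello. world'))        # hello. world
print(remove_word_internal_dots('a..b'))                # a..b
```

In the last case neither dot has word chars on both sides, so both are kept. Whether that is desirable depends on your data.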

Alternatively, you may use this approach to match any . that is neither preceded nor followed by whitespace:

re.sub(r'(?<!\s)\.(?!\s)', '', 'h.e.ll.o w.o.r.l.d')

See the Python demo and the regex demo.

Details

  • (?<!\s) - a negative lookbehind that fails the match if there is a whitespace char immediately to the left of the current location
  • \. - a dot
  • (?!\s) - a negative lookahead that fails the match if there is a whitespace char immediately to the right of the current location.
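The lookaround version can be tried on a few edge cases to see where it draws the line. This is an illustrative sketch; the helper name and the extra test strings are my own assumptions, not taken from the question.

```python
import re

def remove_unspaced_dots(s):
    # Hypothetical helper: remove a dot unless whitespace touches it
    # on either side.
    return re.sub(r'(?<!\s)\.(?!\s)', '', s)

print(remove_unspaced_dots('h.e.ll.o w.o.r.l.d'))  # hello world
print(remove_unspaced_dots('hello. world'))        # hello. world
print(remove_unspaced_dots('hello .world'))        # hello .world
print(remove_unspaced_dots('the end.'))            # the end
```

One thing to be aware of: the start and end of the string are not whitespace, so a leading or trailing dot is removed too, as the last line shows. If a sentence-final dot should be kept, the \b\.\b pattern may be the safer choice.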
Wiktor Stribiżew
0

This would be my approach.

re.sub(r'\.(?=\w)', '', 'h.e.ll.o. w.o.r.l.d')

  • \. - a dot
  • (?=\w) - a positive lookahead that requires a word char immediately after the dot, without consuming it.
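For comparison, here is a short sketch of what the lookahead-only pattern does on a few inputs. The helper name is hypothetical, and the second and third inputs are extra cases I added to show the limits.

```python
import re

def remove_dots_before_word(s):
    # Hypothetical helper: only the char after the dot is checked,
    # the char before it is not.
    return re.sub(r'\.(?=\w)', '', s)

print(remove_dots_before_word('h.e.ll.o. w.o.r.l.d'))  # hello. world
print(remove_dots_before_word('hello .world'))         # hello world
print(remove_dots_before_word('a..b'))                 # a.b
```

Since nothing constrains the left side, a dot that follows a space is removed as well, which may not be what the question asks for.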
eddey