I have this string:

The.Soundcraft.Si.Performer.1.is.digital.19.3.inch.mix.actually this is. a test

In this string I want to replace every `.` that has a character directly before AND after it with `. ` (a dot followed by a space), UNLESS the preceding or following character is a digit or a space.

The end result would be:

The. Soundcraft. Si. Performer. 1.is. digital. 19.3. inch. mix. actually this is. a test

I tested my regex `([^0-9 ])\.([^0-9 ])` at http://www.regexr.com/ and it seems to match all the parts I need replaced.

So I coded this:

Dim description As String = "The.Soundcraft.Si.Performer.1.is.digital.19.3.inch.mix.actually this is. a test"
description = Regex.Replace(description, "([^0-9 ])\.([^0-9 ])", ". ")

But nothing happens. What am I missing?

Adam
  • Try `description = Regex.Replace(description, "\b\.\b", ". ")` – Wiktor Stribiżew Feb 02 '16 at 21:57
  • Missing `@` at the beginning of your regex. Should be `@"([^0-9 ])\.([^0-9 ])"`. –  Feb 02 '16 at 21:59
  • @noob: If it is VB.NET, `@` should not be used. – Wiktor Stribiżew Feb 02 '16 at 22:02
  • @WiktorStribiżew: There is an `asp.net` tag too. –  Feb 02 '16 at 22:03
  • @WiktorStribiżew: Thanks! Your suggestion works, although I'm unsure why (I was reading this too: http://stackoverflow.com/questions/6664151/difference-between-b-and-b-in-regex). Is `\b` just excluding numbers (which I want) or also (some) special characters (which I don't want)? – Adam Feb 02 '16 at 22:10
  • `\b` is a so-called word boundary. Read all about it here: http://www.regular-expressions.info/wordboundaries.html – Jeroen Feb 03 '16 at 03:35
  • @noob The asp.net tag doesn't make answering a VB question with C# syntax useful. From the code given by the OP it seems pretty obvious to me which language this is. Unless Microsoft decided to add Dim to C# as a keyword while I was asleep. – Jeroen Feb 03 '16 at 03:39

1 Answer

You can use

description = Regex.Replace(description, "\b\.\b", ". ")

Why does it work?

The word boundary \b can have 4 meanings depending on the context:

  • (?<!\w) in a construct like \b + word character ([\p{L}\p{N}_])
  • (?<=\w) in a construct like \b + non-word character ([^\p{L}\p{N}_])
  • (?!\w) in a construct like word character ([\p{L}\p{N}_]) + \b
  • (?=\w) in a construct like non-word character ([^\p{L}\p{N}_]) + \b.

In your case, the 2nd and 4th cases apply: the . is a non-word character, thus \b\.\b is the same as (?<=\w)\.(?=\w): match a dot that is enclosed by word characters. Digits are word characters too, so a dot between two digits (as in 19.3) also matches.
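The equivalence can be checked on the string from the question. The sketch below is only an illustration: `SpaceAfterDots` is a hypothetical helper name, and the third pattern is an assumed variant (not part of the answer above) that reuses the question's `[^0-9 ]` class inside lookarounds, so that the neighbouring characters are tested but not consumed and dots next to digits stay untouched.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WordBoundaryDemo
    ' Hypothetical helper: puts a space after every dot matched by the pattern.
    Function SpaceAfterDots(text As String, pattern As String) As String
        Return Regex.Replace(text, pattern, ". ")
    End Function

    Sub Main()
        Dim description As String = "The.Soundcraft.Si.Performer.1.is.digital.19.3.inch.mix.actually this is. a test"

        ' Word-boundary form from the answer.
        Dim withBoundary As String = SpaceAfterDots(description, "\b\.\b")
        ' The same condition spelled out with explicit lookarounds.
        Dim withLookarounds As String = SpaceAfterDots(description, "(?<=\w)\.(?=\w)")
        ' Assumed variant: the question's character class, used without consuming the neighbours.
        Dim keepDigits As String = SpaceAfterDots(description, "(?<=[^0-9 ])\.(?=[^0-9 ])")

        Console.WriteLine(withBoundary)
        ' The. Soundcraft. Si. Performer. 1. is. digital. 19. 3. inch. mix. actually this is. a test
        Console.WriteLine(withBoundary = withLookarounds)
        ' True
        Console.WriteLine(keepDigits)
        ' The. Soundcraft. Si. Performer.1.is. digital.19.3.inch. mix. actually this is. a test
    End Sub
End Module
```

Neither output is identical to the expected result quoted in the question, which handles digits inconsistently (it changes Performer.1 but keeps 1.is and 19.3), so which variant fits depends on how dots next to digits should really be treated.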

EDGE CASE:

If you do not want to replace . that is next to _, you need to exclude the _ from the word boundary, and this is how it would look then:

(?<=[\p{L}\p{N}])\.(?=[\p{L}\p{N}])
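A small sketch of the difference, using a made-up sample string (`foo_.bar.baz._qux` is hypothetical, chosen only because it puts underscores next to some dots). The letters-and-digits pattern is spelled with positive lookarounds here.

```vb
Imports System
Imports System.Text.RegularExpressions

Module UnderscoreEdgeCase
    ' Hypothetical helper: puts a space after every dot matched by the pattern.
    Function SpaceAfterDots(text As String, pattern As String) As String
        Return Regex.Replace(text, pattern, ". ")
    End Function

    Sub Main()
        ' Hypothetical sample with underscores next to some dots.
        Dim sample As String = "foo_.bar.baz._qux"

        ' \b treats _ as a word character, so every dot here is matched.
        Console.WriteLine(SpaceAfterDots(sample, "\b\.\b"))
        ' foo_. bar. baz. _qux

        ' Letters and digits only: dots touching _ are left alone.
        Console.WriteLine(SpaceAfterDots(sample, "(?<=[\p{L}\p{N}])\.(?=[\p{L}\p{N}])"))
        ' foo_.bar. baz._qux
    End Sub
End Module
```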

Wiktor Stribiżew