3

I need to add spaces between all punctuation in a string.

\\ "Hello: World." -> "Hello : World ."
\\ "It's 9:00?"    -> "It ' s 9 : 00 ?"
\\ "1.B,3.D!"      -> "1 . B , 3 . D !"

I think a regex is the way to go: match all non-punctuation with [a-zA-Z\\d]+, add a space before and/or after it, then extract the remainder by matching all punctuation with [^a-zA-Z\\d]+.

But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.

private String addSpacesBeforePunctuation(String s) {
    StringBuilder builder = new StringBuilder();
    final String nonpunctuation = "[a-zA-Z\\d]+";
    final String punctuation = "[^a-zA-Z\\d]+";

    String found;
    while (!s.isEmpty()) {

        // regex stuff goes here

        found = ???; // found group from respective regex goes here
        builder.append(found);
        builder.append(" ");
        s = s.replaceFirst(found, "");
    }

    return builder.toString().trim();
}

However, this doesn't feel like the right way to go. I think I'm overcomplicating things.

budi

2 Answers

5

You can use a lookaround-based regex with Java's punctuation class \p{Punct}:

str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
  • (?<=\\S) asserts that the previous character is not whitespace
  • (?<=\\p{Punct}) asserts that the previous character is punctuation
  • (?=\\p{Punct}) asserts that the next character is punctuation
  • (?=\\S) asserts that the next character is not whitespace

The match is zero-width, so replaceAll inserts a space at each matching position without consuming any characters.
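As a minimal sketch of how this regex behaves on the three inputs from the question, the snippet below wraps it in a small helper. The class name `PunctuationSpacer` and the method name `addSpaces` are illustrative choices, not part of the answer; the pattern is precompiled only to avoid recompiling it on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption made for this sketch.
class PunctuationSpacer {
    // Zero-width position with a non-whitespace character on both sides,
    // where at least one of those two characters is punctuation.
    private static final Pattern BOUNDARY = Pattern.compile(
            "(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)");

    static String addSpaces(String s) {
        // Insert a single space at every matching position.
        return BOUNDARY.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] inputs = {"Hello: World.", "It's 9:00?", "1.B,3.D!"};
        for (String input : inputs) {
            System.out.println(input + " -> " + addSpaces(input));
        }
        // Prints:
        // Hello: World. -> Hello : World .
        // It's 9:00? -> It ' s 9 : 00 ?
        // 1.B,3.D! -> 1 . B , 3 . D !
    }
}
```

Note that \p{Punct} covers only ASCII punctuation by default, so characters such as typographic quotes would pass through unchanged.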


anubhava
  • Now it adds a space at the end of the string if there is a punctuation there. – RealSkeptic Oct 05 '15 at 17:32
  • @RealSkeptic: Very good catch. I have fixed it, check my updated regex and demo now. – anubhava Oct 05 '15 at 17:38
  • @anubhava Lousy connection. Have to quit. See RealSkeptic's comment! – laune Oct 05 '15 at 17:39
  • @laune: Yes I've updated regex based on that comment, please check it now. – anubhava Oct 05 '15 at 17:39
  • @anubhava Now... if you adopt the negation used in dasblinkelight's regex you simplify it a little, and do it in a single regex pass, which is better than that solution. `"(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)"` – laune Oct 05 '15 at 17:44
  • @anubhava And I think a little less headache-causing when trying to understand it is this: `"(?<=\\S)(?=\\p{Punct})|(?<=\\p{Punct})(?=\\S)"` – laune Oct 05 '15 at 17:50
  • @laune: Yes that's right `(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)` will work fine (edited). – anubhava Oct 05 '15 at 17:52
2

When you see a punctuation mark, you have four possibilities:

  1. Punctuation is surrounded by spaces
  2. Punctuation is preceded by a space
  3. Punctuation is followed by a space
  4. Punctuation is neither preceded nor followed by a space.

Here is code that does the replacement properly:

String ss = s
    .replaceAll("(?<=\\S)\\p{Punct}", " $0")
    .replaceAll("\\p{Punct}(?=\\S)", "$0 ");

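It uses two expressions. The first inserts a space before any punctuation mark that follows a non-space character, which handles case 3. The second inserts a space after any punctuation mark that precedes a non-space character, which handles case 2. Since the expressions are applied one after the other, together they take care of case 4 as well. Case 1 requires no change.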
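To make the four cases concrete, here is a small sketch that runs the same two replacements on one input per case. The class name `TwoPassSpacer` and the method name `addSpaces` are assumptions made for illustration only.

```java
// Hypothetical wrapper around the two chained replacements shown above.
class TwoPassSpacer {
    static String addSpaces(String s) {
        return s
            // Pass 1: add a space before punctuation that follows a non-space.
            .replaceAll("(?<=\\S)\\p{Punct}", " $0")
            // Pass 2: add a space after punctuation that precedes a non-space.
            .replaceAll("\\p{Punct}(?=\\S)", "$0 ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("a , b")); // case 1: already spaced, unchanged
        System.out.println(addSpaces("a ,b"));  // case 2: space only before
        System.out.println(addSpaces("a, b"));  // case 3: space only after
        System.out.println(addSpaces("a,b"));   // case 4: no spaces at all
        // Each call prints: a , b
    }
}
```

Because both lookarounds require a non-space neighbor, punctuation at the very start or end of the string does not gain a leading or trailing space.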


Sergey Kalinichenko