10

How do you split a string of words and retain the whitespace?

Here is the code:

String words[] = s.split(" "); 

String s contains: "hello  world" (two spaces between the words)

After the code runs, words[] contains: "hello", "", "world"

Ideally, there should be no empty string in the middle. Instead, the array should contain both whitespace characters, so words[] should be: "hello", " ", " ", "world"

How do I get it to have this result?

Luiggi Mendoza
Rock Lee

4 Answers

16

You could use lookahead/lookbehind assertions:

String[] words = "hello  world".split("((?<=\\s+)|(?=\\s+))");

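where (?<=\\s+) and (?=\\s+) are zero-width lookbehind and lookahead assertions: they match the position just after or just before whitespace without consuming it, so each whitespace character ends up as its own element.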
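For anyone who wants to try this, here is a minimal self-contained sketch. The class name LookaroundSplit and the method splitKeepingWhitespace are made up for illustration. It also drops the + quantifiers: a lookaround only needs to see one neighbouring whitespace character to decide whether to split, so \\s alone should find the same positions, and it avoids depending on how a given Java version treats an unbounded quantifier inside a lookbehind.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Splits at every position that has a whitespace character directly
    // before it or directly after it. The pattern is zero-width, so the
    // whitespace itself is kept in the result.
    static String[] splitKeepingWhitespace(String s) {
        return s.split("(?<=\\s)|(?=\\s)");
    }

    public static void main(String[] args) {
        // Input has two spaces between the words, as in the question.
        String[] words = splitKeepingWhitespace("hello  world");
        System.out.println(Arrays.toString(words));
    }
}
```

For the input from the question this should give the four elements "hello", " ", " ", "world".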

Reimeus
10

If you can tolerate both whitespace characters together in one string, you can use

String[] words = s.split("\\b");

Then words contains ("hello", "  ", "world"), with both spaces in the middle element.
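A quick sketch to check this, assuming Java 8 or later (where a zero-width match at the start of the input does not produce a leading empty string). The class and method names are hypothetical. One thing to be aware of: \\b splits at every word boundary, not just around whitespace, so punctuation inside a word also becomes its own element.

```java
import java.util.Arrays;

class WordBoundarySplit {
    // \b matches the zero-width position between a word character
    // (letter, digit, underscore) and a non-word character.
    static String[] splitAtWordBoundaries(String s) {
        return s.split("\\b");
    }

    public static void main(String[] args) {
        // A run of whitespace stays together as one element.
        System.out.println(Arrays.toString(splitAtWordBoundaries("hello  world")));
        // The apostrophe is a non-word character, so "don't" is broken up.
        System.out.println(Arrays.toString(splitAtWordBoundaries("don't stop")));
    }
}
```

If your input can contain apostrophes, hyphens, or other punctuation, a whitespace-based lookaround is probably the safer choice.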

dcsohl
  • +1 because in my case, this also would have been acceptable, because I was ultimately trying to reverse each word's characters, but leave the words in order and keep the same whitespace in between each word. – Rock Lee Jul 07 '15 at 20:50
  • OMG - this just made my entire day. Have been struggling to split words and retain spaces in between. So many workarounds that didn't work 100% - and then I see this - and everything falls into place! – slott Sep 18 '18 at 15:43
  • This is the best!! +1 – Eiston Dsouza May 17 '19 at 11:24
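The use case mentioned in the first comment (reverse each word's characters, keep the word order and the whitespace) could be sketched roughly like this. ReverseWords and reverseEachWord are illustrative names, and the split pattern is a whitespace lookaround rather than \\b so that each whitespace character is a one-character token and reversing it changes nothing.

```java
class ReverseWords {
    // Reverses the characters of every word while leaving the words in
    // order and the whitespace between them untouched.
    static String reverseEachWord(String s) {
        StringBuilder out = new StringBuilder();
        // Each token is either a single whitespace character or a run of
        // non-whitespace characters.
        for (String token : s.split("(?<=\\s)|(?=\\s)")) {
            out.append(new StringBuilder(token).reverse());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseEachWord("hello  world"));
    }
}
```

With the input from the question this should print "olleh  dlrow", with both spaces preserved.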
4

s.split("((?<= )|(?= ))"); is one way.

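Technically, the regular expression uses a lookbehind and a lookahead. The single space after each = is the delimiter, and because lookarounds are zero-width, that space is kept in the result instead of being discarded.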
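To make the "single space is the delimiter" point concrete, here is a small sketch (hypothetical class and method names) showing that this pattern splits only around the space character. A tab or newline is not treated as a delimiter, so use \\s in place of the literal space if you want all whitespace handled.

```java
import java.util.Arrays;

class SpaceOnlySplit {
    // Splits only at positions next to a literal space character.
    static String[] splitAroundSpaces(String s) {
        return s.split("(?<= )|(?= )");
    }

    public static void main(String[] args) {
        // The tab between "a" and "b" is not a delimiter for this pattern,
        // so "a\tb" stays together as one element.
        System.out.println(Arrays.toString(splitAroundSpaces("a\tb c")));
    }
}
```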

Bathsheba
1

You could do something like this:

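import java.util.LinkedList;
import java.util.List;

// Collects words and individual whitespace characters, in order.
List<String> result = new LinkedList<>();
int rangeStart = 0; // start index of the word currently being scanned
for (int i = 0; i < s.length(); ++i) {
  if (Character.isWhitespace(s.charAt(i))) {
    // Flush the word that ended just before this whitespace, if any.
    if (rangeStart < i) {
      result.add(s.substring(rangeStart, i));
    }
    // Keep each whitespace character as its own element.
    result.add(Character.toString(s.charAt(i)));
    rangeStart = i + 1;
  }
}
// Flush the final word if the string does not end in whitespace.
if (rangeStart < s.length()) {
  result.add(s.substring(rangeStart));
}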

Yeah, no regexes, sue me. This way you can see how it works more easily.

Sam Estep