Whitespace is any character or series of characters that represent horizontal or vertical space in typography.
Issue
In your example, the regex `\s` matched and replaced all of the following:
- regular space ` ` (horizontal)
- tab `\t` (horizontal)
- carriage return `\r` (vertical)
- newline or line feed `\n` (vertical)
See this substitution demo for Java's regex flavor.
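A minimal sketch to reproduce the issue (the sample text and the class and method names are illustrative, not from the original question): `\s+` swallows the line break along with the tab and spaces, so the two lines collapse into one. Java's default `\s` also matches the vertical tab `\x0B` and the form feed `\f`, but not the no-break space `\u00A0`.

```java
import java.util.regex.Pattern;

public class WhitespaceIssueDemo {
    // Replaces every run of \s characters (including line breaks) with one space.
    static String collapseAll(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // True if c is matched by Java's default \s class: [ \t\n\x0B\f\r].
    static boolean isRegexWhitespace(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    public static void main(String[] args) {
        String multiLineText = "\tHello World!" + "\n" + "New line";
        // The "\n" is replaced as well, so both lines end up on one line.
        System.out.println("[" + collapseAll(multiLineText) + "]"); // prints "[ Hello World! New line]"

        System.out.println(isRegexWhitespace((char) 0x0B)); // vertical tab: true
        System.out.println(isRegexWhitespace('\f'));        // form feed: true
        System.out.println(isRegexWhitespace('\u00A0'));    // no-break space: false by default
    }
}
```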
Alternative Solutions
In Java, you could condense only the horizontal whitespace, and keep the line breaks, in one of these ways:
(1) Split by lines and clean each line separately
See the demo on Ideone:
```java
import java.util.ArrayList;
import java.util.List;

String multiLineText = "\tHello World!" + "\n"
        + "New line";
// matches "\n" (usually Linux/Mac) and "\r\n" (usually Windows);
// splitting by System.lineSeparator() would miss "\n" endings on Windows
String lineSeparatorRegex = "\r?\n";
List<String> condensedLines = new ArrayList<>();
String[] lines = multiLineText.split(lineSeparatorRegex);
for (String line : lines) {
    condensedLines.add(line.replaceAll("\\s+", " ")); // condense whitespace within this line
}
String condensedPerLine = String.join(System.lineSeparator(), condensedLines);
```
Note: `System.getProperty("line.separator")` was the usual way to get the separator before `System.lineSeparator()` was introduced in Java 1.7.
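The same split-and-clean idea can be wrapped in a reusable method. This is a sketch only; `CondensePerLine` and `condensePerLine` are hypothetical names. Passing `-1` as the split limit keeps trailing empty lines, which `split(regex)` alone would silently drop.

```java
import java.util.ArrayList;
import java.util.List;

public class CondensePerLine {
    // Hypothetical helper: condenses whitespace inside each line, keeps the line structure.
    static String condensePerLine(String text, String outputSeparator) {
        // "\\r?\\n" accepts both "\n" and "\r\n" line endings;
        // limit -1 keeps trailing empty lines instead of discarding them.
        String[] lines = text.split("\\r?\\n", -1);
        List<String> condensed = new ArrayList<>();
        for (String line : lines) {
            condensed.add(line.replaceAll("\\s+", " "));
        }
        return String.join(outputSeparator, condensed);
    }

    public static void main(String[] args) {
        String text = "\tHello   World!\r\nNew \t line\n";
        // Mixed line endings in, uniform "\n" out; the final empty line is preserved.
        System.out.print(condensePerLine(text, "\n"));
    }
}
```

Choosing the output separator explicitly (rather than always `System.lineSeparator()`) makes the result independent of the platform the code runs on.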
(2) Simple multi-line capable regex, as answered by Niko:
```java
// replace each run of tabs and/or spaces with a single space; line breaks are kept
String condensedPerLine = multiLineText.replaceAll("[\t ]+", " ");
```
See the demo preserving lines on Regex101.
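Since Java 8, `java.util.regex.Pattern` also provides the predefined class `\h` for horizontal whitespace. It covers space, tab, the no-break space and other Unicode space separators, but never `\n` or `\r`, so it states the intent of option (2) directly. A minimal sketch; the class and method names are illustrative.

```java
public class HorizontalWhitespace {
    // \h (Java 8+) matches horizontal whitespace only, so line breaks survive.
    static String condenseHorizontal(String text) {
        return text.replaceAll("\\h+", " ");
    }

    public static void main(String[] args) {
        // Contains a tab, mixed spaces, a newline and a no-break space (\u00A0).
        String text = "\tHello \t World!\nNew\u00A0 line";
        System.out.println(condenseHorizontal(text));
    }
}
```

Unlike `[\t ]+`, this also collapses the no-break space in `"New\u00A0 line"` into a single regular space.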
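(3) Use Apache Commons Lang `StringUtils` with streams:
The `StringUtils` class is well suited for null-safe String handling; for this case, use `normalizeSpace(s)`, which trims the string and replaces each run of whitespace with a single space.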
Note this hint in its JavaDoc as well: "Java's regexp pattern `\s` defines whitespace as `[ \t\n\x0B\f\r]`".
```java
import java.util.Arrays;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;

// trim each line and collapse its inner whitespace runs to a single space
String condensedPerLine = Arrays.stream(multiLineText.split("\r?\n"))
        .map(StringUtils::normalizeSpace)
        .collect(Collectors.joining(System.lineSeparator()));
```
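If Commons Lang is not on the classpath, a plain-JDK approximation of the same stream pipeline trims each line and collapses its `\s` runs. This is a sketch under that assumption, with illustrative names; it is not guaranteed to match `normalizeSpace` in every edge case.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class NormalizeLines {
    // Approximates StringUtils.normalizeSpace per line using only the JDK:
    // trim the ends, then collapse inner runs of \s into a single space.
    static String normalizeLines(String text, String outputSeparator) {
        return Arrays.stream(text.split("\\r?\\n"))
                .map(line -> line.trim().replaceAll("\\s+", " "))
                .collect(Collectors.joining(outputSeparator));
    }

    public static void main(String[] args) {
        String multiLineText = "\tHello   World!  \r\n  New \t line";
        System.out.println(normalizeLines(multiLineText, "\n"));
    }
}
```

Unlike option (1), the leading tab disappears entirely instead of becoming a single space, because each line is trimmed first.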
See also