
Java 15 introduced text blocks as a final (non-preview) feature. Text blocks let you define multi-line string literals without breaking code indentation, by stripping the common white space prefix from the lines. The algorithm is described in JEP 378.

But how exactly is the "common white space prefix" defined when lines are indented using a mix of tabs and spaces?

For example, what would be the string value in the following case (· means a space, → means a tab character):

→   →   ····String text = """
→   →   ····→   line1
→   ········→   line2
→   ····→   →   """;

A simple test with OpenJDK shows that the result string is:

line1
··→   line2

So it looks like javac simply counts leading white space characters, treating spaces (0x20) and tabs (0x09) equally, and strips that many characters from each line. Is this the expected behavior?


Side note: this is not a purely theoretical question; it has practical importance for a project with mixed spaces/tabs indentation and a large codebase.

Naman
Alex Shesterov
  • That’s why I have a strict “no tab” policy in my projects. But the answer to your question is even more horrible than I imagined. I’m excited to see the final version of the text block feature… – Holger Nov 03 '20 at 16:13
  • This issue consumed an inordinate degree of discussion in the design process; many complex epicycles were suggested to deal with inconsistent mixed whitespace. In the end, it seemed silly and counterproductive to add significant complexity to the language so we could cater to the .01% of users who think it's a good idea to routinely mix spaces and tabs inconsistently. The selected solution works equally well with spaces, tabs, and _consistent_ mixes of the two. – Brian Goetz Nov 03 '20 at 18:47
  • @BrianGoetz, thanks for sharing the background on this! – Alex Shesterov Nov 03 '20 at 21:25
  • @BrianGoetz generating an error for any inconsistent mix of the two would also work equally well with spaces, tabs, and consistent mixes of the two… – Holger Nov 05 '20 at 09:36

1 Answer


I've found the answer, which I'd like to share.

The Java compiler indeed treats spaces, tabs, and all other whitespace characters equally.

So the same number of whitespace characters (of any kind) is removed from every line.


Details:

The javac tokenizer uses the String.stripIndent() method, which has the following implementation note:

This method treats all white space characters as having equal width. As long as the indentation on every line is consistently composed of the same character sequences, then the result will be as described above.
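The same behavior can be reproduced at run time by calling String.stripIndent() directly (available since Java 15). The following is only a minimal sketch: the class name StripIndentDemo and the method demo() are made up for illustration, and the input mirrors the two content lines from the question, without the closing-delimiter line.

```java
public class StripIndentDemo {

    // Hypothetical helper: applies stripIndent() to two lines whose
    // indentation mixes tabs and spaces, as in the question.
    static String demo() {
        String raw =
                "\t\t    \tline1\n" +   // 2 tabs + 4 spaces + 1 tab = 7 whitespace characters
                "\t        \tline2";    // 1 tab + 8 spaces + 1 tab = 10 whitespace characters
        // The smallest indentation is 7 characters, so 7 characters are
        // removed from each line, regardless of whether they are tabs or spaces.
        return raw.stripIndent();
    }

    public static void main(String[] args) {
        // Make the remaining whitespace visible: tab as →, space as ·
        String visible = demo().replace("\t", "→").replace(" ", "·");
        System.out.println(visible);
        // prints:
        // line1
        // ··→line2
    }
}
```

The second line keeps two spaces and a tab, because only the first seven characters (one tab and six spaces) are stripped. This matches the result observed with javac in the question.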

Alex Shesterov