I surely hoped that this would be supported:

```java
private static void regex() {
    String plain = "\\w+";
    String withTextBlocks = """
        \w+
    """;
}
```

but `withTextBlocks` does not compile under Java 17. Isn't the point of text blocks that we should not have to escape? I have been through the JEP, and maybe the explanation is there, but I can't make sense of it. A second question, in case someone knows: is there a future JEP for this? Thank you.

Brian Goetz
Eugene
  • Shouldn't it be `\\w+` in the `withTextBlocks` as well? – Zahid Khan Sep 07 '22 at 05:43
  • this is not a duplicate of _if_ text blocks exist, but if regexes are supported in text blocks. – Eugene Sep 07 '22 at 06:17
  • @ZahidKhan but the point of text blocks is that we should not escape, right? – Eugene Sep 07 '22 at 06:18
  • the point is about formatting ... `\w` is an escape sequence even inside text blocks - an invalid one ([JLS 3.10.6. Text Blocks](https://docs.oracle.com/javase/specs/jls/se18/html/jls-3.html#jls-3.10.6): "... TextBlockCharacter: InputCharacter but not \ " or " EscapeSequence" or " LineTerminator ..." ) (from [JEP 355 Text Blocks](https://openjdk.org/jeps/355#Goals): "**Goals:** ... easy to express strings that span several lines ... avoiding escape sequences in common cases. ... can express the same set of strings as a string literal, and interpret the same escape sequences, ...") – user16320675 Sep 07 '22 at 06:39
  • @user16320675 these goals can be augmented with [**Non-Goals**](https://openjdk.org/jeps/355#Non-Goals): “*Text blocks do not support raw strings, that is, strings which intend no processing of their characters whatsoever.*” – Holger Sep 07 '22 at 06:55
  • So is your question how to write a backslash (or other character that usually needs to be escaped in a string literal) inside a text block? I think your question could benefit from further clarification. – Ole V.V. Sep 07 '22 at 07:45
  • JLS § 3.10.7: **It is a compile-time error if the character following a backslash in an escape sequence is not a LineTerminator or an ASCII b, s, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7**. – MC Emperor Sep 07 '22 at 07:48
  • @Holger thank you - the reason I asked is that this works in kotlin... if you want you can post this as an answer – Eugene Sep 07 '22 at 10:06
  • @OleV.V. sort of, I can't wrap my head around how to make it clearer. The thing is - this works in kotlin, and we switched to jdk-17 and some of our regexes could really benefit from such a feature (and besides all the kotlin devs are giving me the "time to switch to kotlin?" :) ) feel free to edit if you want. thank you – Eugene Sep 07 '22 at 10:26
  • Without having read the docs: escaping the backslash in the text block too as @ZahidKhan suggested (`\\w+`) works for me and produces a string that prints like `\w+` and has a newline at the end. – Ole V.V. Sep 07 '22 at 11:37
  • @OleV.V. right! for a simple regex like this - sure. but when the regex gets a lot more complicated, I wish it would work – Eugene Sep 07 '22 at 11:47
  • I find it easy to understand that wish. If you want your regex to match a backslash, you need to type `\\\\`, and it soon gets unreadable. Doing away with the need to escape would certainly help. – Ole V.V. Sep 07 '22 at 11:54

1 Answer


You are conflating text blocks with raw strings. These are different features, though they were explored together, and this may explain why you mentally folded them together. There is no support yet for raw strings (which turn out to be somewhat more slippery than they might first appear).

> Isn’t it the point of text blocks that we should not escape?

No, that is not the point of text blocks. The point of text blocks is to allow us to represent two-dimensional blocks of text in code, preserving the block's relative indentation but not its absolute indentation. This allows us to freely indent the source representation of the text block itself to match surrounding code, without affecting the indentation of the string the text block describes.
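To illustrate the relative-versus-absolute indentation point, here is a minimal sketch (the class and method names are made up for this example). Both methods indent the same block differently in the source, yet they should describe the same string, because only the indentation relative to the least-indented line survives.

```java
public class TextBlockIndentation {

    // Block indented to match shallow surrounding code.
    public static String shallow() {
        return """
            <ul>
              <li>item</li>
            </ul>""";
    }

    // Same block, pushed much further right in the source.
    public static String deep() {
                    return """
                            <ul>
                              <li>item</li>
                            </ul>""";
    }

    public static void main(String[] args) {
        // The common leading whitespace is stripped; the two extra
        // spaces in front of <li> are relative indentation and are kept.
        System.out.println(shallow().equals(deep()));                           // true
        System.out.println(shallow().equals("<ul>\n  <li>item</li>\n</ul>"));   // true
    }
}
```

The closing delimiter is placed on the last content line here, so no trailing newline is added to the string.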

An additional design goal is that text blocks should differ from ordinary string literals only in ways that pertain to their two-dimensional nature. There should not be a different set of escape sequences, or different escaping rules. (If we ever do raw strings, they should apply equally to text blocks and traditional string literals.) If text blocks worked the way you wanted, you'd probably be complaining that you can't do the same with single-line strings. These aspects are orthogonal and the language should treat them orthogonally.
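As a rough sketch of what "same escaping rules" means for the regex in the question: the backslash is doubled in the text block exactly as in the ordinary literal, and the two then denote the same string. The class name and the `DATE` pattern below are hypothetical, chosen only to show where the two-dimensional layout can still help with a longer regex (here combined with `Pattern.COMMENTS`).

```java
import java.util.regex.Pattern;

public class TextBlockEscapes {

    // Ordinary string literal: the backslash is escaped.
    public static final String LITERAL = "\\w+";

    // Text block: identical escaping rule, so the backslash is still doubled.
    // The closing delimiter sits on the content line, so no newline is appended.
    public static final String BLOCK = """
        \\w+""";

    // Hypothetical multi-line regex. In COMMENTS mode, whitespace and
    // "# ..." comments inside the pattern are ignored by the regex engine.
    public static final Pattern DATE = Pattern.compile("""
        (\\d{4})  # year
        -
        (\\d{2})  # month
        """, Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(LITERAL.equals(BLOCK));               // true
        System.out.println(DATE.matcher("2022-09").matches());   // true
    }
}
```

Note that with the layout used in the question (closing delimiter on its own line, indented less than the content), the escaped version would compile but the resulting string would also carry four leading spaces and a trailing newline, which matters for a regex unless a flag such as `Pattern.COMMENTS` is used.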

Brian Goetz