
I want multi-line strings in Java, so I'm looking for a simple preprocessor that converts C-style multi-line strings (continued with a trailing backslash) into single lines with a literal '\n'.

Before:

    System.out.println("convert trailing backslashes\
this is on another line\
\
\
above are two blank lines\
But don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");

After:

     System.out.println("convert trailing backslashes\nthis is on another line\n\n\nabove are two blank lines\nBut don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");

I thought sed would do it well, but sed is line-based, so replacing the '\' and the newline that follows it (effectively joining the two lines) is not very natural in sed. I adapted sredden79's one-liner into the following; it works, and it's clever, but it's not clear:

sed ':a { $!N; s/\\\n/\\n/; ta }'

The substitution replaces an escaped literal backslash followed by a newline with an escaped literal backslash followed by n. :a is a label, and ta branches to that label if the substitution found a match; $ means the last line, and $! is its negation (i.e. every line but the last). N appends the next line to the pattern space (thus making the \n character visible).
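For comparison, here is a minimal Python sketch of the transformation shown in the Before/After example (Python is an assumption for illustration; it is not part of the original question). Instead of the N/ta loop, it reads all of stdin and makes one regex substitution, treating every backslash that immediately precedes a newline as a continuation.

```python
import re
import sys


def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with the two characters \\n.

    Any backslash that ends a line is treated as a continuation, so a line
    that really ends in an escaped backslash would be joined as well.
    """
    # Pattern: a literal backslash followed by a real newline.
    # Replacement: a literal backslash followed by the letter n
    # (re's template syntax turns \\ into a single backslash).
    return re.sub(r"\\\n", r"\\n", text)


if __name__ == "__main__":
    sys.stdout.write(join_continuations(sys.stdin.read()))
```

Saved as, say, a hypothetical `joinlines.py`, it would be used as a filter: `python joinlines.py < Before.java > After.java`. Backslashes that are not at the end of a line, such as `\"` or `\t`, are left alone.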

EDIT: here's a variation that keeps compiler error line numbers, etc., accurate: it turns each continued line into "..."+ followed by a newline (and handles the first and last lines of the String correctly):

sed ':a { $!N; s/\\\n/\\n"+\n"/; ta }'

giving:

    System.out.println("convert trailing backslashes\n"+
"this is on another line\n"+
"\n"+
"\n"+
"above are two blank lines\n"+
"But don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");
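The line-number-preserving variant can be sketched the same way in Python (again an illustrative assumption, not the original sed). Each backslash-newline becomes \n"+, a real newline and an opening quote, so the output has exactly as many lines as the input.

```python
import re
import sys


def split_continuations(text: str) -> str:
    """Turn each backslash-newline into a closing literal joined with +.

    Every continued line becomes its own Java string literal, and the
    number of lines in the text does not change.
    """
    # A function replacement is inserted verbatim, so re does not process
    # escapes in it: the result is  \n"+  then a newline, then an opening quote.
    return re.sub(r"\\\n", lambda m: '\\n"+\n"', text)


if __name__ == "__main__":
    sys.stdout.write(split_continuations(sys.stdin.read()))
```

Because the compiler folds adjacent literals joined with + into one constant, the generated code should behave the same as the single-line version.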

EDIT: Actually, it would be better to have Perl/Python-style multi-line strings, where the string starts and ends with a special marker on a line of its own (""" for Python, I think).
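A rough Python sketch of that idea, under syntax rules that are assumptions of this sketch rather than anything Java defines: a block opens on a line ending with """ and closes on a line whose first non-blank text is """. Each line in between becomes one escaped Java literal ending in \n, joined with +, and the opening and closing lines are rewritten in place, so line numbers stay aligned.

```python
import sys

DELIM = '"""'  # hypothetical delimiter chosen for this sketch


def java_escape(s: str) -> str:
    """Escape backslashes and double quotes for use inside a Java string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def expand_triple_quotes(text: str) -> str:
    """Rewrite each delimited block as Java literals joined with +.

    Every input line yields exactly one output line.
    """
    out = []
    inside = False
    for line in text.split("\n"):
        if not inside and line.endswith(DELIM):
            # Opening line: keep the code before the delimiter, start an empty literal.
            out.append(line[: -len(DELIM)] + '""+')
            inside = True
        elif inside and line.lstrip().startswith(DELIM):
            # Closing line: finish with an empty literal, keep the code after it.
            out.append('""' + line.lstrip()[len(DELIM):])
            inside = False
        elif inside:
            # Body line: escape it and add an explicit \n at the end.
            out.append('"' + java_escape(line) + '\\n"+')
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sys.stdout.write(expand_triple_quotes(sys.stdin.read()))
```

Unlike the backslash approach, the body lines need no escaping by hand: quotes and backslashes are escaped by the preprocessor.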

Is there a simpler, saner, clearer way (maybe not using sed)?

brian d foy
13ren
    If you do this, you'll be killing your tool support. Suddenly no IDE will do syntax highlighting correctly, and debuggers will show different line numbers than your (original) source file... Remember that the compiler will join literal strings concatenated with `+` at compile time, so just closing the string, writing a `+`, and opening it on the next line does away with the preprocessor and keeps your tool support intact. FWIW. – T.J. Crowder Feb 09 '10 at 06:10

4 Answers


Is there a simpler, saner, clearer way.

Forget the pre-processor, live with the limitation, complain about it (so that it will maybe be fixed in Java 7 or 8), and use an IDE to ease the pain.

Other alternatives (too troublesome, I suppose, but still better than messing with the compilation process):

  • use a JVM-based language that does support here-docs
  • externalize the string into a resource file
Community
Thilo
  • Thanks. 1. The IDE idea is cool, but it doesn't help with editing (it's a pain to edit, add, and move between the lines of multi-line concatenated strings; that's what I used to use). 2. Externalizing into a resource file is what I do now, but I think it's simpler and more manageable to keep the string together with the source code it relates to. 3. A whole new JVM language just to solve this does seem troublesome, but... it would already be debugged and have tool support, syntax highlighting, etc., so your idea has an intriguing elegance! Of course, one can view the sed script as a "JVM language" itself. – 13ren Feb 09 '10 at 10:52

A perl one-liner:

perl -0777 -pe 's/\\\n/\\n/g'

This will read either stdin or the file(s) named after it on the command line and write the output to stdout.

If you're using an editor that supports filtering, like vi or Emacs, just filter your text through the above command and you're done.

If you're using Windows and have to worry about \r:

C:\> perl -0777 -pe "s/\\\r?\n/\\n/g"

although I think Win32 Perl handles \r itself, so this may be unnecessary.

The -0777 option is a special case of the -0 (that's a zero) option, which sets the input record separator. -0777 means there is no separator at all, so the entire file is read in as a single string.

The -pe option is a combination of -p (process the input record by record and print each result) and -e (the next argument is (a line of) the program to execute).
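A Python sketch of the same slurp-then-substitute idea, for readers without Perl (an illustrative assumption, not part of this answer): fileinput reads the files named on the command line, or stdin when there are none, and the optional \r in the pattern plays the role of the Windows variant above.

```python
import fileinput
import re
import sys

# A backslash, an optional carriage return, then a newline.
CONTINUATION = re.compile(r"\\\r?\n")


def join_continuations(text: str) -> str:
    """Replace every backslash line continuation with a literal \\n."""
    return CONTINUATION.sub(r"\\n", text)


if __name__ == "__main__":
    # Treat the whole input as one string rather than processing it line by line.
    # Note: this sketch concatenates all named files before substituting.
    sys.stdout.write(join_continuations("".join(fileinput.input())))
```

Python opens files in text mode with universal newlines by default, which typically converts \r\n to \n on reading already, so the \r? is mostly a safeguard.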

Adrian Pronk

A Perl script that does what you asked for:

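use strict;
use warnings;

while (<>) {
    chomp;                # remove the trailing newline
    print $_;
    if (/\\$/) {
        print "n";        # ends in a backslash: add "n" to form a literal \n and stay on this output line
    } else {
        print "\n";       # no continuation: end the output line as usual
    }
}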
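The same line-at-a-time logic, sketched in Python as an illustration (not part of this answer): it holds only one input line in memory at a time, and lines that do not end in a backslash pass through unchanged.

```python
import sys
from typing import Iterable, Iterator


def stream_join(lines: Iterable[str]) -> Iterator[str]:
    """Yield output pieces: a line ending in a backslash gets an 'n' appended
    (so backslash + n forms a literal \\n) instead of a line break."""
    for line in lines:
        line = line.rstrip("\n")  # like Perl's chomp
        if line.endswith("\\"):
            yield line + "n"      # continuation: no line break, next line joins it
        else:
            yield line + "\n"     # ordinary line: keep the line break


if __name__ == "__main__":
    for piece in stream_join(sys.stdin):
        sys.stdout.write(piece)
```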
Lachlan Roche
sed 's/\x5c\x5c$/\x22\x5c\x5cn\x22/'

The hex codes for backslash and double quote are \x5c and \x22, respectively. The backslash needs to be escaped, so \x5c is doubled; the $ anchors the match to the end of the line.

Updated again per the OP's comment:

sed "{:a;N;\$!b a};s/\x5c\x5c\n/\x5c\x5cn/g" 

The :a creates a label, N appends the next line to the pattern space, and b a branches back to the label :a on every line except the last ($!).

After it's all loaded, a single substitution replaces all occurrences of a newline \n with a literal '\n', using the hex ASCII code \x5c for the backslash.

  • This doesn't join lines (and adds quotes, which aren't wanted). Try it on the example given in the question, and compare the result to see what I mean. – 13ren Dec 30 '14 at 16:06
  • I guess I answered too quickly; I've added an update to my answer. Sed operates on newline-terminated strings, so you can't modify text spanning several lines unless you glob them into the pattern space. –  Dec 30 '14 at 20:00
  • This looks very similar to the sed solution that is already in the question itself. Also, you can just escape the backslash, which is more readable than hex. But you seem to know your way around sed; maybe you can come up with a clearer version (that was the actual question). But I'm not sure there is one, simply because of the need to join lines (or glob them in the pattern space, as you describe it). Maybe just do it in a two-stage pipeline, the first stage joining everything and the second doing the actual match? Anyway, read through the whole question and have at it! – 13ren Jan 01 '15 at 12:18
  • It is clear if you know sed, in the sense that there is nothing superfluous about it. To make it readable to someone who doesn't know sed that well (and in general), you can add comments wherever you use constructs that may seem cryptic to those who are not well versed in the lingo of the respective tool. I use hex because I just finished writing a rename tool that uses sed a lot to replace special characters in filenames; if you want a decent rename tool, you can check it out here: http://scriptsandoneliners.blogspot.com/2014/12/bash-script-to-rename-files-wspecial.html –  Jan 01 '15 at 14:30
  • "Nothing superfluous", like "shortest", is not the same as "clearest". BUG: it doesn't match a trailing backslash (i.e. `\` at the end of a line, or `\\\n`). Hex codes hide such errors because they are less clear. Using `'` instead of `"` means the dollar doesn't need escaping (`\$`), which is clearer. I love sed, but I forget the syntax fast, so it's a poor investment (for me). At the moment, this answer is less clear than the one I gave in the question itself. But I like how you join the lines and only then do the substitution; it might be less efficient(?), but it's clearer. – 13ren Jan 03 '15 at 14:55
  • Whoa, sorry, the example in the question doesn't demonstrate the bug I meant! Other lines (those without a trailing backslash) should be left as-is. – 13ren Jan 03 '15 at 15:11
  • I think the one-liner in your OP is probably a bit slower over large data sets, because it has to call substitute on every pass. The amount of pattern space consumed is the same, because both simply N (append to the pattern space) to the end. But the answer I wrote makes only *one* call to substitution, versus one call per newline in the text. To get the shell to interpret hex you need to use double quotes, yet it will also want to interpret the $ as a shell variable; that's why it's escaped. –  Jan 03 '15 at 20:54
    This question asks for clearest, not fastest. This answer replaces newline with literal `\n` on all lines, not just those with trailing backslash. – 13ren Jan 05 '15 at 06:42
  • sed "{:a;N;\$!b a};s/\x5c\x5c\n/\x5cn/g" I see; more hex does the job. The example you gave doesn't highlight the difference, since all its lines have trailing backslashes. It looks unclear, but with a few comments above it, it's not only clear but pedagogical! =) –  Jan 05 '15 at 17:11