
Yes, I know there are a number of questions (e.g. (0) or (1)) that seem to ask the same thing, but AFAICS none really answers what I want.

What I want is to replace every occurrence of a newline (LF) with the string \n, with no implicitly assumed newlines, using only POSIX utilities (no GNU extensions or Bashisms), with input read from stdin and without buffering the whole input.

So for example:

  • printf 'foo' | magic should give foo
  • printf 'foo\n' | magic should give foo\n
  • printf 'foo\n\n' | magic should give foo\n\n
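The three cases above can be turned into a small check helper. This is only a sketch: the name check and its argument order are made up for illustration, and the candidate (here called magic) is assumed to be a shell function that reads stdin. The sentinel x is appended because command substitution strips trailing real newlines, which would otherwise hide a candidate that emits one.

```shell
# check CANDIDATE PRINTF_FORMAT EXPECTED   (hypothetical helper, not from the question)
# Feeds the output of printf PRINTF_FORMAT to the function CANDIDATE
# and compares the result with EXPECTED, byte for byte.
check() {
  # The trailing 'x' protects any real newline at the end of the
  # candidate's output from being stripped by $(...).
  out=$(printf "$2" | "$1"; printf x)
  if [ "$out" = "${3}x" ]; then
    printf 'ok:   %s\n' "$2"
  else
    printf 'FAIL: %s\n' "$2"
    return 1
  fi
}
```

With a candidate defined as magic, check magic 'foo\n' 'foo\n' would report ok or FAIL for the second case; the expected string is passed in single quotes so that \n stays a literal backslash followed by n.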

The usually given answers don't do this, e.g.:

  • awk
    printf 'foo' | awk 1 ORS='\\n' gives foo\n, whereas it should give just foo
    so it adds a \n when there was no newline.
  • sed
    would work for just foo but in all other cases, like:
    printf 'foo\n' | sed ':a;N;$!ba;s/\n/\\n/g' gives foo, whereas it should give foo\n
    so it misses the final newline.
    Since I do not want any sort of buffering, I cannot just check whether the input ended in a newline and then add the missing one manually.
    And anyway... it would use GNU extensions.
    sed -z 's/\n/\\n/g'
    does work (even retains the NULs correctly), but again, GNU extension.
  • tr
    can only replace with one character, whereas I need two.

The only working solution I'd have so far is with perl:
perl -p -e 's/\n/\\n/'
which works just as desired in all cases, but as I've said, I'd like to have a solution for environments where just the basic POSIX utilities are there (so no Perl or using any GNU extensions).

Thanks in advance.

calestyo
  • what should be outputted for the `printf 'foo\\n'`? – αғsнιη Jan 12 '22 at 10:38
  • The big problem you have is that the output of `printf 'foo'` is not a valid POSIX text file (it's missing the required terminating newline) and so the behavior of any POSIX text processing tool is undefined given that input. What that means is that if you come up with a solution using some implementation (e.g. GNU) of sed, awk or any other tool even when running in POSIX-compliant mode that doesn't mean the same solution will work with any other version of that tool (eg. BSD) since any tool can do whatever it likes given input whose handling is undefined by POSIX and still be POSIX-compliant. – Ed Morton Jan 12 '22 at 13:21
  • To add on to @rowboat's references above which talk about a POSIX text file being zero or more **lines**, the [POSIX definition of a line](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206) is `A sequence of zero or more non- <newline> characters plus a terminating <newline> character.`. Note that a terminating <newline> character is required for a POSIX text line and therefore every line in a POSIX text file must end in a <newline> character and if your input to a text processing utility (e.g. sed or awk) is anything else then YMMV. – Ed Morton Jan 12 '22 at 14:35
  • @rowboat Well that's clear,... and it wasn't a requirement, that it works with NUL, I just said GNU's sed with -z does so nicely. – calestyo Jan 15 '22 at 04:28
  • @αғsнιη printf 'foo\\n' - printf would here see the characters f o o \ \ n ... and it would interpret the double \ as the literal \. So the actually printed string is 'foo\n' ... all literal characters. And since there is no newline, the output should again be `foo\n`. – calestyo Jan 15 '22 at 04:34

3 Answers


The following will work with all POSIX versions of the tools being used and with any POSIX text permissible characters as input whether a terminating newline is present or not:

$ magic() { { cat -u; printf '\n'; } | awk -v ORS= '{print sep $0; sep="\\n"}'; }

$ printf 'foo' | magic
foo$

$ printf 'foo\n' | magic
foo\n$

$ printf 'foo\n\n' | magic
foo\n\n$

The function first appends a newline to the incoming piped data so that what awk reads is a valid POSIX text file (which must end in a newline); this is what makes it work in all POSIX-compliant awks. The awk command then discards the terminating newline that we added and replaces all others with "\n" as required.

The only utility above that has to process input without a terminating newline is cat, but POSIX just talks about "files" as input to cat, not "text files" as in the awk and sed specs, and so every POSIX-compliant version of cat can handle input without a terminating newline.

Ed Morton

You can (I think) do this with pure POSIX shell. I am assuming you are working with text, not arbitrary binary data that can include null bytes.

magic () {
  while read x; do
      printf '%s\\n' "$x"
  done
  printf '%s' "$x"
}

read assumes POSIX text lines (terminated with a newline), but it still populates x with anything it reads until the end of its input when no linefeed is seen. So as long as read succeeds, you have a proper line (minus the linefeed) in x that you can write back, but with a literal \n instead of a linefeed.

Once the loop breaks, output whatever (if anything) is in x after the failed read, but without a trailing literal \n.

$ [ "$(printf foo | magic)" = foo ] && echo passed
passed
$ [ "$(printf 'foo\n' | magic)" = 'foo\n' ] && echo passed
passed
$ [ "$(printf 'foo\n\n' | magic)" = 'foo\n\n' ] && echo passed
passed
chepner
  • That's also a quite nice solution. I didn't remember that read has an exit status >0 if it reaches EOF before it reaches \n. Regrettably it also uses >0 for any other errors and doesn't differentiate between the two. Still thanks for that solution. – calestyo Jan 15 '22 at 04:46
  • I just took a more thorough look and your solution would fail in some circumstances. read without -r treats \ specially, so printf '%s' '\t' | magic would currently return just t. – calestyo Jan 15 '22 at 04:57
  • Also, read(1) does field splitting and if fewer vars than fields, it more or less packs all into the last var (here the only one), but also discards any trailing IFS chars. Though this can be solved by using IFS='' read -r x – calestyo Jan 15 '22 at 04:59
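Putting the two comments above together, the loop might look like the following. This is a sketch of the suggested IFS='' read -r change, not the answerer's own code, and it still assumes text input without NUL bytes.

```shell
# Variant of the read loop with the two changes suggested in the comments:
#   IFS=  disables field splitting, so leading/trailing blanks survive
#   -r    keeps backslashes literal instead of treating them as escapes
magic() {
  while IFS= read -r x; do
    # A full line was read: emit it followed by a literal backslash-n.
    printf '%s\\n' "$x"
  done
  # read failed (EOF): x holds any final text that had no terminating newline.
  printf '%s' "$x"
}
```

As noted in the comments, read returns a non-zero status for errors other than EOF as well, so this variant cannot tell the two apart either.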

Here is a tr + sed solution that should work in any POSIX shell, as it doesn't call any GNU-only utility:

printf 'foo' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo

printf 'foo\n' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo\n

printf 'foo\n\n' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo\n\n

Details:

  • The tr command replaces each line break with \x07
  • The sed command replaces each \x07 with \\n (a literal \n in the output)
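One caveat: as far as I can tell, the \x7 escape inside a sed expression is not specified by POSIX (GNU sed accepts it). A possible variant, shown here only as a sketch, produces the BEL byte with printf and splices it into the sed script; the function name magic and the variable bel are chosen for illustration. It still assumes the input contains no BEL byte, and it still relies on sed coping with input that has no terminating newline.

```shell
# Sketch: same tr + sed idea, but the BEL byte comes from printf
# instead of a \x7 escape inside the sed expression.
magic() {
  # Placeholder byte (octal 007); assumed not to occur in the input.
  bel=$(printf '\007')
  # Inside double quotes, \\\\ reaches sed as \\, which sed's
  # replacement text turns into one literal backslash before the n.
  tr '\n' '\007' | sed "s/$bel/\\\\n/g"
}
```

Usage is the same as in the examples above, e.g. printf 'foo\n\n' | magic.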
anubhava
  • That would fail if the input already contained `\7`s (or whatever other char you choose). – Ed Morton Jan 12 '22 at 13:17
  • Agreed though realistic chances of having `\x07` (or any other control character) in a text file are fairly low. – anubhava Jan 12 '22 at 13:47
  • Yeah, I know, but it's not necessary to have to make an exception. The other thing worth noting is that after the `tr` the output is no longer a valid POSIX text file since it's missing a terminating newline so YMMV with what any given sed will do with that. The odds of whatever sed you're using not behaving as you want are also fairly low. – Ed Morton Jan 12 '22 at 13:52
  • @EdMorton Unfortunately sed implementations seem to behave quite differently, and even POSIX doesn't really seem to properly define everything (see my https://www.austingroupbugs.net/view.php?id=1551 )... but I've never seen any implementation which doesn't work well with input that doesn't end in a newline. Do you know any such implementation? – calestyo Jan 15 '22 at 05:30
  • @calestyo AFAIK I never use sed to read input that doesn't have a terminating newline so I don't know if any sed I've ever used would work with that input or not, sorry. – Ed Morton Jan 15 '22 at 11:50