
I'm using sed to modify the first part of a text file. The problem is that sed automatically adds an empty line at the end of the file.

Do you know how to solve this without using truncate? I do not want to install additional software on macOS.

Thanks!!

Cyrus
Manuel

1 Answer


A quick answer to your question is to pipe the output through another command, such as awk:

    sed 'commands' file | awk '(NR>1){printf "%s\n",l}{l=$0}END{printf "%s",l}'

This removes the last <newline>. It cannot be done reliably with sed itself; the rest of this answer tries to explain why. More possibilities can be found in How can I delete a newline if it is the last character in a file?
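One of those other possibilities needs nothing beyond a POSIX shell. This is a minimal sketch and not part of the original answer: it relies on command substitution, which drops every trailing <newline> (not just one), and on printf '%s', which adds none back. The helper name strip_trailing_newlines and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: run the given command and print its output
# without any trailing <newline> characters.
strip_trailing_newlines() {
    # "$(...)" removes ALL trailing newlines from the captured output;
    # printf '%s' prints the result without appending a new one.
    printf '%s' "$("$@")"
}

# Usage (file names are placeholders):
#   strip_trailing_newlines sed 's/foo/bar/' input.txt > output.txt
```

Note that the whole output is held in a shell variable, so this is only suitable for text of moderate size that contains no NUL bytes.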

Why does sed always finish with a <newline>? The answer depends on the interpretation of the standard and on the sed implementation you use.

According to the POSIX standard for sed:

In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. Reading from input shall be skipped if a <newline> was in the pattern space prior to a D command ending the previous cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits. If no commands explicitly started a new cycle, then at the end of the script the pattern space shall be copied to standard output (except when -n is specified) and the pattern space shall be deleted. Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.

This means two things:

  • a line is not processed if it is not terminated by a <newline>.
  • anything written to standard output is terminated with a <newline>, whether it is written at the end of the command cycle or by the p or P commands.

Example: sed (SunOS 5.10) SUNWcsu 11.10.0 rev=2005.01.21.15.53

    $ echo -n foo | sed 'p'
    $ echo -n 'foo\nbar' | sed 'p'
    foo
    foo

Clearly, a line that is not terminated by a <newline> is not processed. Otherwise, a <newline> is appended to every output.

The macOS sed manual has an interpretation similar to that of POSIX:

Normally, sed cyclically copies a line of input, not including its terminating newline character, into a pattern space, (unless there is something left after a D function), applies all of the commands with addresses that select that pattern space, copies the pattern space to the standard output, appending a newline, and deletes the pattern space.

I have not tested this, as I do not have a Mac.
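To find out how a particular sed treats an unterminated last line, one can inspect the last byte of its output. The following is a minimal sketch rather than part of the original test: ends_with_newline is a hypothetical helper name, and printf is used instead of echo -n because the behaviour of echo -n differs between shells.

```shell
# Hypothetical helper: succeeds if the last byte read from stdin is a <newline>.
ends_with_newline() {
    # tail -c 1 keeps only the final byte; od prints it as hex (0a = newline).
    [ "$(tail -c 1 | od -An -tx1 | tr -d ' \n')" = "0a" ]
}

# Feed sed an unterminated line and report what it did with it.
# The result depends on the sed implementation, so none is assumed here.
if printf 'foo' | sed 's/o/0/g' | ends_with_newline; then
    echo "this sed terminated its output with a newline"
else
    echo "this sed did not terminate its output with a newline"
fi
```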

The GNU sed manual seems to have a slightly different perspective on the matter:

sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.

When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed.

This implies the following:

  • all lines are processed, whether or not they are terminated by a <newline>
  • if the end of the command cycle is reached, the same number of <newline>s is added back as was initially removed.

Example: sed (GNU sed) 4.2.2

In the following example, a newline is only added after p and not after the end of the cycle. (hexdump -b prints octal byte values; newline is 012 in octal.)

    $ echo -n foo | hexdump -b
    0000000 146 157 157                                                    
    0000003
    $ echo -n foo | sed --posix 'p' | hexdump -b
    0000000 146 157 157 012 146 157 157                                    
    0000007

This is explained by footnote 7 of the GNU sed manual:

Actually, if sed prints a line without the terminating newline, it will nevertheless print the missing newline as soon as more text is sent to the same output stream, which gives the “least expected surprise” even though it does not make commands like sed -n p exactly identical to cat.

In conclusion: according to the POSIX standard, the output always ends with a <newline>, although an unterminated last line of the input might not appear in it. According to the GNU manual, the output ends with the same number of <newline>s as there are at the end of the input file.

Open question: does GNU sed's --posix option really give POSIX behaviour?
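Since the result depends on the implementation, a wrapper can make the behaviour explicit by keeping whatever ending the input file had. This is a rough sketch under stated assumptions, not a tested macOS recipe: sed_keep_eof is a hypothetical name, the file is passed as the first argument, and the awk filter is the one from the top of this answer.

```shell
# Hypothetical wrapper: sed_keep_eof FILE SED-ARGS...
# Writes the edited text to stdout, ending in a <newline> only if FILE did.
sed_keep_eof() (
    file=$1
    shift
    last=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
    if [ -s "$file" ] && [ "$last" != "0a" ]; then
        # The input lacks a final newline: add one so that every sed
        # processes the last line, then strip it again from the output.
        { cat "$file"; echo; } | sed "$@" |
            awk 'NR>1{printf "%s\n",l}{l=$0}END{printf "%s",l}'
    else
        # The input is empty or already newline-terminated: plain sed.
        sed "$@" "$file"
    fi
)

# Usage (file names are placeholders):
#   sed_keep_eof input.txt 's/foo/bar/' > output.txt
```

The function body runs in a subshell, so the variables file and last do not leak into the calling shell.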

kvantour