How can I remove double line breaks with sed?

Question

I tried:

sed -i 's/\n+/\n/' file

but it's not working.

I still want single line breaks.

Input:

abc

def


ghi




jkl

Desired output:

abc

def

ghi

jkl

So your desired output still contains `\n\n` but not `\n\n\n`? — 5gon12eder, Dec 16 '14 at 17:53
Some versions of `cat` have `-s` or `--squeeze-blank` for replacing sequences of multiple blank lines with a single blank line... — twalberg, Dec 16 '14 at 18:30

score 5 · Answer 1 · answered Dec 16 '14 at 20:21

This might work for you (GNU sed):

sed '/^$/{:a;N;s/\n$//;ta}' file

This replaces multiple blank lines by a single blank line.

However if you want to place a blank line after each non-blank line then:

sed '/^$/d;G' file

Which deletes all blank lines and only appends a single blank line to a non-blank line.
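
To check both commands against the input from the question, here is a minimal sketch. It assumes GNU sed, as the answer does; the function names `sample`, `squeeze_blank` and `respace` are illustrative only and not part of sed.

```shell
# Sample input from the question: runs of 1, 2 and 4 blank lines.
sample() { printf 'abc\n\ndef\n\n\nghi\n\n\n\n\njkl\n'; }

# Collapse each run of blank lines into a single blank line.
squeeze_blank() { sed '/^$/{:a;N;s/\n$//;ta}'; }

# Delete all blank lines, then append one blank line after every remaining line.
respace() { sed '/^$/d;G'; }

sample | squeeze_blank   # one blank line between abc, def, ghi, jkl
sample | respace         # same, plus a trailing blank line after jkl
```

The two differ only when the input has adjacent non-blank lines or ends without a blank line: the first leaves such lines alone, the second always adds a blank line after them.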

answered Dec 16 '14 at 20:21

potong

This works perfect: sed '/^$/{:a;N;s/\n$//;ta}' file – Sam Roberts Jan 19 '15 at 20:40
Care to explain or link to a doc file? Are those braces serialized data? — Never mind, they are sed-specific commands. See sed manual. https://www.grymoire.com/Unix/Sed.html – WoodrowShigeru Aug 28 '21 at 09:17

Adam Katz · Answer 2 · 2014-12-16T17:48:37.077

4

Sed isn't very good at tasks that examine multiple lines programmatically. Here is the closest I could get:

$ sed '/^$/{n;/^$/d}' file
abc

def

ghi


jkl

The logic of this: if you find a blank line, look at the next line. If that next line is also blank, delete that next line.

This doesn't collapse the four blank lines before jkl down to one, because the script works on blank lines in pairs: each pair of blank lines is reduced to a single one, so the four blank lines become two.

To do it in basic awk:

$ awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2' file
abc

def

ghi

jkl

This uses a variable called blank, which is reset to zero when the number of fields (NF) is nonzero and incremented when NF is zero (a blank line). Awk's default action, printing, is performed when the number of consecutive blank lines is less than two.
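
The same logic written out with comments, as a sketch; the function name `squeeze_awk` is made up for this example. One thing worth noting: `NF == 0` also matches lines that contain only spaces or tabs, so those count as blank here.

```shell
squeeze_awk() {
  awk '
    NF > 0  { blank = 0 }   # non-blank line: reset the counter
    NF == 0 { blank++ }     # blank line: count how many in a row so far
    blank < 2               # pattern with no action: matching lines are printed
  '
}

# Three empty lines and one whitespace-only line between abc and def.
printf 'abc\n\n\n\n \ndef\n' | squeeze_awk
```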

edited Dec 16 '14 at 17:48

answered Dec 16 '14 at 17:43

Adam Katz

But there are more than 2 EOL after `ghi` – anubhava Dec 16 '14 at 17:48
@anubhava: Correct. That's the best `sed` can do, as I noted. That solves "double line breaks" but not the pseudocode regex given in the question. Awk is the more elegant solution for this. – Adam Katz Dec 16 '14 at 17:51

anubhava · Answer 3 · 2014-12-16T18:01:31.697

3

Using awk (GNU or BSD) you can do:

awk -v RS= -v ORS='\n\n' '1' file
abc

def

ghi

jkl

Also using perl:

perl -pe '$/=""; s/(\n)+/$1$1/' file
abc

def

ghi

jkl

edited Dec 16 '14 at 18:01

answered Dec 16 '14 at 17:39

anubhava

That perl solution will load the whole file into memory, which won't work for very large files. – Adam Katz Dec 16 '14 at 17:49

Zug · Answer 4 · 2022-01-16T10:58:08.370

Found in "That's What I Sed" (slower than the /^$/{:a;N;s/\n$//;ta} solution).

sed '/^$/N;/\n$/D' file

The sed script can be read as follows:

If the current line is empty and the next line is empty too, delete the current line.

And can be translated into the following pseudo-code (for the reader already familiar with sed, buffer refers to the pattern space):

 1 | # sed '/^$/N;/\n$/D' file
 2 | while not end of file :
 3 |   buffer = next line
 4 |   # /^$/N
 5 |   if buffer is empty :                        # /^$/
 6 |     buffer += "\n" + next line                # N
 7 |   end if
 8 |   # /\n$/D
 9 |   if buffer ends with "\n" :                  # /\n$/
10 |     delete first line in buffer and go to 5   # D
11 |   end if
12 |   print buffer
13 | end while

In the regular expression /^$/, the ^ and $ signs mean "beginning of the buffer" and "end of the buffer" respectively. They refer to the edges of the buffer, not to the content of the buffer.

The D command performs the following tasks: if the buffer contains a newline, delete the text of the buffer up to and including the first newline, and restart the program cycle (go back to the first command of the script, line 5 of the pseudo-code) without processing the rest of the commands, without printing the buffer, and without reading a new line of input.

Finally, keep in mind that sed removes the trailing newline before processing the line, and that the print command adds the trailing newline back. So, in the above code, if the next line to be processed is Hello World!\n, then next line implicitly refers to Hello World!.

More details at https://www.gnu.org/software/sed/manual/sed.html.

You are now ready to apply the algorithm to the following file:

a\n
b\n
\n
\n
\n
c\n
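
Working through that file by hand should give a, b, one empty line, then c. A quick way to confirm, sketched here with printf generating the file contents (GNU sed assumed):

```shell
# a, b, three empty lines, c
printf 'a\nb\n\n\n\nc\n' | sed '/^$/N;/\n$/D'

# Trace for the three empty lines:
#   1st: N appends the 2nd, buffer is "\n"  -> /\n$/ matches, D drops the 1st, restart
#   2nd: N appends the 3rd, buffer is "\n"  -> /\n$/ matches, D drops the 2nd, restart
#   3rd: N appends "c",     buffer is "\nc" -> no match, printed as an empty line then c
```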

Now let's see why the /^$/{:a;N;s/\n$//;ta} solution is faster.

The sed script /^$/{:a;N;s/\n$//;ta} can be read as follows:

If the current line matches /^$/, then do {:a;N;s/\n$//;ta}.

Since there is nothing between ^ and $ we can rephrase like this:

If the current line is empty, then do {:a;N;s/\n$//;ta}.

It means that sed executes the following commands for each empty line:

Step	Command	Description
1	`:a`	Declare a label named "a".
2	`N`	Append the next line preceded by a newline (`\n`) to the current line.
3	`s/\n$//`	Substitute (`s`) any trailing newline (`/\n$/`) with nothing (`//`).
4	`ta`	Return to label "a" (to step 1) if a substitution was performed (at step 3), otherwise print the result and move on to the next line.

Non empty lines are just printed as is. Knowing all this, we can describe the entire procedure with the following pseudo-code:

 1 | # sed '/^$/{:a;N;s/\n$//;ta}' file
 2 | while not end of file :
 3 |   buffer = next line
 4 |   # /^$/{:a;N;s/\n$//;ta}
 5 |   if buffer is empty :               # /^$/
 6 |     :a                               # :a
 7 |     buffer += "\n" + next line       # N
 8 |     if buffer ends with "\n" :       # /\n$/
 9 |       remove last "\n" from buffer   # s/\n$//
10 |       go to :a (at 6)                # ta
11 |     end if
12 |   end if
13 |   print buffer
14 | end while

As you can see, the two sed scripts are very similar. Indeed, s/\n$//;ta is almost the same as /\n$/D. However, the second script jumps straight back to :a and skips the /^$/ test at step 5, so it is potentially faster than the first script. Let's time both scripts fed with ~10 MB of empty lines:

$ yes '' | head -10000000 > file
$ /usr/bin/time -f%U sed '/^$/N;/\n$/D' file > /dev/null
3.61
$ /usr/bin/time -f%U sed '/^$/{:a;N;s/\n$//;ta}' file > /dev/null
2.37

Second script wins.
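
The timings only matter if both scripts produce the same result. A small sanity check on the question's input, as a sketch (GNU sed assumed; the variable names `input`, `out_D` and `out_loop` are hypothetical and not part of either script):

```shell
input=$(printf 'abc\n\ndef\n\n\nghi\n\n\n\n\njkl\n')

# Run both scripts on the same input and capture their output.
out_D=$(printf '%s\n' "$input" | sed '/^$/N;/\n$/D')
out_loop=$(printf '%s\n' "$input" | sed '/^$/{:a;N;s/\n$//;ta}')

if [ "$out_D" = "$out_loop" ]; then
  echo "identical output"
else
  echo "outputs differ"
fi
```

This only covers one sample; it does not prove the scripts agree on every input.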

score 1 · Answer 5 · answered Dec 16 '14 at 18:01

perl -00 -pe 1 filename

That splits the input file into "paragraphs" separated by 2 or more newlines, and then prints the paragraphs separated by a single blank line:

perl -00 -pe 1 <<END
abc

def


ghi




jkl
END

abc

def

ghi

jkl

answered Dec 16 '14 at 18:01

glenn jackman

score 0 · Answer 6 · answered Dec 16 '14 at 18:56

This gives you what you want using solely sed :

sed '/^$/d' txt | sed -e $'s/$/\\\n/'

The first sed command removes all empty lines, denoted as "^$".

The second sed command inserts one newline character at the end of each line.

answered Dec 16 '14 at 18:56

buydadip

The OP asked for replacement of double blank lines with single blank lines, not a single blank line between each non-blank line. – Nick Bailey Jul 29 '16 at 16:49
No, that may be what the text of the question implies, but the example he provides clearly indicates he wants all series of blank lines turned into single blank lines. That said, the asker's response to my solution indicates that there are some cases (not clear when) in which he doesn't want a blank line between non-blank lines. – TTT Dec 22 '16 at 18:38

score -1 · Answer 7 · answered Dec 16 '14 at 17:57

Why not just get rid of all your blank lines, then add a single blank line after each line? For an input file tmp as you specified,

sed '/^$/d' tmp|sed '0~1 a\ '
abc

def

ghi

jkl

If white space (spaces and tabs) counts as a "blank" line for you, then use sed '/^\s*$/d' tmp|sed '0~1 a\ ' instead.

Note that these solutions do leave a trailing blank line at the end, as I wasn't sure if this was desired. Easily removed.
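
One possible way to avoid that trailing blank line, offered as a sketch rather than the answerer's own fix: append the blank line with G on every line except the last ($!). The function name `respace_no_trailing` is made up for this example.

```shell
respace_no_trailing() {
  # The first sed drops blank lines. The second appends an empty line
  # (G adds a newline plus the empty hold space) to every line except the last.
  sed '/^$/d' | sed '$!G'
}

printf 'abc\n\n\ndef\n\n' | respace_no_trailing
```

Keeping the two steps in separate sed processes matters here: in a single script, a blank final line would be deleted after the line before it had already received its extra blank line.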

answered Dec 16 '14 at 17:57

TTT

There are some lines that should be together, like mno mno pqr – Sam Roberts Dec 16 '14 at 18:02
@SamRoberts So you're saying that certain lines shouldn't have blank lines between them? How does one know which ones should and shouldn't? – TTT Dec 16 '14 at 18:06

score -1 · Answer 8 · answered Dec 24 '21 at 11:41

I wouldn't use sed for this but cat with the -s flag. As the manual states:

-s, --squeeze-blank    suppress repeated empty output lines

So all that is needed to get the desired output is:

cat -s file

answered Dec 24 '21 at 11:41

Potherca
