I tried:
sed -i 's/\n+/\n/' file
but it's not working.
I still want single line breaks.
Input:
abc
def
ghi
jkl
Desired output:
abc
def
ghi
jkl
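A likely reason the attempt fails: sed reads its input one line at a time and strips the trailing newline before running the script, so the pattern space never contains a \n for s/\n+/\n/ to match. In a basic regular expression, + is also just a literal plus sign. The sketch below shows this on a made-up sample file. The number of blank lines is an assumption, because the exact layout of the input above is not known.

```shell
# Made-up sample input: runs of 3, 1 and 2 blank lines.
tmp=$(mktemp -d)
printf 'abc\n\n\n\ndef\n\nghi\n\n\njkl\n' > "$tmp/sample.txt"

# The attempt from the question, writing to a new file instead of using -i.
sed 's/\n+/\n/' "$tmp/sample.txt" > "$tmp/attempt.txt"

# No single line ever contains a newline, so nothing is substituted.
if cmp -s "$tmp/sample.txt" "$tmp/attempt.txt"; then
    echo "unchanged"
fi
grep -c '^$' "$tmp/attempt.txt"   # still 6 blank lines
```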
This might work for you (GNU sed):
sed '/^$/{:a;N;s/\n$//;ta}' file
This replaces each run of blank lines with a single blank line.
However, if you want to place a blank line after each non-blank line, then:
sed '/^$/d;G' file
This deletes all blank lines and appends a single blank line after each non-blank line.
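The sketch below shows how the two commands differ. It uses made-up input in which abc and def are not separated by a blank line, and it assumes GNU sed, as in the answer.

```shell
# Made-up input: no blank line between abc and def, two before ghi, one before jkl.
input='abc\ndef\n\n\nghi\n\njkl\n'

# Squeeze each run of blank lines down to a single blank line.
squeezed=$(printf "$input" | sed '/^$/{:a;N;s/\n$//;ta}')

# Delete all blank lines, then G appends a newline plus the empty hold
# space, i.e. one blank line after every remaining line.
spaced=$(printf "$input" | sed '/^$/d;G')

printf '%s\n' "$squeezed" "---" "$spaced"
```

The first command keeps abc and def adjacent. The second puts a blank line between them, and also one after jkl, which $(...) strips in this sketch.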
Sed isn't very good at tasks that examine multiple lines programmatically. Here is the closest I could get:
$ sed '/^$/{n;/^$/d}' file
abc
def
ghi
jkl
The logic of this: if you find a blank line, look at the next line. If that next line is also blank, delete that next line.
This doesn't gobble up all of the blank lines in the end, because it assumes that there was an intentional extra pair and reduces the two \n\n's down to two \n's.
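Here is a small sketch of that limitation, using made-up input with three blank lines between abc and def. The second blank line is deleted, but the third one starts a new cycle and is printed, so two blank lines survive.

```shell
# Made-up input: abc, three blank lines, def.
out=$(printf 'abc\n\n\n\ndef\n' | sed '/^$/{n;/^$/d}')
printf '%s\n' "$out"

# Count the blank lines that remain.
printf '%s\n' "$out" | grep -c '^$'
```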
To do it in basic awk:
$ awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2' file
abc
def
ghi
jkl
This uses a variable called blank, which is reset to zero when the number of fields (NF) is nonzero and incremented when it is zero (a blank line). Awk's default action, printing, is performed when the number of consecutive blank lines is less than two.
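Because NF comes from awk's default field splitting, a line that contains only spaces or tabs also has NF == 0 and counts as blank. Here is a small sketch with made-up input. Note that the blank line that is kept still contains its spaces.

```shell
# Made-up input: a line with two spaces and a line with a tab sit
# between abc and def; both have NF == 0.
out=$(printf 'abc\n  \n\t\ndef\n' |
    awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2')
printf '%s\n' "$out"
```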
Using awk (GNU or BSD) you can do:
awk -v RS= -v ORS='\n\n' '1' file
abc
def
ghi
jkl
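Setting RS to the empty string puts awk in paragraph mode. Runs of blank lines separate records, and leading blank lines are skipped. Because ORS is '\n\n', a blank line is also written after the last record. The sketch below uses made-up input. It also includes a hypothetical variation, not part of the answer, that prints the separator before every record except the first, so no trailing blank line is left.

```shell
# Made-up input: two leading blank lines, a two-line paragraph,
# three blank lines, then a one-line paragraph.
tmp=$(mktemp -d)
input='\n\nabc\ndef\n\n\n\nghi\n'

# As in the answer: paragraph mode, each record followed by a blank line.
printf "$input" | awk -v RS= -v ORS='\n\n' '1' > "$tmp/with_trailing.txt"

# Hypothetical variation: blank line only *between* records.
printf "$input" | awk -v RS= 'NR > 1 { print "" } 1' > "$tmp/no_trailing.txt"

cat "$tmp/with_trailing.txt"
echo "---"
cat "$tmp/no_trailing.txt"
```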
Also using perl:
perl -pe '$/=""; s/(\n)+/$1$1/' file
abc
def
ghi
jkl
Found on That's What I Sed (slower than the /^$/{:a;N;s/\n$//;ta} solution):
sed '/^$/N;/\n$/D' file
The sed script can be read as follows:
If the next line is empty, delete the current line.
And it can be translated into the following pseudo-code (for readers already familiar with sed, buffer refers to the pattern space):
1 | # sed '/^$/N;/\n$/D' file
2 | while not end of file :
3 | buffer = next line
4 | # /^$/N
5 | if buffer is empty : # /^$/
6 | buffer += "\n" + next line # N
7 | end if
8 | # /\n$/D
9 | if buffer ends with "\n" : # /\n$/
10 | delete first line in buffer and go to 5 # D
11 | end if
12 | print buffer
13 | end while
In the regular expression /^$/, the ^ and $ signs mean "beginning of the buffer" and "end of the buffer" respectively. They refer to the edges of the buffer, not to the content of the buffer.
The D command performs the following tasks: if the buffer contains newlines, delete the text of the buffer up to the first newline, and restart the program cycle at the start of the script (line 5 in the pseudo-code) without processing the rest of the commands, without printing the buffer, and without reading a new line of input.
Finally, keep in mind that sed removes the trailing newline before processing a line, and that the print command adds the trailing newline back. So, in the above code, if the next line to be processed is Hello World!\n, then next line implicitly refers to Hello World!.
More details at https://www.gnu.org/software/sed/manual/sed.html.
You are now ready to apply the algorithm to the following file:
a\n
b\n
\n
\n
\n
c\n
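To check the walk-through against the real thing, you can feed the same file to the script. This is a minimal sketch that builds the file with printf:

```shell
# The six-line file from the walk-through: a, b, three empty lines, c.
out=$(printf 'a\nb\n\n\n\nc\n' | sed '/^$/N;/\n$/D')

# Expected result: a, b, a single empty line, c.
printf '%s\n' "$out"
```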
Now let's see why this solution is faster.
The sed script /^$/{:a;N;s/\n$//;ta} can be read as follows:
If the current line matches /^$/, then do {:a;N;s/\n$//;ta}.
Since there is nothing between ^ and $, we can rephrase it like this:
If the current line is empty, then do {:a;N;s/\n$//;ta}.
It means that sed executes the following commands for each empty line:
Step | Command | Description |
---|---|---|
1 | :a | Declare a label named "a". |
2 | N | Append the next line, preceded by a newline (\n), to the current line. |
3 | s/\n$// | Substitute (s) any trailing newline (/\n$/) with nothing (//). |
4 | ta | Return to label "a" (step 1) if a substitution was performed (at step 3); otherwise, print the result and move on to the next line. |
Non-empty lines are just printed as is. Knowing all this, we can describe the entire procedure with the following pseudo-code:
1 | # sed '/^$/{:a;N;s/\n$//;ta}' file
2 | while not end of file :
3 | buffer = next line
4 | # /^$/{:a;N;s/\n$//;ta}
5 | if buffer is empty : # /^$/
6 | :a # :a
7 | buffer += "\n" + next line # N
8 | if buffer ends with "\n" : # /\n$/
9 | remove last "\n" from buffer # s/\n$//
10 | go to :a (at 6) # ta
11 | end if
12 | end if
13 | print buffer
14 | end while
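Before comparing the speed of the two scripts, it helps to confirm that they do the same job. This is a quick sketch on made-up input that includes leading blank lines, and it assumes GNU sed:

```shell
# Made-up input: two leading blank lines, then runs of 1 and 3 blank lines.
tmp=$(mktemp -d)
printf '\n\nabc\n\ndef\n\n\n\nghi\n' > "$tmp/in.txt"

sed '/^$/N;/\n$/D' "$tmp/in.txt" > "$tmp/d.txt"
sed '/^$/{:a;N;s/\n$//;ta}' "$tmp/in.txt" > "$tmp/loop.txt"

# Both should squeeze every run of blank lines down to one.
if cmp -s "$tmp/d.txt" "$tmp/loop.txt"; then echo same; else echo different; fi
cat "$tmp/d.txt"
```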
As you can see, the two sed scripts are very similar. Indeed, s/\n$//;ta is almost the same as /\n$/D. However, the second script skips step 5 (after ta it jumps back to :a instead of testing /^$/ again), so it is potentially faster than the first script. Let's time both scripts fed with ~10 MB of empty lines:
$ yes '' | head -10000000 > file
$ /usr/bin/time -f%U sed '/^$/N;/\n$/D' file > /dev/null
3.61
$ /usr/bin/time -f%U sed '/^$/{:a;N;s/\n$//;ta}' file > /dev/null
2.37
Second script wins.
perl -00 -pe 1 filename
That splits the input file into "paragraphs" separated by 2 or more newlines, and then prints the paragraphs separated by a single blank line:
perl -00 -pe 1 <<END
abc
def
ghi
jkl
END
abc
def
ghi
jkl
This gives you what you want using only sed:
sed '/^$/d' txt | sed -e $'s/$/\\\n/'
The first sed command removes all empty lines, denoted as "^$".
The second sed command inserts one newline character at the end of each line.
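The $'...' quoting is a bash/ksh/zsh feature. Here it turns \\\n into a backslash followed by a real newline, which is how a newline is written in a sed replacement. The sketch below, on made-up input, writes the same replacement portably with a literal newline inside single quotes. It then checks the result against the single-call /^$/d;G form from the GNU sed answer.

```shell
# Made-up input with a run of two blank lines.
tmp=$(mktemp -d)
printf 'abc\n\n\ndef\nghi\n' > "$tmp/txt"

# Two-step version: the newline in the replacement is written
# literally (backslash + real newline) instead of via $'...'.
sed '/^$/d' "$tmp/txt" | sed 's/$/\
/' > "$tmp/two_step.txt"

# Single sed call: delete blank lines, then G appends a newline and
# the (empty) hold space to every remaining line.
sed '/^$/d;G' "$tmp/txt" > "$tmp/one_call.txt"

if cmp -s "$tmp/two_step.txt" "$tmp/one_call.txt"; then echo same; else echo different; fi
```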
Why not just get rid of all your blank lines, then add a single blank line after each line? For an input file tmp as you specified:
sed '/^$/d' tmp|sed '0~1 a\ '
abc
def
ghi
jkl
If lines containing only whitespace (spaces and tabs) count as blank for you, then use sed '/^\s*$/d' tmp|sed '0~1 a\ ' instead.
Note that these solutions leave a trailing blank line at the end, since I wasn't sure whether that was desired. It is easily removed.
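One way to drop that trailing blank line is to pipe the result through sed '$d', which deletes the last line of its input. This is a sketch on made-up input, and it assumes GNU sed because 0~1 is a GNU address.

```shell
# Made-up input with runs of blank lines.
tmp=$(mktemp -d)
printf 'abc\n\n\ndef\n\nghi\n' > "$tmp/input"

# The pipeline from the answer, plus a final sed '$d' that removes the
# line appended after the last input line.
sed '/^$/d' "$tmp/input" | sed '0~1 a\ ' | sed '$d' > "$tmp/out.txt"
cat "$tmp/out.txt"
```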
I wouldn't use sed for this, but cat with the -s flag.
As the manual states:
-s, --squeeze-blank suppress repeated empty output lines
So all that is needed to get the desired output is:
cat -s file
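Note that cat -s only squeezes runs of empty lines down to one. It does not add a blank line between lines that had none, and a run of leading blank lines is also reduced to a single one. Here is a quick check with made-up input, assuming GNU coreutils cat:

```shell
# Made-up input: three leading blank lines, two adjacent lines,
# three blank lines, then ghi.
out=$(printf '\n\n\nabc\ndef\n\n\n\nghi\n' | cat -s)
printf '%s\n' "$out"
```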