2

I'm trying to compress a text document by deleting duplicate empty lines with sed. This is what I'm doing (to no avail):

sed -i -E 's/\n{3,}/\n/g' file.txt

I understand that it's not correct, according to this manual, but I can't figure out how to do it correctly. Thanks.

yegor256
  • 1
    This cannot work, because `sed` only reads one line at a time. It's possible, but somewhat complex, to collect lines into memory, then suppress repeated empty lines; but this is trivial in `awk` or Perl. Is it really a requirement to use `sed`? `perl -0777pi -e 's/\n{3,}/\n/g' file.txt` – tripleee Sep 11 '12 at 15:33
  • `sed` is not mandatory, I can use `perl`. please, post your suggestion as an answer – yegor256 Sep 11 '12 at 16:30
  • Check here: http://theunixshell.blogspot.in/2013/01/deleting-empty-lines-from-file.html – Vijay Jan 31 '13 at 17:43
  • Look at this correct answer to a duplicate question: https://stackoverflow.com/a/16414489/6614155. Like you, I tried to use substitute ('s/…) because I had tested under vim first, but it is better to use the '/…/d' delete-line command – bcag2 Aug 24 '23 at 07:51
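
To illustrate the point made in the comments, here is a small sketch showing that the substitution from the question never matches, because sed's pattern space holds one line at a time with the trailing newline removed. The function name `try_question_regex` is hypothetical, and the sample input is made up for the demonstration.

```shell
# Hypothetical helper: apply the substitution from the question to stdin.
try_question_regex() {
  sed -E 's/\n{3,}/\n/g'
}

# Input is five lines: "a", three empty lines, "b".
# The line count is unchanged because \n never appears in the pattern space.
printf 'a\n\n\n\nb\n' | try_question_regex | wc -l
```

The count stays at five, so the file would be rewritten unchanged.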

6 Answers

4

I think you want to replace spans of multiple blank lines with a single blank line, even though your example replaces multiple runs of \n with a single \n instead of \n\n. With that in mind, here are two solutions:

sed '/^$/{ :l
    N; s/^\n$//; t l
    p; d; }' input 

In many implementations of sed, that can be all on one line, with the embedded newlines replaced by ;.
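
A sketch of that one-line form, assuming GNU sed (other implementations may read the `;` as part of the label name and reject the script). The wrapper name `squeeze_blank_sed` is hypothetical.

```shell
# Hypothetical wrapper around the one-line form; assumes GNU sed.
squeeze_blank_sed() {
  sed '/^$/{:l;N;s/^\n$//;t l;p;d;}'
}

# "a", three empty lines, "b" becomes "a", one empty line, "b".
printf 'a\n\n\n\nb\n' | squeeze_blank_sed
```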

awk 't || !/^$/; { t = !/^$/ }'
William Pursell
  • Thanks a lot for the awk solution. Could you please explain the ` t || !/^$/; ` pattern? – Bernie Reiter Mar 29 '19 at 21:08
  • I have to admit, it does look a bit cryptic! Basically, it evaluates the expression as a boolean. When `t` is true (e.g., it is a non-empty string or is not 0) or the line does not match the regex `^$` (i.e., it is not an empty line), the expression is true. The `;` ends the pattern without giving it an action, so awk applies the default action and prints the line. – William Pursell Apr 03 '19 at 09:52
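
The explanation above can be spelled out in a longer form. This is only a sketch of the same logic: `prev_nonblank` is a made-up name standing in for `t`, and `squeeze_blank_awk` is a hypothetical wrapper.

```shell
# Hypothetical wrapper; same logic as the one-liner, with a descriptive name.
squeeze_blank_awk() {
  awk '
    # Print when the previous line was non-empty, or this line is non-empty.
    prev_nonblank || !/^$/ { print }
    # Remember whether this line was non-empty for the next iteration.
    { prev_nonblank = !/^$/ }
  '
}

# "a", three empty lines, "b" becomes "a", one empty line, "b".
printf 'a\n\n\n\nb\n' | squeeze_blank_awk
```

Because `prev_nonblank` starts out unset (false), empty lines at the very top of the input are dropped rather than reduced to one.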
4

As tripleee suggested above, I'm using Perl instead of sed:

perl -0777pi -e 's/\n{3,}/\n\n/g' file.txt
yegor256
2

Use tr, the translate utility:

 tr -s '\n'

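The -s (--squeeze-repeats) option reduces each run of a repeated character to a single instance. Note that tr reads standard input, and that squeezing '\n' turns every run of newlines into one newline, so blank lines are removed entirely rather than reduced to a single blank line.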
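
A quick sketch of what squeezing newlines does to a sample input, next to `cat -s`, which is mentioned elsewhere in this thread. The function name `squeeze_newlines` is hypothetical, and `cat -s` is assumed to be available (it is in GNU and BSD cat, but it is not required by POSIX).

```shell
# Hypothetical wrapper around the tr command from this answer.
squeeze_newlines() {
  tr -s '\n'
}

# tr -s: "a", three empty lines, "b" becomes just "a" and "b".
printf 'a\n\n\n\nb\n' | squeeze_newlines

# cat -s (assumed available): the same input keeps one empty line.
printf 'a\n\n\n\nb\n' | cat -s
```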

Stuart
1

This is much better handled by tr -s '\n' or cat -s, but if you insist on sed, here's an example from section 4.17 of the GNU sed manual:

#!/usr/bin/sed -f

# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
  N
  bx
}
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
Thor
0

I am not sure this is what the OP wanted, but building on William Pursell's awk solution, here is how to delete ALL empty lines in the file:

awk '!/^$/' file.txt

Explanation:

The awk pattern

'!/^$/'

The regex /^$/ matches a line that consists only of the beginning of a line (symbolised by '^') immediately followed by the end of a line (symbolised by '$'), in other words, an empty line. The leading '!' negates the match, so the pattern is true for every line that is not empty.

When the pattern is true and no action is given, awk applies its default action and prints the current line.
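
A small sketch of the pattern on made-up input. The function name `drop_empty_lines` is hypothetical. Note that a line containing only a space does not match /^$/, so it is kept.

```shell
# Hypothetical wrapper around the awk pattern from this answer.
drop_empty_lines() {
  awk '!/^$/'
}

# Input lines: "a", empty, a single space, "b".
# Only the truly empty line is dropped; the single-space line survives.
printf 'a\n\n \nb\n' | drop_empty_lines
```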

HTH

Bernie Reiter
0

I think the OP wants to compress empty lines, e.g. where there are 9 consecutive empty lines, he wants to have just three. I have written a little bash script that shortens such runs: it drops a blank line whenever the two lines after it are also blank.

#!/bin/bash
# Drop a blank line whenever the two lines after it are also blank,
# so a run of blank lines followed by text shrinks to two.
FILE=file.txt
TMP=temp.txt
TOTALLINES=$(wc -l < "$FILE")
: > "$TMP"    # start from an empty temp file

# Succeeds if line $1 of $FILE is empty or a single space.
# Lines past the end of the file count as blank.
is_blank() {
    local line
    line=$(sed -n "${1}p" "$FILE")
    [[ $line == "" || $line == " " ]]
}

CURRENTLINE=1
while [ "$CURRENTLINE" -le "$TOTALLINES" ]
do
    if is_blank "$CURRENTLINE" && is_blank $((CURRENTLINE + 1)) && is_blank $((CURRENTLINE + 2))
    then
        # do not copy this line to the temp file
        echo "Skipping line $CURRENTLINE"
    else
        sed -n "${CURRENTLINE}p" "$FILE" >> "$TMP"
        echo "Writing line $CURRENTLINE"
    fi
    ((CURRENTLINE++))
done
cat "$TMP" > "$FILE"
rm "$TMP"
FINALTOTALLINES=$(wc -l < "$FILE")
echo "Deleted $((TOTALLINES - FINALTOTALLINES)) empty lines."
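
For comparison, here is a single-pass awk sketch of a similar idea. It is a different technique from the script above, not a drop-in replacement: it caps each run of blank lines at a limit instead of looking ahead, so trailing blank lines at the end of the file are handled differently. The names `limit_blank_runs`, `max_blank`, and `run` are hypothetical, and "blank" is assumed to mean empty or a single space, as in the script.

```shell
# Hypothetical helper: keep at most N (default 2) consecutive blank lines.
limit_blank_runs() {
  awk -v max_blank="${1:-2}" '
    # Blank line (empty or one space): print only while the run is short enough.
    /^ ?$/ { if (++run <= max_blank) print; next }
    # Any other line resets the run counter and is always printed.
    { run = 0; print }
  '
}

# "a", five empty lines, "b" becomes "a", two empty lines, "b".
printf 'a\n\n\n\n\n\nb\n' | limit_blank_runs 2
```

It reads standard input and writes standard output, so it could be used as `limit_blank_runs 2 < file.txt > temp.txt`, with the file names being placeholders.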