47

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)

Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)

ELLIOTTCABLE

19 Answers

70

From Useful one-line scripts for sed:

# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:

sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
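
A quick way to check the combined command is to wrap it in a function and feed it a small sample. This is only a sketch: the function name `trim_blank_edges` and the sample input are illustrative assumptions, not part of the original one-liners, and the behavior was reasoned through for GNU sed.

```shell
# Hypothetical wrapper (the name is made up for illustration) around the
# combined sed command above. Reads stdin, writes the trimmed text to stdout.
trim_blank_edges() {
    sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba'
}

# Sample input: two leading and two trailing empty lines, one interior one.
printf '\n\nfirst\n\nsecond\n\n\n' | trim_blank_edges
```

With GNU sed this should leave `first`, one empty line, and `second`; the interior empty line is kept because only the edges are trimmed.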
dogbane
16

So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...

tac is part of coreutils, and reverses a file. So do it twice:

tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'

It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
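
The pipeline can be tried on a small sample like this. It is a sketch under two assumptions: GNU `tac` is installed, and the input ends with a newline. The helper name `trim_with_tac` is made up for illustration.

```shell
# Hypothetical helper: reverse, strip what are now the leading empty lines,
# reverse back, then strip the real leading empty lines.
# Assumes GNU tac and input terminated by a newline.
trim_with_tac() {
    tac | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
}

printf '\n\nfirst\n\nsecond\n\n\n' | trim_with_tac
```

Reading from stdin instead of a named file keeps the helper usable at the end of a longer pipeline.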

Izkata
  • 1
    There's an edge case worth mentioning: if the file doesn't have a trailing `\n`, the last line won't be handled correctly: try `tac <(printf 'a\nb')`. Arguably, this behavior is flawed; also affects `tac`'s OSX equivalent, `tail -r`. – mklement0 Nov 07 '14 at 18:54
  • `paste` can solve this edge case. I've added an answer below showing how. – freeB Jul 02 '23 at 15:49
7

As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get

echo "$(echo "$(tac "$filename")" | tac)"

which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
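
The trick relies on command substitution dropping every trailing newline while keeping interior ones. A minimal sketch of that behavior, with a made-up variable name and sample text:

```shell
# Command substitution removes all trailing newlines but keeps interior ones.
content=$(printf 'first\n\nsecond\n\n\n')

# Length of what survived: "first" (5) + two newlines (2) + "second" (6).
printf '%s\n' "${#content}"
```

Leading newlines are not touched by command substitution, which is why the answer reverses the file with tac before applying it a second time.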

Jason Gross
  • 2
    +1 for (relative) simplicity (albeit at the expense of efficiency); OSX version (where `tac` is not available by default): `echo "$(echo "$(tail -r "$filename")" | tail -r)"` I ran tests to compare relative execution speed with a 1-million-lines file for several answers (didn't pay attention to memory use); earlier means faster: OSX 10.10: sed (dogbane) < bash (mklement0) < awk (glenn jackman) < tac (tail -r; you) Ubuntu 14.04: sed (dogbane) < tac (you) < bash (mklement0) < awk (glenn jackman) One interesting difference is that `tac` is much faster on Ubuntu than on OSX. – mklement0 Nov 07 '14 at 18:38
  • 2
    There's an edge case worth mentioning: if the file doesn't have a trailing `\n`, the last line won't be handled correctly: try `echo "$(echo "$(printf 'a\nb' | tac)" | tac)"`. This is inherent in the - arguably flawed - behavior of `tac` (and also `tail -r` on OSX) with input not ending in `\n`. – mklement0 Nov 07 '14 at 18:50
  • 1
    Using `echo "$(echo "$(cat "$filename")" | tac)" | tac` fixes the edge case that @mklement0 mentioned. – rivy Aug 20 '17 at 03:13
  • `paste` can also solve this edge case. I've added an answer below showing how. – freeB Jul 02 '23 at 15:54
7

Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line.

awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1      
        # print the accumulated "interior" empty lines 
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename

Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
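
The truncation of whitespace-only interior lines can be seen with a small sample. This is a sketch that assumes an awk with POSIX character class support (gawk, or a recent mawk); the function name `trim_one_pass` is made up, and its body is the script above with the comments removed.

```shell
# Hypothetical wrapper around the awk script above (comments stripped).
# Assumes the installed awk understands [[:graph:]] and [[:space:]].
trim_one_pass() {
    awk '
        /[[:graph:]]/ { p=1; for (i=1; i<=n; i++) print ""; n=0; print }
        p && /^[[:space:]]*$/ { n++ }
    '
}

# The interior line holds a space and a tab; "sed -n l" makes whitespace
# visible so it is easy to see that the line comes out truly empty.
printf '\nfirst\n \t \nsecond\n\n' | trim_one_pass | sed -n l
```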

glenn jackman
  • +1 for a single-pass, single-utility solution that is also memory-efficient (though, as noted, its behavior differs slightly from what was asked for). – mklement0 Jul 07 '14 at 14:07
5

Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on it.

sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'

It's basically the accepted answer's version (taking BryanH's comment into account), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.

Here is an alternative version that does not use the POSIX classes, but your sed must support \t and \n inside […]. GNU sed does, BSD sed doesn't.

sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'

Testing:

prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' 



foo

foo



prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
 \t $
$
foo$
$
foo$
$
 \t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo

foo
prompt$
Aurelio Jargas
5

This can be solved easily with the -z option of GNU sed:

sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello

Welcome to
Unix and Linux
mug896
3

Using awk:

awk '{a[NR]=$0;if(length($0) && !s)s=NR;}
    END{e=0;
        for(i=NR;i>=1;i--)
            if(length(a[i])){ e=i; break; }
        if(s)
            for(i=s;i<=e;i++)
                print a[i];}' yourFile
Kent
  • I wonder if there’s a way to reduce/refactor that to handle it in one pass? (I’m not massively familiar with awk; I can read what you wrote, but I’m not sure how to refactor it.) – ELLIOTTCABLE Sep 09 '11 at 09:49
  • Basically this is a one-line command; the only dynamic part is 'yourFile', which is the name of the file you want to process. Why do you need to reduce/refactor it? – Kent Sep 09 '11 at 09:58
  • 1
    Because it’s long and complex, even if it doesn’t need any newlines? Several for loops, multiple statements; unnecessary complexity. (= – ELLIOTTCABLE Sep 09 '11 at 10:07
2

For an efficient non-recursive version of the trailing newlines strip (including "white" characters) I've developed this sed script.

sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'

It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:

sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
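
To see the hold-buffer approach at work, the empty-lines-only variant can be run on a small sample. This is a sketch reasoned through for GNU sed; the function name `strip_trailing_blank` and the sample input are illustrative assumptions.

```shell
# Hypothetical wrapper around the empty-lines-only script above.
# Empty lines are parked in the hold space and flushed only when a
# non-empty line follows, so the trailing ones are never printed.
strip_trailing_blank() {
    sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
}

# The two interior empty lines survive, the trailing ones do not.
printf 'first\n\n\nsecond\n\n\n' | strip_trailing_blank
```

Leading empty lines pass through unchanged, since they are flushed as soon as the first non-empty line arrives; this script handles the trailing side only.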

I've tried a simple performance comparison with the well-known recursive script

sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'

on a 3MB file with 1MB of random blank lines around a random base64 text.

shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile

The streaming script took roughly 0.5 seconds to complete; the recursive one had not finished after 15 minutes. Win :)

For completeness, the sed script for stripping leading lines already streams fine. Use whichever suits you best.

sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
tlwhitec
1

@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.

awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'

This is pretty simple in operation.

  • Add every line to a buffer as we read it.
  • For every line which contains a character, print the contents of the buffer and then clear it.

So the only things that get buffered and never displayed are any trailing blanks.

I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
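
Chained with the sed command for leading lines, the two pieces might be combined as follows. This is a sketch; the function name `trim_sed_awk` and the sample input are made up for illustration.

```shell
# Hypothetical helper: sed drops the leading empty lines, then awk buffers
# every line and flushes the buffer only when a non-empty line is seen,
# so trailing empty lines stay in the buffer and are never printed.
trim_sed_awk() {
    sed '/./,$!d' |
        awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
}

printf '\n\nfirst\n\nsecond\n\n\n' | trim_sed_awk
```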

Andy Mortimer
1

This AWK script will do the trick:

BEGIN {
    ne=0;
}

/^[[:space:]]*$/ {
    ne++;
}

/[^[:space:]]+/ {
    for(i=0; i < ne; i++)
        print "";
    ne=0;
    print
}

The idea is simple: empty lines do not get echoed immediately. Instead, we wait until we get a non-empty line, and only then do we echo out as many empty lines as were seen before it, followed by the new non-empty line.

Adi Degani
  • This successfully removes trailing blank lines (including lines containing nothing but white space). However, it does not preserve white space in intermediate blank lines; these are truncated to empty lines. Example: `$'a\n \nb'` is transformed into `$'a\n\nb'`. – Robin A. Meade Apr 30 '20 at 05:37
1
perl -0pe 's/^\n+|\n+(\n)$/\1/gs'
Jan Kyu Peblik
1

Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).

It is memory efficient; it does not read the entire file into memory.

awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'

The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
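
Because the buffer stores each blank line verbatim, interior whitespace-only lines should keep their content. A small sketch to check that, assuming an awk with POSIX character class support; the function name `strip_trailing_ws_lines` is made up for illustration.

```shell
# Hypothetical wrapper around the awk command above.
# Assumes the installed awk understands [[:space:]].
strip_trailing_ws_lines() {
    awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
}

# The interior line contains one space; "sed -n l" marks line ends with $
# so the preserved space is visible in the output.
printf 'first\n \nsecond\n \n\n' | strip_trailing_ws_lines | sed -n l
```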

If using GNU awk, [[:space:]] can be replaced with \s. (See full list of gawk-specific Regexp Operators.)

If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.

Robin A. Meade
1

In bash, using cat, wc, grep, sed, tail and head:

f=your_file
# number of the first line that contains a character
i=`grep -n . "$f" | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n . "$f" | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat "$f" | wc -l`
# how many empty lines do we have at the end of the file?
m=$(($k-$j))
# strip the last m lines, then start at line i to drop the leading empty ones 8-)
cat "$f" | head -n-$m | tail -n+$i

Man, it's definitely worth learning a "real" programming language to avoid that ugliness!

Alexander Poluektov
  • Well *that* part is easy enough with sed! Let me play with it, and try to get back here with a completed command. Thanks! – ELLIOTTCABLE Sep 09 '11 at 09:43
  • Actually, that won’t work for the last lines, because it removes *all* newlines in the grep stage, thus throwing off the count at the end. /= – ELLIOTTCABLE Sep 09 '11 at 09:45
  • Nope: after executing these commands you still have your original file. The second command prints all non-blank lines, prepending their line numbers. Thus you'll have the number of the last non-blank line. – Alexander Poluektov Sep 09 '11 at 09:49
  • Ah! I misunderstood the operation of `grep -n` it seems. Yes! – ELLIOTTCABLE Sep 09 '11 at 09:55
  • (Accepted, though I used a one-line variant without any shell-variables, instead expressing a bit more with the `sed` commands.) – ELLIOTTCABLE Sep 09 '11 at 10:05
  • (Also, for what it’s worth; I know *many* ‘real languages,’ not to mention having written a few thereof. They just weren’t appropriate for this task-space ;D) – ELLIOTTCABLE Sep 09 '11 at 10:06
  • That's heavy-handed: 11 invocations of external utilities, and a bunch of subshells. – mklement0 Jul 07 '14 at 14:01
1

Using bash

$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
bash-o-logist
  • This only removes a single blank line from the start, and none from the end. – me_and Oct 03 '13 at 09:45
  • 3
    @me_and: While you're correct about only removing _one_ empty line from the start, this actually _does_ remove all trailing newlines, because command substitution (`$( – mklement0 Jul 07 '14 at 13:00
  • @mklement0: Huh, so it does. Learn a new thing every day! – me_and Jul 10 '14 at 11:18
0

A bash solution.

Note: Only useful if the file is small enough to be read into memory at once.

[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
  • $(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
  • =~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
  • Note that this particular regex always matches, so the command after && is always executed.
  • Special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
  • Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
mklement0
0

I'd like to introduce another variant for gawk v4.1+

result=($(gawk '
    BEGIN {
        lines_count         = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail  = 0;
    }
    /^[[:space:]]*?$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))

empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    eval "gawk -i inplace '
        {
            if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
                print
            }
        }
    ' \"$file\""
fi
puchu
0

Because I was writing a bash script anyway containing some functions, I found it convenient to write those:

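function strip_leading_empty_lines()
{
    local line
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
            break
        fi
    done
    # pass the rest of the input through unchanged
    cat
}

function strip_trailing_empty_lines()
{
    local line acc=""
    while IFS= read -r line; do
        # buffer every line; flush only when a non-empty line arrives
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            printf '%s' "$acc"
            acc=""
        fi
    done
}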

Tilman Vogel
0

@mklement0 notes that @Izkata's answer has an issue when the last line doesn't end in a newline.

You can solve this problem using paste from coreutils. The following code works whether or not the last line ends in a newline.

sed '/\S/,$!d' | paste | tac | sed '/\S/,$!d' | tac

Example:

printf '\n\na\nb\nc' and printf '\n\na\nb\nc\n' piped to this code both give

a
b
c

The use of /\S/ means that lines with at least one non-white-space character are classed as not blank; all other leading and trailing lines are deleted. To delete empty lines only, use:

sed '/./,$!d' | paste | tac | sed '/./,$!d' | tac
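
The effect of `paste` on the edge case can be checked in isolation. This is a sketch assuming GNU coreutils, where `paste` with no file operand reads standard input and `tac` is available; the two-line sample is made up.

```shell
# Input whose last line has no terminating newline.
# Without paste, tac is expected to glue the unterminated last line
# to the line that follows it in the reversed output.
printf 'a\nb' | tac
echo

# With paste in front, every line is newline-terminated before tac sees it,
# so the two lines come out reversed and still separate.
printf 'a\nb' | paste | tac
```

The extra `echo` only separates the two demonstrations in the terminal output.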
freeB
0

This might not be foolproof, but it seems to work:

 __=$'\n\nline 3\n\nline 5\n\nline 7\n\n'

 printf '%s' "$__" | gcat -b | gcat -n

 1  
 2  
 3       1  line 3
 4  
 5       2  line 5
 6  
 7       3  line 7
 8

mawk 'NF,EOF' RS='\n|[ \t-\r]+$'

 1       1  line 3
 2  
 3       2  line 5
 4  
 5       3  line 7
RARE Kpop Manifesto