Removing trailing / starting newlines with sed, awk, tr, and friends

Question

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)

Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)

dogbane · Accepted Answer · 2011-09-09T09:57:55.737

70

From Useful one-line scripts for sed:

# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:

sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
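As a quick sanity check, the combined command can be wrapped in a function and fed a small sample. This is only a sketch: the function name `trim_blank_edges` and the sample input are made up for illustration, and it has only been reasoned through for GNU sed.

```shell
# Hypothetical wrapper around the combined one-liner from the answer.
# With no arguments it reads standard input; otherwise it reads the named files.
trim_blank_edges() {
    sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' "$@"
}

# Sample: two empty lines on top, one interior empty line, three at the bottom.
printf '\n\nfoo\n\nbar\n\n\n\n' | trim_blank_edges
```

The interior empty line between `foo` and `bar` should survive, while the empty lines at both ends are dropped.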

edited Sep 09 '11 at 09:57

answered Sep 09 '11 at 09:52

dogbane

According to the note at that site, the trailing-blank-line script won't work for gsed 3.02.*. This one will work: `sed -e :a -e '/^\n*$/{$d;N;ba' -e '}'` – BryanH Dec 11 '12 at 22:36
If it fails, try running dos2unix on the file first. This reference is such a useful, complete set of sed examples. – Apr 28 '16 at 04:11
This isn't appropriate for large files – JqueryToAddNumbers Aug 18 '17 at 22:58
It will not remove **white spaces**. To remove leading blank lines or white spaces, use: `sed '/\S/,$!d'` – Noam Manos Feb 19 '20 at 11:49

score 16 · Answer 2 · edited May 23 '17 at 10:31

So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...

tac is part of coreutils, and reverses a file. So do it twice:

tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'

It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
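To make the double reversal easier to follow, here is a sketch of the same pipeline as a filter on standard input. The function name is hypothetical, and it assumes `tac` is installed and that the input ends with a newline.

```shell
# Hypothetical filter: strip empty lines from both ends using two reversals.
trim_blank_edges_tac() {
    # 1st tac: trailing empty lines become leading ones, and sed deletes them.
    # 2nd tac: original order is restored, and sed deletes the original leading ones.
    tac | sed '/./,$!d' | tac | sed '/./,$!d'
}

printf '\n\nfoo\n\nbar\n\n\n' | trim_blank_edges_tac
```

Each `sed` call only ever has to handle the easy case (leading lines), which is why the pipeline stays readable.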

edited May 23 '17 at 10:31

Community

answered May 27 '14 at 16:27

Izkata

There's an edge case worth mentioning: if the file doesn't have a trailing `\n`, the last line won't be handled correctly: try `tac <(printf 'a\nb')`. Arguably, this behavior is flawed; also affects `tac`'s OSX equivalent, `tail -r`. – mklement0 Nov 07 '14 at 18:54
`paste` can solve this edge case. I've added an answer below showing how. – freeB Jul 02 '23 at 15:49

score 7 · Answer 3 · edited May 23 '17 at 11:47

As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution strips trailing newlines, we get

echo "$(echo "$(tac "$filename")" | tac)"

which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
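The trick rests entirely on how command substitution treats newlines, which a tiny experiment can illustrate. The variable names below are invented for the example.

```shell
# Command substitution removes every trailing newline from the captured output...
trailing=$(printf 'foo\n\n\n')
echo "trailing case: ${#trailing} characters"

# ...but leaves leading newlines alone, which is why the file is reversed first.
leading=$(printf '\n\nfoo\n')
echo "leading case: ${#leading} characters"
```

The first variable holds just `foo` (3 characters); the second still carries its two leading newlines (5 characters).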

edited May 23 '17 at 11:47

Community

answered Jul 07 '14 at 12:35

Jason Gross

+1 for (relative) simplicity (albeit at the expense of efficiency). OSX version (where `tac` is not available by default): `echo "$(echo "$(tail -r "$filename")" | tail -r)"`. I ran tests to compare relative execution speed with a 1-million-line file for several answers (didn't pay attention to memory use); earlier means faster. OSX 10.10: sed (dogbane) < bash (mklement0) < awk (glenn jackman) < tac (tail -r; you). Ubuntu 14.04: sed (dogbane) < tac (you) < bash (mklement0) < awk (glenn jackman). One interesting difference is that `tac` is much faster on Ubuntu than on OSX. – mklement0 Nov 07 '14 at 18:38
There's an edge case worth mentioning: if the file doesn't have a trailing `\n`, the last line won't be handled correctly: try `echo "$(echo "$(printf 'a\nb' | tac)" | tac)"`. This is inherent in the - arguably flawed - behavior of `tac` (and also `tail -r` on OSX) with input not ending in `\n`. – mklement0 Nov 07 '14 at 18:50
Using `echo "$(echo "$(cat "$filename")" | tac)" | tac` fixes the edge case that @mklement0 mentioned. – rivy Aug 20 '17 at 03:13
`paste` can also solve this edge case. I've added an answer below showing how. – freeB Jul 02 '23 at 15:54

score 7 · Answer 4 · answered Sep 09 '11 at 14:42

Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line.

awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1      
        # print the accumulated "interior" empty lines 
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename

Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
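If the truncation of interior whitespace-only lines is unwanted, one possible variation is to buffer the lines themselves rather than count them. The sketch below is not part of the answer: the function and variable names are made up, and "blank" is taken to mean spaces and tabs only so that the pattern does not depend on POSIX character-class support.

```shell
# Hypothetical variant: trim blank lines at both ends, keep interior ones verbatim.
trim_keep_interior() {
    awk '
        /[^ \t]/ {               # line with at least one non-blank character
            printf "%s", held    # flush blank lines held since the last such line
            held = ""
            seen = 1
            print
            next
        }
        seen { held = held $0 "\n" }   # blank line after content: hold it verbatim
    ' "$@"
}

printf '\n \nfoo\n \t \nbar\n\n \n' | trim_keep_interior
```

Blank lines before the first content line are never stored, and blank lines held at end of input are never flushed, so only interior ones reach the output.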

+1 for a single-pass, single-utility solution that is also memory-efficient (though, as noted, its behavior differs slightly from what was asked for). — mklement0, Jul 07 '14 at 14:07

score 5 · Answer 5 · answered Mar 05 '15 at 14:58

Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on them.

sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'

It's basically the accepted answer's version (taking BryanH's comment into account), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.

An alternative version that avoids the POSIX classes; your sed must support \t and \n inside […]. GNU sed does, BSD sed doesn't.

sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'

Testing:

prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' 



foo

foo



prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
 \t $
$
foo$
$
foo$
$
 \t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo

foo
prompt$

score 5 · Answer 6 · answered Jul 30 '20 at 17:50

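This can be solved easily with GNU sed's -z option, which splits the input on NUL bytes instead of newlines, so a normal text file is processed as one string: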

sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello

Welcome to
Unix and Linux
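A self-contained way to try this is sketched below, assuming GNU sed (for `-z` and `-r`). The function name and the sample input are illustrative only; the `g` flag from the original command is dropped here because each pattern is anchored and can match at most once.

```shell
# Hypothetical wrapper; the whole input is held in memory as a single record.
trim_blank_edges_z() {
    # 1st s: delete newlines at the very start of the input.
    # 2nd s: collapse the run of newlines at the very end into one.
    sed -rz 's/^\n+//; s/\n+$/\n/' "$@"
}

printf '\n\nHello\n\nWelcome to\nUnix and Linux\n\n\n' | trim_blank_edges_z
```

Because the file is read as one record, this approach is best kept for inputs that comfortably fit in memory.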

mug896

score 3 · Answer 7 · answered Sep 09 '11 at 09:42

using awk:

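awk '{a[NR]=$0; if($0 != "" && !s) s=NR}   # s = first non-empty line
    END{if(!s) exit;                       # no non-empty line: print nothing
        e=s;
        for(i=NR;i>=s;i--)                 # e = last non-empty line
            if(a[i] != ""){ e=i; break }
        for(i=s;i<=e;i++)
            print a[i]}' yourFile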

Kent

I wonder if there’s a way to reduce/refactor that to handle it in one pass? (I’m not massively familiar with awk; I can read what you wrote, but I’m not sure how to refactor it.) – ELLIOTTCABLE Sep 09 '11 at 09:49
Basically this is a one-line command; the only dynamic part is 'yourFile', which is the name of the file you want to process. Why do you need to reduce/refactor it? – Kent Sep 09 '11 at 09:58
Because it’s long and complex, even if it doesn’t need any newlines? Several for loops, multiple statements; unnecessary complexity. (= – ELLIOTTCABLE Sep 09 '11 at 10:07

score 2 · Answer 8 · answered Jun 30 '17 at 16:12

For an efficient non-recursive version of the trailing newlines strip (including "white" characters) I've developed this sed script.

sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'

It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:

sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'

I've tried a simple performance comparison with the well-known recursive script

sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'

on a 3MB file with 1MB of random blank lines around a random base64 text.

shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile

The streaming script took roughly 0.5 seconds to complete; the recursive one had not finished after 15 minutes. Win :)

For completeness: the sed script that strips leading lines already streams fine. Use whichever of the two suits you best.

sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
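For readers who find the one-line form hard to parse, the empty-lines-only variant can be spread over several lines with comments. This is a sketch of the same commands, reasoned through for GNU sed only; the function name is hypothetical.

```shell
# Hypothetical wrapper: streaming removal of trailing empty lines.
strip_trailing_empty() {
    sed -n '
        /^$/!{
            # Non-empty line: swap so the held empty lines are in the pattern space.
            x
            /\n/{
                # H stored "\n" per empty line; drop one, print the rest, clear.
                s/^\n//
                p
                s/.*//
            }
            # Swap back (hold space is now empty) and print the current line.
            x
            p
        }
        # Empty line: append to the hold space; printed only if content follows.
        /^$/H
    ' "$@"
}

printf 'a\n\n\nb\n\n\n' | strip_trailing_empty
```

Empty lines still sitting in the hold space at end of input are simply never printed, which is what removes the trailing ones.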

score 1 · Answer 9 · answered Jan 30 '15 at 09:00

@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.

awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'

This is pretty simple in operation.

Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.

So the only things that get buffered and never displayed are any trailing blanks.

I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
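Put together with the leading-lines sed command, the pipeline could look roughly like this. The function name and the shorter variable name `buf` are choices made for this sketch, not part of the answer.

```shell
# Hypothetical filter combining both halves: sed trims the top, awk trims the bottom.
trim_blank_edges_sed_awk() {
    sed '/./,$!d' "$@" |
        awk '{ buf = buf $0 "\n" } /./ { printf "%s", buf; buf = "" }'
}

printf '\n\nfoo\n\nbar\n\n\n' | trim_blank_edges_sed_awk
```

Both stages stream their input, so memory use stays proportional to the longest run of empty lines rather than to the file size.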

Adi Degani · Answer 10 · 2018-11-03T09:07:43.937

1

This AWK script will do the trick:

BEGIN {
    ne=0;
}

/^[[:space:]]*$/ {
    ne++;
}

/[^[:space:]]+/ {
    for(i=0; i < ne; i++)
        print "";
    ne=0;
    print
}

The idea is simple: empty lines do not get echoed immediately. Instead, we wait till we get a non-empty line, and only then we first echo out as many empty lines as were seen before it, and only then echo out the new non-empty line.

edited Nov 03 '18 at 09:07

answered Nov 03 '18 at 08:57

Adi Degani

This successfully removes trailing blank lines (including lines containing nothing but white space). However, it does not preserve white space in intermediate blank lines; these are truncated to empty lines. Example: `$'a\n \nb'` is transformed into `$'a\n\nb'`. – Robin A. Meade Apr 30 '20 at 05:37

score 1 · Answer 11 · answered Sep 05 '19 at 21:40

perl -0pe 's/^\n+|\n+(\n)$/\1/gs'

Jan Kyu Peblik

score 1 · Answer 12 · answered Apr 30 '20 at 05:53

Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).

It is memory efficient; it does not read the entire file into memory.

awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'

The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.

If using GNU awk, [[:space:]] can be replaced with \s. (See full list of gawk-specific Regexp Operators.)

If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.

Alexander Poluektov · Answer 13 · 2011-09-09T10:02:00.307

1

In bash, using cat, wc, grep, sed, tail and head:

# number of first line that contains non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how many empty lines do we have at the end of the file?
m=$(($k-$j))
# let's strip the last m lines!
cat <your_file> | head -n-$m
# now we have to strip the first i-1 lines (tail -n+$i starts at line i) and we are done 8-)
cat <your_file> | tail -n+$i

Man, it's definitely worth learning a "real" programming language to avoid that ugliness!
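The line-number idea can be condensed so that a single sed call prints the wanted range. This is a sketch under assumptions: the function name is invented, the simpler pattern `.` (any non-empty line) stands in for the grep pattern used in the answer, and the input must be a regular file because it is read three times.

```shell
# Hypothetical helper: print only the lines between the first and last non-empty line.
trim_by_line_numbers() {
    file=$1
    # grep -n prefixes each matching line with "NUMBER:"; cut keeps just the number.
    first=$(grep -n . "$file" | head -n 1 | cut -d: -f1)
    last=$(grep -n . "$file" | tail -n 1 | cut -d: -f1)
    # No non-empty line at all: nothing to print.
    [ -n "$first" ] || return 0
    sed -n "${first},${last}p" "$file"
}

demo=$(mktemp)
printf '\n\nfoo\n\nbar\n\n' > "$demo"
trim_by_line_numbers "$demo"
rm -f "$demo"
```

Computing both numbers first and printing once avoids the separate head and tail passes over the file.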

edited Sep 09 '11 at 10:02

answered Sep 09 '11 at 09:36

Alexander Poluektov

Well *that* part is easy enough with sed! Let me play with it, and try to get back here with a completed command. Thanks! – ELLIOTTCABLE Sep 09 '11 at 09:43
Actually, that won’t work for the last lines, because it removes *all* newlines in the grep stage, thus throwing off the count at the end. /= – ELLIOTTCABLE Sep 09 '11 at 09:45
Nope: after executing these commands you still have your original file. The second command prints all non-blank lines, prepending their line numbers. Thus you'll have the number of the last non-blank line. – Alexander Poluektov Sep 09 '11 at 09:49
Ah! I misunderstood the operation of `grep -n` it seems. Yes! – ELLIOTTCABLE Sep 09 '11 at 09:55
(Accepted, though I used a one-line variant without any shell-variables, instead expressing a bit more with the `sed` commands.) – ELLIOTTCABLE Sep 09 '11 at 10:05
(Also, for what it’s worth; I know *many* ‘real languages,’ not to mention having written a few thereof. They just weren’t appropriate for this task-space ;D) – ELLIOTTCABLE Sep 09 '11 at 10:06
That's heavy-handed: 11 invocations of external utilities, and a bunch of subshells. – mklement0 Jul 07 '14 at 14:01

score 1 · Answer 14 · answered Sep 09 '11 at 09:38

Using bash

$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"

bash-o-logist

This only removes a single blank line from the start, and none from the end. – me_and Oct 03 '13 at 09:45
@me_and: While you're correct about only removing _one_ empty line from the start, this actually _does_ remove all trailing newlines, because command substitution (`$( – mklement0 Jul 07 '14 at 13:00
@mklement0: Huh, so it does. Learn a new thing every day! – me_and Jul 10 '14 at 11:18

mklement0 · Answer 15 · 2014-07-07T14:08:48.200

A bash solution.

Note: Only useful if the file is small enough to be read into memory at once.

[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"

$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
Special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').

score 0 · Answer 16 · answered Nov 02 '14 at 18:07

I'd like to introduce another variant for gawk v4.1+

result=($(gawk '
    BEGIN {
        lines_count         = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail  = 0;
    }
    /^[[:space:]]*$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))

empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    eval "gawk -i inplace '
        {
            if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
                print
            }
        }
    ' \"$file\""
fi

score 0 · Answer 17 · answered Jun 22 '21 at 11:24

Because I was writing a bash script anyway containing some functions, I found it convenient to write those:

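function strip_leading_empty_lines()
{
    local line
    # IFS= and -r keep whitespace and backslashes in each line intact
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
            break
        fi
    done
    # pass the rest of the input through unchanged
    cat
}

function strip_trailing_empty_lines()
{
    local line acc=""
    while IFS= read -r line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            # a non-empty line: flush it together with any buffered empty lines
            printf '%s' "$acc"
            acc=""
        fi
    done
}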

freeB · Answer 18 · 2023-07-06T19:19:06.000

@mklement0 notes that @Izkata's answer has an issue when the last line doesn't end in a newline.

You can solve this problem using paste from coreutils. The following code works whether or not the last line ends in a newline.

sed '/\S/,$!d' | paste | tac | sed '/\S/,$!d' | tac

Example:

printf '\n\na\nb\nc' and printf '\n\na\nb\nc\n' piped to this code both give

a
b
c

The use of /\S/ means that lines with at least one non-white-space character are classed as not blank; all other leading and trailing lines are deleted. To delete empty lines only, use:

sed '/./,$!d' | paste | tac | sed '/./,$!d' | tac
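The role of `paste` can be seen in isolation, and the empty-lines-only pipeline can be wrapped for reuse. This is a sketch assuming GNU coreutils; the function name is made up, and `paste -` is written with an explicit `-` for standard input, which some implementations require.

```shell
# Without paste, the unterminated last line gets glued to its neighbour by tac.
printf 'a\nb' | tac
# With paste, every line ends in a newline before tac sees it.
printf 'a\nb' | paste - | tac

# Hypothetical wrapper around the empty-lines-only pipeline.
trim_blank_edges_paste() {
    sed '/./,$!d' | paste - | tac | sed '/./,$!d' | tac
}

printf '\n\na\nb\nc' | trim_blank_edges_paste
```

Placing `paste` before the first `tac` is what matters: once every line is newline-terminated, the two reversals behave the same for both kinds of input.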

score 0 · Answer 19 · answered Jul 03 '23 at 00:34

This might not be foolproof, but it seems to work:

 __=$'\n\nline 3\n\nline 5\n\nline 7\n\n'

 printf '%s' "$__" | gcat -b | gcat -n

 1  
 2  
 3       1  line 3
 4  
 5       2  line 5
 6  
 7       3  line 7
 8

mawk 'NF,EOF' RS='\n|[ \t-\r]+$'

 1       1  line 3
 2  
 3       2  line 5
 4  
 5       3  line 7
