2

Is there a way to remove any trailing blank lines (lines that contain only whitespace) that are at the end of file using Bash?

For example, this:

123\n\n\n12\n  \n \t \n

Should become:

123\n\n\n12\n

I know how to do that in C, using fseek() and ftruncate(), but I'm not sure if it's possible using bash and off-the-shelf command-line utilities, without writing a specialized C program for it.

I have seen some questions about removing trailing whitespace in general, such as "How to remove trailing whitespace of all files recursively?", but I'm asking about doing it by truncating the file instead of overwriting it (for performance reasons).

sashoalm
  • http://stackoverflow.com/questions/4438306/removing-trailing-whitespace-with-sed – Ingo Bürk Nov 01 '14 at 18:32
  • @IngoBürk: end of line != end of file – Cyrus Nov 01 '14 at 18:59
  • @Cyrus Yes, that's why I didn't post it as an answer or mark it as a duplicate. I was just pointing out that similar questions have been asked. I could've made it clearer, maybe. – Ingo Bürk Nov 01 '14 at 19:16
  • @IngoBürk Doesn't this overwrite the file? I specifically pointed out this as a difference in my question - it's about doing it by truncating instead of overwriting. – sashoalm Nov 01 '14 at 19:50
  • With any solution that uses standard tools, you can just pipe the file contents instead of using the file, so if you don't want to write it to a file, you don't have to. – Ingo Bürk Nov 01 '14 at 19:56

2 Answers

3

You can find trailing blank lines with tac and then truncate with dd:

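#!/bin/bash
file=$1
# Byte count of the run of blank lines at the end (GNU sed reads \t as a tab)
trailing=$(tac "$file" | sed -n '/^[ \t]*$/!q; p' | wc -c)
# Offset at which the file should end
end=$(( $(wc -c < "$file") - trailing ))
# Truncate in place: seek to that offset and write nothing
dd bs=1 seek="$end" count=0 of="$file"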
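
As a quick check of this approach, here is a minimal sketch that wraps the same tac/sed pipeline in a function and runs it on the sample from the question. The function name and the temporary file are hypothetical, and it swaps dd for GNU truncate, so it assumes GNU coreutils and GNU sed are available.

```shell
#!/bin/bash
# Hypothetical helper: same idea as the script above, wrapped in a function.
# Assumes GNU tac, sed, and truncate.
trim_trailing_blank_lines() {
    local file=$1 trailing size
    # Count the bytes in the run of blank lines at the end of the file.
    trailing=$(tac -- "$file" | sed -n '/^[ \t]*$/!q; p' | wc -c)
    size=$(wc -c < "$file")
    # Shrink the file in place; nothing before the cut is rewritten.
    truncate -s "$(( size - trailing ))" -- "$file"
}

# Sample input from the question, written to a throwaway file.
demo=$(mktemp)
printf '123\n\n\n12\n  \n \t \n' > "$demo"
trim_trailing_blank_lines "$demo"
od -c "$demo"    # inspect the result byte by byte
```

Here truncate -s does the same job as the dd call: it only changes the file's length, so the cost does not depend on how much data precedes the cut.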
that other guy
  • @sashoalm Your question was about bash and this is bash code. It doesn't work in `sh`. – that other guy Nov 01 '14 at 19:54
  • Ok, thanks, I renamed it to a.bash and it works now. This is the first time I've encountered a script that needs an actual .bash extension. I guess I've always used only simple scripts. – sashoalm Nov 01 '14 at 19:59
  • @sashoalm Your question said to truncate the final line terminator, which would mean it's not a canonical text file by UNIX standards. Line based tools differ in how they handle these non-canonical files. – that other guy Nov 01 '14 at 20:11
0

I like @that other guy's answer very much.

But here's one other possibility that uses the fact that command substitutions remove trailing newlines, and doesn't read the file twice to compute the position where it should be trimmed.

#!/bin/bash

file=$1
tokeep=$(wc -c <<< "$(< "$file")") || exit $?
dd if=/dev/null of="$file" bs=1 seek=$tokeep
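
To see why that byte count lands where it does, the following sketch separates the two effects: the command substitution drops every trailing newline, and the here-string puts exactly one back. The file contents and variable names are made up for the illustration.

```shell
#!/bin/bash
# Illustration only: a throwaway file with three extra trailing newlines.
demo=$(mktemp)
printf 'a\nb\n\n\n\n' > "$demo"

original=$(wc -c < "$demo")                  # size of the file as written
content=$(< "$demo")                         # trailing newlines are dropped here
stripped=$(printf '%s' "$content" | wc -c)   # bytes left after stripping
tokeep=$(wc -c <<< "$content")               # here-string appends one newline

echo "original=$original stripped=$stripped tokeep=$tokeep"
```

Because Bash variables cannot hold NUL bytes, this counting trick is only reliable for text files.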

If you want to remove all trailing whitespace (i.e., newlines, spaces, tabs, etc.), use tr to replace whitespace characters with newlines, so that the trailing ones will be discarded:

#!/bin/bash

file=$1
tokeep=$(wc -c <<< "$(tr '[:space:]' '\n' < "$file")") || exit $?
dd if=/dev/null of="$file" bs=1 seek=$tokeep

This preserves a single trailing newline (because the here-string <<< adds a newline). If you want to trim this trailing newline too (but really, you shouldn't!), replace seek=$tokeep with seek=$((tokeep-1)) in the dd statement.
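
A minimal sketch of the tr-based variant applied to the sample from the question, under a few assumptions: the function name is hypothetical, the class is written as '[:space:]' without outer brackets (tr would treat extra brackets as literal characters), LC_ALL=C pins tr to single-byte whitespace, and a size check is added so that a file with no trailing whitespace at all is left alone instead of being extended by dd.

```shell
#!/bin/bash
# Hypothetical helper wrapping the tr-based variant above.
trim_trailing_whitespace() {
    local file=$1 tokeep size
    # Map every whitespace byte to a newline so that the trailing run is
    # dropped by the command substitution; <<< then adds one newline back.
    tokeep=$(wc -c <<< "$(LC_ALL=C tr '[:space:]' '\n' < "$file")") || return
    size=$(wc -c < "$file")
    # Added guard (not in the original): if there is nothing to trim,
    # seeking past the end would make dd grow the file.
    (( tokeep < size )) || return 0
    dd if=/dev/null of="$file" bs=1 seek="$tokeep" 2>/dev/null
}

demo=$(mktemp)
printf '123\n\n\n12\n  \n \t \n' > "$demo"
trim_trailing_whitespace "$demo"
od -c "$demo"    # inspect the result byte by byte
```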

Note. The [:space:] character class is locale-dependent. In the C and POSIX locales it corresponds to space, form feed \f, newline \n, carriage return \r, horizontal tab \t and vertical tab \v (see man 3 isspace) [1]. You can also craft your own set of characters: if you only want to trim trailing newlines and tabs but preserve all other whitespace, use

tr '\t' '\n'

[1] This works because these characters are all one byte long, but don't use it if your locale has whitespace characters that are longer than one byte (e.g., a no-break space, U+00A0, is encoded in UTF-8 as the two bytes C2 A0). If you are unsure which locale is in use, list your own characters in tr, e.g., '\t ', to be sure they are all one byte long. If you also want to handle two-byte characters, replace each of them with two newlines, using e.g. sed. Example with a no-break space:

sed 's/'$'\ua0''/\n\n/g'

assuming you have a UTF-8 locale. This is a bit clunky and maybe beyond the scope of your original question.

gniourf_gniourf