4

I was wondering if there was a way to delete everything after a certain line of a text file in bash. So say there's a text file with 10 lines, and I want to delete every line after line number 4, so only the first 4 lines remained, how would I go about doing that?

John
  • post the input example – RomanPerekhrest Jul 16 '17 at 06:42
  • What do you mean input example? There would be a file with a lot of lines, named say file.txt, and I just want to trim all text after a certain line. – John Jul 16 '17 at 06:44
  • 3
    The simplest is `head -4 oldfile > newfile`, then `mv newfile oldfile`. That's not "in place", but I'm not sure why that would be necessary anyway. – cdarke Jul 16 '17 at 08:33

5 Answers

9

You can use GNU sed:

sed -i '5,$d' file.txt

Here 5,$ is the address range from line 5 through the last line, and d deletes the lines in that range, so only the first 4 lines remain. The -i flag tells sed to edit the file in place.

If you have only BSD sed, then the -i flag requires a backup file suffix:

sed -i.bak '5,$d' file.txt

As @ephemient pointed out, while this solution is simple, it's inefficient because sed will still read the input until the end of the file, which is unnecessary.

As @agc pointed out, the inverse logic of my first proposal might be actually more intuitive. That is, do not print by default (-n flag), and explicitly print range 1,4:

sed -ni.bak 1,4p file.txt
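Because -i behaves differently in GNU and BSD sed, one way to sidestep the difference is to skip -i and go through a temporary file. The following is only a sketch under assumptions: keep_first_lines is a hypothetical helper name, and mktemp and seq are assumed to be available (they are common but not guaranteed everywhere).

```shell
#!/bin/sh
# keep_first_lines is a hypothetical helper name used only for this sketch.
keep_first_lines() {
    n=$1
    file=$2
    tmp=$(mktemp) || return 1
    # -n suppresses default output; "1,Np" prints only lines 1 through N.
    # On success, copy the result back over the original file.
    sed -n "1,${n}p" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file so no real data is touched.
demo=$(mktemp)
seq 1 10 > "$demo"
keep_first_lines 4 "$demo"
cat "$demo"
```

Copying back with cat rather than mv keeps the original file's inode and permissions, at the cost of writing the kept lines twice.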

Another simple alternative is to read the first 4 lines into memory and then overwrite the file. This assumes that the first 4 lines are not excessively long, so they fit easily in memory, and that the 4th line ends with a newline character:

lines=$(head -n 4 file.txt)
echo "$lines" > file.txt
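One more caveat with the head/echo approach is worth illustrating: command substitution strips all trailing newlines, and echo adds back exactly one. If the 4th line happens to be empty, it is silently lost. A small demonstration on a throwaway file (the file contents are made up for illustration):

```shell
#!/bin/sh
f=$(mktemp)
printf 'a\nb\nc\n\ne\n' > "$f"   # 5 lines; line 4 is empty

lines=$(head -n 4 "$f")          # $(...) strips the trailing newlines
echo "$lines" > "$f"             # echo appends a single newline

count=$(wc -l < "$f" | tr -d ' ')
echo "lines kept: $count"        # prints "lines kept: 3", not 4
```

When trailing empty lines matter, a variant that writes to a temporary file instead of a shell variable avoids this.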
janos
  • Note that on *BSD (including OS X), `sed -i '5,$d' file.txt` will use `'5,$d'` as the backup suffix and `file.txt` as the expression, which is not what you want. There is no simple portable way to perform an in-place `sed` without creating backup files. – ephemient Jul 16 '17 at 06:46
  • Thanks, this answered my question. – John Jul 16 '17 at 06:46
  • The logical contrary version of this answer seems more intuitive: `sed -ni.bak '1,4p' file.txt` – agc Jul 16 '17 at 15:01
4

Minor refinements on janos's answer, ephemient's answer, and cdarke's comment:

  1. Simpler (and faster) sed code:

    sed -i 4q file
    
  2. When a filter util can't directly edit a file, there's sponge:

    head -4 file | sponge file
    
  3. Most efficient on Linux might be truncate, a GNU coreutils utility comparable to fallocate, which offers the same minimal I/O as ephemient's more portable (but more complex) dd-based answer:

    truncate -s "$(head -n 4 file | wc -c)" file
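To make the truncate approach concrete, here is a minimal sketch on a throwaway file, assuming GNU coreutils truncate is installed. The tr call follows janos's suggestion elsewhere in this thread, since some wc implementations pad their output with spaces.

```shell
#!/bin/sh
f=$(mktemp)
seq 1 10 > "$f"

# Byte length of the first 4 lines ("1\n2\n3\n4\n" is 8 bytes here).
size=$(head -n 4 "$f" | wc -c | tr -d ' ')

# Cut the file at that byte offset; nothing is rewritten.
truncate -s "$size" "$f"
cat "$f"
```

If the file has fewer than 4 lines, head returns the whole file, so the computed size equals the current size and truncate leaves the file unchanged.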
    
agc
3

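If I don't know the line number, only some text on the line ('knowntext'), I use the command below. It deletes from the first line containing 'knowntext' through the end of the file, including that line itself, so I need to be sure there is nothing from that line onward that I want to preserve: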

sed -i '/knowntext/,$d' inputfilename

to directly alter the file, or to be cautious

sed '/knowntext/,$d' inputfilename > outputfilename

where inputfilename is unaltered, and outputfilename contains the truncated version of the input. I am not competent to comment on the efficiency of this, but I know that files of 20kB or so are dealt with faster than I can blink.
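The difference between dropping and keeping the matching line can be seen in a small sketch. The sample contents are made up for illustration, and the second form (using sed's q command) is an alternative I am suggesting, not part of the answer above.

```shell
#!/bin/sh
f=$(mktemp)
printf 'alpha\nbeta\nknowntext here\ngamma\n' > "$f"

# Range delete: removes the first matching line and everything after it.
without_match=$(sed '/knowntext/,$d' "$f")

# q prints the matching line and then quits, so the match is kept
# and the rest of the file is never read.
with_match=$(sed '/knowntext/q' "$f")

printf '%s\n---\n%s\n' "$without_match" "$with_match"
```

Neither command modifies the input file here; redirect to a new file or add -i as in the answer to apply the change.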

2

The sed method that @janos suggested is simple but inefficient. It reads every line of the original file, even those it could ignore (although that can be fixed using 4q), and -i actually creates a new file, which is then renamed to replace the original. There is also the annoyance that you need sed -i '5,$d' file.txt with GNU sed but sed -i '' '5,$d' file.txt with BSD sed to replace the existing file without leaving a backup.

Another method that performs less I/O:

dd bs=1 count=0 if=/dev/null of=file.txt \
    seek=$(grep -b ^ file.txt | tail -n+5 | head -n1 | cut -d: -f1)
  • grep -b ^ file.txt prefixes each line with its byte offset in the file, e.g.

    $ yes | grep -b ^
    0:y
    2:y
    4:y
    ...
    
  • tail -n+5 skips the first 4 lines, outputting the 5th and subsequent lines

  • head -n1 takes only the next line (e.g. only the 5th line)

    After head reads the one line, it will exit. This causes tail to exit because it has nowhere to output to anymore. This causes grep to exit for the same reason. Thus, the rest of file.txt does not need to be examined.

  • cut -d: -f1 takes only the first part before the : (the byte offset)

  • dd bs=1 count=0 if=/dev/null of=file.txt seek=N

    • using a block size of 1 byte, seek to block N of file.txt

    • copy 0 blocks of size 1 byte from /dev/null to file.txt

    • truncate file.txt here (because conv=notrunc was not given)

    In short, this removes all data on the 5th and subsequent lines from file.txt.

    On Linux there is a command named fallocate which can similarly extend or truncate a file, but that's not portable.

UNIX filesystems support efficiently truncating files in-place, and these commands are portable. The downside is that it's more work to write out.

(Also, dd will print some unnecessary stats to stderr, and will exit with an error if the file has fewer than 5 lines, although in that case it will leave the existing file contents in place, so the behavior is still correct. Those can be addressed also, if needed.)
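As a rough sketch of how those two rough edges might be handled, the pipeline can be wrapped in a function that skips dd when the file is too short and silences dd's statistics. The name truncate_after_line is hypothetical, and the sketch assumes a grep that supports -b (GNU and BSD grep both do).

```shell
#!/bin/sh
# truncate_after_line is a hypothetical helper name for this sketch.
truncate_after_line() {
    n=$1
    file=$2
    # Byte offset at which line n+1 starts; empty if there is no such line.
    offset=$(grep -b '^' "$file" | tail -n "+$((n + 1))" | head -n 1 | cut -d: -f1)
    # File has n lines or fewer: nothing to remove.
    [ -n "$offset" ] || return 0
    # Seek to the offset, copy nothing, and let dd truncate the file there.
    # Note that 2>/dev/null hides real errors along with the statistics.
    dd bs=1 count=0 if=/dev/null of="$file" seek="$offset" 2>/dev/null
}

long=$(mktemp);  seq 1 10 > "$long"
short=$(mktemp); seq 1 3 > "$short"
truncate_after_line 4 "$long"    # cut down to lines 1-4
truncate_after_line 4 "$short"   # left untouched
cat "$long"
```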

agc
ephemient
  • 1
    A simpler way than `seek=$(grep -b ^ file.txt | tail -n+5 | head -n1 | cut -d: -f1)` is `seek=$(head -n4 file.txt | wc -c | tr -d ' ')` – janos Jul 16 '17 at 09:42
0

Using GNU awk (version 4.1.0 or later, which provides the inplace extension). First we create a test file (NOTICE THE DISCLAIMER):

$ seq 1 10 > file     # THIS WILL OVERWRITE FILE NAMED file WITH TEST DATA

Then the code and validation (WILL MODIFY THE ORIGINAL FILE NAMED file):

$ awk -i inplace 'NR<=4' file
$ cat file
1
2
3
4

Explained:

$ awk -i inplace '   # edit is targeted at the original file (try without -i ...)
NR<=4                # output first 4 records
' file               # file

You could also exit at record 5 (NR==5), which would be quicker if you redirected the program's output to a new file (remove the # to enable the redirection). That is equivalent to head -4 file > new_file:

$ awk 'NR==5{exit}1' file  # > new_file

When testing, don't forget the seq part first.
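For awk implementations without the inplace extension, the early-exit form can be parameterized and written to a separate file. This is a minimal sketch: the variable name n and the temporary files are assumptions made for illustration.

```shell
#!/bin/sh
f=$(mktemp)
out=$(mktemp)
seq 1 10 > "$f"

# Print records until the line count exceeds n, then stop reading input.
awk -v n=4 'NR > n { exit } { print }' "$f" > "$out"
cat "$out"
```

The original file is left untouched; replace it with the output file afterwards if that is the goal.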

James Brown