765

I need to repeatedly remove the first line from a huge text file using a bash script.

Right now I am using sed -i -e "1d" $FILE - but it takes around a minute to do the deletion.

Is there a more efficient way to accomplish this?

Peter Coulton
Brent

20 Answers

1354

Try tail:

tail -n +2 "$FILE"

-n x: Print just the last x lines. tail -n 5 would give you the last 5 lines of the input. The + sign inverts the argument, making tail print everything but the first x-1 lines: tail -n +1 would print the whole file, tail -n +2 everything but the first line, etc.

GNU tail is much faster than sed. tail is also available on BSD and the -n +2 flag is consistent across both tools. Check the FreeBSD or OS X man pages for more.

The BSD version can be much slower than sed, though. I wonder how they managed that; tail should just read a file line by line while sed does pretty complex operations involving interpreting a script, applying regular expressions and the like.

Note: You may be tempted to use

# THIS WILL GIVE YOU AN EMPTY FILE!
tail -n +2 "$FILE" > "$FILE"

but this will give you an empty file. The reason is that the redirection (>) happens before tail is invoked by the shell:

  1. Shell truncates file $FILE
  2. Shell creates a new process for tail
  3. Shell redirects stdout of the tail process to $FILE
  4. tail reads from the now empty $FILE

If you want to remove the first line inside the file, you should use:

tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"

The && will make sure that the file doesn't get overwritten when there is a problem.
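If this comes up often, the safe pattern can be wrapped in a small helper function (a sketch; `remove_first_line` and `demo.txt` are made-up names, not anything from the question):

```shell
# Hypothetical helper wrapping the tail + mv pattern above.
remove_first_line() {
    tail -n +2 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

printf 'header\nrow1\nrow2\n' > demo.txt   # throwaway demo file
remove_first_line demo.txt
cat demo.txt                               # row1, then row2
```

Because of the &&, a failing tail leaves the original file untouched; only a successful write replaces it.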

kirelagin
Aaron Digulla
  • According to this http://ss64.com/bash/tail.html the typical buffer defaults to 32k when using BSD 'tail' with the `-r` option. Maybe there's a buffer setting somewhere in the system? Or `-n` is a 32-bit signed number? – Yzmir Ramirez Nov 10 '11 at 00:49
  • hmm, just worked for me on a 92 M file to remove the first 400k+ lines. – Eddie Feb 14 '13 at 15:45
  • @Eddie: user869097 said it doesn't work when a *single* line is 15Mb or more. As long as the lines are shorter, `tail` will work for any file size. – Aaron Digulla Feb 14 '13 at 16:21
  • Oops, thanks for correcting me. Wow, a 15Mb line... I can't even imagine such a case. – Eddie Feb 15 '13 at 15:11
  • @Eddie I sometimes see them on programs that churn out whole database as XML output, yet not inserting newlines at crucial places. – syockit Feb 11 '14 at 06:57
  • @Dreampuf: `sed` has an internal buffer for the current line while `tail` can get away by just remembering the offset of the N last newline characters (note that I didn't actually look at the sources). – Aaron Digulla Feb 11 '14 at 08:28
  • It is better if you write output to a file: `tail -n +2 "$FILE" > newfile` – M Rostami Sep 16 '14 at 10:12
  • Why is tail faster than sed in this case? – CMCDragonkai Jan 28 '16 at 12:04
  • @CMCDragonkai Tail is a tool specialized for this task. Sed is a general purpose tool. It will create an internal data structure, apply the operations to every line (`1d` just matches the first line but I'm not sure that `sed` optimizes this case, for example). – Aaron Digulla Jan 29 '16 at 14:37
  • tail is MUCH SLOWER than sed. tail needs 13.5s, sed needs 0.85s. My file has ~1M lines, ~100MB. MacBook Air 2013 with SSD. – jcsahnwaldt Reinstate Monica Feb 01 '16 at 16:15
  • @JonaChristopherSahnwaldt Intersting. Did you run both several times to rule out caches and the like? – Aaron Digulla Feb 01 '16 at 17:15
  • @AaronDigulla I ran both twice. I could paste the results in a chat window or so. Don't know how to do this here... – jcsahnwaldt Reinstate Monica Feb 01 '16 at 17:16
  • @JonaChristopherSahnwaldt And the resulting files are the same? I'm not sure how fast your SSD is but reading and writing 100MB to a file should already take around 1 second. – Aaron Digulla Feb 01 '16 at 17:17
  • @AaronDigulla Yes, they are the same. – jcsahnwaldt Reinstate Monica Feb 01 '16 at 17:19
  • @AaronDigulla https://docs.google.com/document/d/10HcDvQZ5d5ZN7B7TuZCfF-_vGCbZ4XpcfP8GW4xxojU/edit – jcsahnwaldt Reinstate Monica Feb 01 '16 at 18:08
  • @JonaChristopherSahnwaldt I'm very, very surprised by these numbers. It's like Windows Word printing faster than `echo | lpr`. I don't have the time to debug `tail`, so I don't know why it's slower in your case. My gut feeling is that it's the long lines but I don't know. – Aaron Digulla Feb 02 '16 at 10:12
  • @AaronDigulla The lines aren't long. 100 bytes on average. – jcsahnwaldt Reinstate Monica Feb 02 '16 at 10:43
  • @AaronDigulla: How fast or slow are sed / tail on your machine? – jcsahnwaldt Reinstate Monica Feb 04 '16 at 14:10
  • @JonaChristopherSahnwaldt On my computer (Windows 8, Cygwin, sed 4.2.2, tail 8.24). 100MB Text, short lines (<80 characters). `time cat sample.txt > /dev/null` takes 0.06s (just IO from the cache). `time sed -e "1d" sample.txt > /dev/null` takes 1.12s, `time tail -n +2 sample.txt > /dev/null` takes 0.22s. `sed` is roughly 6 times slower than `tail`. – Aaron Digulla Feb 09 '16 at 13:01
  • I was going to concur with @JonaChristopherSahnwaldt -- tail is much, much slower than the sed variant, by an order of magnitude. I'm testing it on a file of 500,000K lines (no more than 50 chars per line). However, I then realized I was using the FreeBSD version of tail (which comes with OS X by default). When I switched to GNU tail, the tail call was 10 times faster than the sed call (and the GNU sed call, too). AaronDigulla is correct here, if you're using GNU. – dancow Aug 18 '16 at 20:59
  • The nice thing about `sed` is that you can use it to edit files in place, which you can not do with `tail` ( As far as I am aware of. Please correct me if I am wrong). If you would like to delete the first line in all files in a directory, you could do something like this `sed -i "1d" *`. I guess you could also automate `tail` by using it in combination with `find` or by making a script, but I am not sure which one performs better. I know the OP mentioned they were using `-i`, but I thought this might help clarify its use. – James Mchugh Jul 21 '17 at 12:30
  • Is there a way to use tail like this on multiple files at the time? I have several files, 1.txt, 2.txt and so on that I would like to perform this operation on, and I want the output to end up in either 1.txt, 2.txt or 1.fixed, 2.fixed or something like that. – d-b Apr 13 '18 at 09:09
  • @d-b No. Use a loop. – Aaron Digulla May 17 '18 at 07:39
329

With sed, the command '1d' will delete the first line. Additionally, the -i flag can be used to update the file "in place".¹

sed -i '1d' filename

¹ sed -i automatically creates a temporary file with the desired changes, and then replaces the original file.
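One portability caveat (not covered by the answer itself): BSD/macOS sed requires an explicit suffix argument to -i. A form that works with both GNU and BSD sed attaches a backup suffix and then discards the backup, sketched here with a throwaway file:

```shell
printf 'first\nsecond\n' > demo.txt
# -i.bak (no space) is accepted by both GNU and BSD sed;
# it writes demo.txt.bak as a backup, which we then discard.
sed -i.bak '1d' demo.txt && rm -f demo.txt.bak
cat demo.txt   # second
```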

Mateen Ulhaq
amit
  • I get the error: `unterminated transform source string` – Daniel Kobe Dec 01 '15 at 04:16
  • sed was much faster when I timed the operation. – wbg Dec 19 '16 at 21:40
  • This works every time and should really be the top answer! – xtheking Mar 28 '17 at 13:23
  • Just to remember, Mac requires a suffix to be provided when using sed with in-place edits. So run the above with `-i.bak` – mjp May 10 '17 at 18:00
  • Just a note - to remove several lines use `sed -i '1,2d' filename` – The Godfather May 24 '18 at 09:08
  • This version is really much more readable, and more universal, than `tail -n +2`. Not sure why it isn't the top answer. – Luke Davis Jun 26 '18 at 19:43
  • Besides [the significant time reduction](https://askubuntu.com/a/862839/136964) of (GNU) `tail` compared to `sed`, it should be noted that despite the `-i` option, `sed` needs to create a copy of the file anyway, so this solution won't be more helpful than `tail` when facing limited disk space issues. – Skippy le Grand Gourou Feb 06 '19 at 14:34
  • @LukeDavis because the question is asking for something faster than this. – OrangeDog Jul 03 '19 at 15:50
  • Please add pipe example, to avoid *"sed: no input files"* – Peter Krauss Nov 04 '19 at 13:22
  • pipe example: `cat filename | sed '1,3d'` deletes the 1st through 3rd lines of stdin. – DataAlchemist Apr 03 '20 at 14:54
  • Works on Ubuntu (GNU) but for OS X (BSD) I had to change it to `sed -i '' '1d' filename`. Per https://stackoverflow.com/questions/16745988/sed-command-with-i-option-in-place-editing-works-fine-on-ubuntu-but-not-mac – Ahmad Abdelghany May 27 '20 at 10:40
  • Regarding the "pipe example": **DO NOT** try to expand it to `cat filename | sed '1,3d' > filename`, this will empty your file before the pipeline even starts :-) Use a different filename for the output, then move it: `cat filename | sed '1,3d' > file.tmp && mv -i file.tmp filename` – alexis Oct 18 '22 at 09:21
83

For those who are on SunOS which is non-GNU, the following code will help:

sed '1d' test.dat > tmp.dat 
Nasri Najib
  • Interesting demographic – captain Jul 15 '15 at 01:39
  • @ValerioBozz It's kinda weird revisiting this comment after almost a decade lol. I don't even remember it. But I was just pointing out that this answer is for SunOS which was last released in 1998. Very few if any use it – captain Mar 18 '23 at 19:14
22

You can easily do this with:

cat filename | sed 1d > filename_without_first_line

on the command line; or to remove the first line of a file permanently, use the in-place mode of sed with the -i flag:

sed -i 1d <filename>
Ali
Ingo Baab
  • The `-i` option technically takes an argument specifying the file suffix to use when making a backup of the file (e.g. `sed -i .bak 1d filename` creates a copy called `filename.bak` of the original file with the first line intact). While GNU sed lets you specify `-i` without an argument to skip the backup, BSD sed, as found on macOS, requires an empty string argument as a separate shell word (e.g. `sed -i '' ...`). – Mark Reed Dec 24 '20 at 21:21
16

The sponge util avoids the need for juggling a temp file:

tail -n +2 "$FILE" | sponge "$FILE"
agc
  • `sponge` is indeed much cleaner and more robust than the accepted solution (`tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"`) – Jealie Dec 19 '17 at 00:25
  • This is the only solution that worked for me to change a system file (on a Debian docker image). Other solutions failed due to "Device or resource busy" error when attempting to write the file. – FedFranz Jan 22 '18 at 15:37
  • But does `sponge` buffer the whole file in memory? That won't work if it's hundreds of GB. – OrangeDog Jul 03 '19 at 16:05
  • @OrangeDog, So long as the file system can store it, `sponge` will soak it up, since it uses a */tmp* file as an intermediate step, which is then used to replace the original afterward. – agc Jul 03 '19 at 20:52
15

No, that's about as efficient as you're going to get. You could write a C program which could do the job a little faster (less startup time and processing arguments) but it will probably tend towards the same speed as sed as files get large (and I assume they're large if it's taking a minute).

But your question suffers from the same problem as so many others in that it presupposes the solution. If you were to tell us in detail what you're trying to do rather than how, we may be able to suggest a better option.

For example, if this is a file A that some other program B processes, one solution would be to not strip off the first line, but modify program B to process it differently.

Let's say all your programs append to this file A and program B currently reads and processes the first line before deleting it.

You could re-engineer program B so that it didn't try to delete the first line but maintains a persistent (probably file-based) offset into the file A so that, next time it runs, it could seek to that offset, process the line there, and update the offset.

Then, at a quiet time (midnight?), it could do special processing of file A to delete all lines currently processed and set the offset back to 0.

It will certainly be faster for a program to open and seek a file rather than open and rewrite. This discussion assumes you have control over program B, of course. I don't know if that's the case but there may be other possible solutions if you provide further information.
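A minimal sketch of that offset idea in shell (the names `fileA`, `offset.txt`, and `process_next_line` are hypothetical, and the byte arithmetic assumes single-byte characters):

```shell
printf 'first\nsecond\nthird\n' > fileA   # stand-in for the real file A
rm -f offset.txt                          # hypothetical persistent state file

process_next_line() {
    local offset line
    offset=$(cat offset.txt 2>/dev/null || echo 0)
    # tail -c is 1-based, so start reading at offset+1.
    line=$(tail -c +"$((offset + 1))" fileA | head -n 1)
    echo "processing: $line"
    # Advance past the line plus its trailing newline, and persist the offset.
    echo "$((offset + ${#line} + 1))" > offset.txt
}

process_next_line   # processing: first
process_next_line   # processing: second
```

Note that fileA itself is never rewritten; only the small state file changes on each run, which is the source of the speedup.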

paxdiablo
  • I think the OP is trying to achieve what made me find this question. I have 10 CSV files with 500k lines in each. Every file has the same header row as the first line. I am cat:ing these files into one file and then importing them into a DB letting the DB create column names from the first line. Obviously I don't want that line repeated in file 2-10. – d-b Apr 12 '18 at 13:21
  • @d-b In that case, `awk FNR-1 *.csv` is probably faster. – jinawee Jan 29 '19 at 09:50
13

If you want to modify the file in place, you could always use the original ed instead of its streaming successor sed:

ed "$FILE" <<<$'1d\nwq\n'

The ed command was the original UNIX text editor, before there were even full-screen terminals, much less graphical workstations. The ex editor, best known as what you're using when typing at the colon prompt in vi, is an extended version of ed, so many of the same commands work. While ed is meant to be used interactively, it can also be used in batch mode by sending a string of commands to it, which is what this solution does.

The sequence <<<$'1d\nwq\n' takes advantage of modern shells' support for here-strings (<<<) and ANSI quotes ($'...') to feed input to the ed command consisting of two lines: 1d, which deletes line 1, and then wq, which writes the file back out to disk and then quits the editing session.
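For shells without here-string support, the same two-command ed script can be fed through a pipe instead (a sketch; demo.txt is a throwaway file standing in for "$FILE"):

```shell
printf 'first\nsecond\nthird\n' > demo.txt
# -s suppresses ed's byte-count output; the script deletes line 1 and saves.
printf '1d\nwq\n' | ed -s demo.txt
cat demo.txt
```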

Mark Reed
11

You can edit the files in place: Just use perl's -i flag, like this:

perl -ni -e 'print unless $. == 1' filename.txt

This makes the first line disappear, as you ask. Perl will need to read and copy the entire file, but it arranges for the output to be saved under the name of the original file.

alexis
11

As Pax said, you probably aren't going to get any faster than this. The reason is that almost no filesystems support truncating from the beginning of a file, so this is going to be an O(n) operation where n is the size of the file. What you can do much faster, though, is overwrite the first line with the same number of bytes (maybe with spaces or a comment), which might work for you depending on exactly what you are trying to do (what is that, by the way?).
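A sketch of that overwrite idea using dd (demo.txt is a throwaway file; this assumes the replacement must be exactly as long as the original first line, minus its newline):

```shell
printf 'HEADER\nrow1\nrow2\n' > demo.txt
len=$(head -n 1 demo.txt | wc -c)   # first-line length, including the newline
# Overwrite the first len-1 bytes with '#' padding, without truncating the file.
printf '#%.0s' $(seq $((len - 1))) | dd of=demo.txt conv=notrunc 2>/dev/null
head -n 2 demo.txt
```

Since the file size and all later byte offsets are unchanged, this touches only one block instead of rewriting the whole file.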

Robert Gamble
  • Re *"...almost no filesystems that support truncating..."*: that's interesting; please consider including a parenthetical note naming such a filesystem. – agc Mar 06 '19 at 11:23
  • @agc: irrelevant now, but my first job in the '70s was with Quadex, a small startup (now gone, and unrelated to the two companies now using that name). They had a filesystem which allowed adding _or_ removing at either beginning or end of a file, used mostly to implement editing in less than 3KB by putting above-window and below-window in files. It had no name of its own, it was just part of QMOS, the Quadex Multiuser Operating System. ('Multi' was usually 2-3 on an LSI-11/02 with under 64KB RAM and usually a few RX01-type 8" floppy disks each 250KB.) :-) – dave_thompson_085 Nov 24 '19 at 07:15
7

This should show all lines except the first:

cat textfile.txt | tail -n +2
serup
6

Could use vim to do this:

vim -u NONE +'1d' +'wq!' /tmp/test.txt

This should be faster, since vim won't read the whole file while processing.

Hongbo Liu
  • May need to quote the `+wq!` if your shell is bash. Probably not since the `!` is not at the beginning of a word, but getting in the habit of quoting things is probably good all around. (And if you're going for super-efficiency by not quoting unnecessarily, you don't need the quotes around the `1d` either.) – Mark Reed May 15 '18 at 18:52
  • vim **does** need to read the whole file. In fact if the file is larger than memory, as asked in this Q, vim reads the whole file and writes it (or most of it) to a temp file, and after editing writes it all back (to the permanent file). I don't know how you think it could possibly work _without_ this. – dave_thompson_085 Nov 24 '19 at 07:03
5

How about using csplit?

man csplit
csplit -k file 1 '{1}'
Shahbaz
  • This syntax would also work, but only generate two output files instead of three: `csplit file /^.*$/1`. Or more simply: `csplit file //1`. Or even more simply: `csplit file 2`. – Marco Roy Jan 21 '16 at 23:39
4

This one liner will do:

echo "$(tail -n +2 "$FILE")" > "$FILE"

It works because the command substitution runs tail to completion and captures its output before the redirection truncates the file, hence no need for a temp file.

egors
1

Since it sounds like I can't speed up the deletion, I think a good approach might be to process the file in batches like this:

while [ -s file1 ]; do         # loop while file1 is non-empty
  head -n 1000 file1 > file2
  # process file2 ...
  sed -i -e '1,1000d' file1    # drop the 1000 lines just processed
done

The drawback of this is that if the program gets killed in the middle (or if there's some bad sql in there - causing the "process" part to die or lock-up), there will be lines that are either skipped, or processed twice.

(file1 contains lines of sql code)

Brent
1
tail +2 path/to/your/file

works for me, no need to specify the -n flag. For reasons, see Aaron's answer.

zabop
1

You can use the sed command to delete arbitrary lines by line number.

# create a multi-line txt file
echo "1. first
2. second
3. third" > file.txt

deleting lines and printing to stdout

$ sed '1d' file.txt 
2. second
3. third

$ sed '2d' file.txt 
1. first
3. third

$ sed '3d' file.txt 
1. first
2. second

# delete multi lines
$ sed '1,2d' file.txt 
3. third

# delete the last line
sed '$d' file.txt 
1. first
2. second

use the -i option to edit the file in-place

$ cat file.txt 
1. first
2. second
3. third

$ sed -i '1d' file.txt

$ cat file.txt 
2. second
3. third
aidanmelen
0

If what you are looking to do is recover after failure, you could just build up a file that has what you've done so far.

rm -f "$tmpf"
while IFS= read -r line ; do
    # process line
    echo "$line" >> "$tmpf"
done < "$srcf"
bfontaine
Tim
0

Based on 3 other answers, I came up with this syntax that works perfectly in my macOS bash shell:

line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)

Test case:

~> printf "Line #%2d\n" {1..3} > list.txt
~> cat list.txt
Line # 1
Line # 2
Line # 3
~> line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)
~> echo $line
Line # 1
~> cat list.txt
Line # 2
Line # 3
Murilo Perrone
0

Also check these ways:

mapfile -t lines < 1.txt && printf "%s\n" "${lines[@]:1}" > new.txt

#OR

awk 'NR>1' old.txt > new.txt

#OR

cut -d $'\n' -f 2- old.txt > new.txt
Freeman
-1

Would using tail on N-1 lines and directing that into a file, followed by removing the old file, and renaming the new file to the old name do the job?

If I were doing this programmatically, I would read through the file and remember the file offset after reading each line, so I could seek back to that position to read the file with one less line in it.

EvilTeach
  • The first solution is essentially identical to what Brent is doing now. I don't understand your programmatic approach; only the first line needs to be deleted, so you would just read and discard the first line and copy the rest to another file, which is again the same as the sed and tail approaches. – Robert Gamble Dec 04 '08 at 03:56
  • The second solution has the implication that the file is not shrunk by the first line each time. The program simply processes it, as if it had been shrunk, but starting at the next line each time – EvilTeach Dec 04 '08 at 14:27
  • I still don't understand what your second solution is. – Robert Gamble Dec 04 '08 at 19:21