I have >100 files that I need to merge, but for each file the first line has to be removed. What is the most efficient way to do this under Unix? I suspect it's probably a command using cat and sed '1d'. All files have the same extension and are in the same folder, so we probably could use *.extension to point to the files. Many thanks!
For removing the first line, see e.g. [`tail`](http://linux.die.net/man/1/tail) (`tail -n +2 file`). – Some programmer dude Apr 11 '12 at 09:57

@Someprogrammerdude One should use `tail -q -n +2 file`, to avoid output of headers giving file names. – Rodrigo Oct 12 '18 at 20:07
5 Answers
Assuming your filenames are sorted in the order you want your files appended, you can use:
ls *.extension | xargs -n 1 tail -n +2
EDIT: After Sorin's and Gilles' comments about the possible dangers of piping `ls` output, you could use:
find . -name "*.extension" | xargs -n 1 tail -n +2
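If any filenames could contain spaces, a null-delimited pipeline is safer, since plain `find | xargs` splits on whitespace. A minimal sketch (the `.extension` files and directory here are hypothetical demo data; `-print0`, `sort -z` and `xargs -0` are GNU/BSD extensions, not POSIX):

```shell
dir=$(mktemp -d)
printf 'header\na1\na2\n' > "$dir/one.extension"
printf 'header\nb1\n' > "$dir/two file.extension"
# -print0/-0 pass filenames NUL-delimited, so the space in
# "two file.extension" survives; sort -z keeps the order deterministic.
find "$dir" -name "*.extension" -print0 | sort -z | xargs -0 -n 1 tail -n +2 > "$dir/merged.txt"
cat "$dir/merged.txt"
```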

-1 for piping ls output to something, ls is not designed to do that, use find – Sorin Apr 11 '12 at 10:07
Can you give a link for possible problems with piping ls output? Thanks – xpapad Apr 11 '12 at 10:39
The above reference compares parsing the output of `ls` to doing internal string manipulation like `for x in *.txt`. It does not compare parsing the output of `ls` to parsing the output of `find`. Both are "bad" according to the same logic. – Kaz Apr 12 '12 at 00:47
The reference does mention `find` but it recommends using the GNU `find` extensions to output null terminated strings. Replacing `ls` with a plain old `find` is completely pointless. – Kaz Apr 12 '12 at 00:47
If I do something like: `find . -name "*.csv" | xargs -n 1 tail -n +2 > output.csv` then my `output.csv` file gets included in the `find . -name "*.csv"` and as a result the output file reads itself and then outputs to itself again. Is there a way to avoid this other than to make the output file not a `.csv` file? – YellowPillow Oct 31 '16 at 11:43
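One way around the self-inclusion problem in the comment above is to exclude the output name from the match with `!` (standard POSIX `find` negation). A sketch with hypothetical demo files:

```shell
dir=$(mktemp -d)
printf 'h\n1\n' > "$dir/a.csv"
printf 'h\n2\n' > "$dir/b.csv"
# The shell creates output.csv (via the redirection) before find runs,
# but "! -name" excludes it from the match, so it never reads itself.
find "$dir" -name "*.csv" ! -name "output.csv" | sort | xargs -n 1 tail -n +2 > "$dir/output.csv"
cat "$dir/output.csv"
```

Writing the output outside the searched directory works just as well.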
Everyone is making this too complicated. It's really easy:
tail -q -n +2 file1 file2 file3
And so on. If you have a large number of files you can load them into an array first:
list=(file1 file2 file3)
tail -q -n +2 "${list[@]}"
All the files with a given extension in the current directory?
list=(*.extension)
tail -q -n +2 "${list[@]}"
Or just
tail -q -n +2 *.extension
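To see why `-q` matters here: given more than one file, GNU `tail` prints a `==> filename <==` header before each file's output, and `-q` suppresses those headers. A quick sketch with hypothetical demo files:

```shell
dir=$(mktemp -d)
printf 'h\n1\n' > "$dir/a.extension"
printf 'h\n2\n' > "$dir/b.extension"
# Without -q this would interleave "==> ... <==" headers
# between the files; -q keeps the merged output clean.
tail -q -n +2 "$dir"/*.extension
```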

I attempted `tail -n +2 *.extension`. The version of tail I'm using returns `tail: Can only process one file at a time.` so that explains the more complicated answers. – zr00 Jul 18 '13 at 22:33
Just append each file after removing the first line.
#!/bin/bash
DEST=/tmp/out
FILES="space separated list of files"
: >"$DEST"   # truncate the destination instead of seeding it with a blank line
for FILE in $FILES   # unquoted on purpose: word splitting yields the list
do
    sed '1d' "$FILE" >>"$DEST"
done
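A self-contained run of the same idea, with hypothetical demo files (note that `: > "$dest"` truncates the destination rather than writing a blank first line into it):

```shell
dir=$(mktemp -d)
printf 'h\na\n' > "$dir/f1"
printf 'h\nb\n' > "$dir/f2"
dest="$dir/out"
: > "$dest"                      # start from an empty file
for f in "$dir"/f1 "$dir"/f2
do
    sed '1d' "$f" >> "$dest"     # drop line 1, append the rest
done
cat "$dest"
```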

`tail` outputs the last lines of a file. You can tell it how many lines to print, or how many lines to omit at the beginning (`-n +N`, where N is the number of the first line to print, counting from 1, so `+2` omits one line). With GNU utilities (i.e. under Linux or Cygwin), FreeBSD or other systems that have the `-q` option:
tail -q -n +2 *.extension
Otherwise, `tail` prints a header before each file, and `-q` is not standard. If your implementation doesn't have it, or to be portable, you need to iterate over the files.
for x in *.extension; do tail -n +2 <"$x"; done
Alternatively, you can call Awk, which has a way to identify the first line of each file. This is likely to be faster if you have a lot of small files and slower if you have many large files.
awk 'FNR != 1' *.extension
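A quick demonstration (with hypothetical demo files) of why `FNR != 1` works: `FNR` is awk's per-file record counter, resetting to 1 at the start of each input file (unlike `NR`, which counts across all files), so the pattern is false exactly on each file's header line:

```shell
dir=$(mktemp -d)
printf 'header\nx\n' > "$dir/a.extension"
printf 'header\ny\n' > "$dir/b.extension"
# FNR restarts at 1 for every file; lines where FNR != 1 are
# printed by awk's default action (no {} block needed).
awk 'FNR != 1' "$dir/a.extension" "$dir/b.extension"
```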
