I have >100 files that I need to merge, but for each file the first line has to be removed. What is the most efficient way to do this under Unix? I suspect it's probably a command using cat and sed '1d'. All files have the same extension and are in the same folder, so we probably could use *.extension to point to the files. Many thanks!
For removing the first line, see e.g. [`tail`](http://linux.die.net/man/1/tail) (`tail -n +2 file`). – Some programmer dude Apr 11 '12 at 09:57

@Someprogrammerdude One should use `tail -q -n +2 file`, to avoid output of headers giving file names. – Rodrigo Oct 12 '18 at 20:07
5 Answers
Assuming your filenames are sorted in the order you want your files appended, you can use:
ls *.extension | xargs -n 1 tail -n +2
EDIT: After Sorin's and Gilles' comments about the possible dangers of piping `ls` output, you could use:
find . -name "*.extension" | xargs -n 1 tail -n +2
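If any filenames could contain spaces, a null-delimited pipeline is safer, since plain `find | xargs` splits on whitespace. A minimal sketch (the `.extension` files and directory here are hypothetical demo data; `-print0`, `sort -z` and `xargs -0` are GNU/BSD extensions, not POSIX):

```shell
dir=$(mktemp -d)
printf 'header\na1\na2\n' > "$dir/one.extension"
printf 'header\nb1\n' > "$dir/two file.extension"
# -print0/-0 pass filenames NUL-delimited, so the space in
# "two file.extension" survives; sort -z keeps the order deterministic.
find "$dir" -name "*.extension" -print0 | sort -z | xargs -0 -n 1 tail -n +2 > "$dir/merged.txt"
cat "$dir/merged.txt"
```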

-1 for piping ls output to something, ls is not designed to do that, use find – Sorin Apr 11 '12 at 10:07
Can you give a link for possible problems with piping ls output? Thanks – xpapad Apr 11 '12 at 10:39
The above reference compares parsing the output of `ls` to doing internal string manipulation like `for x in *.txt`. It does not compare parsing the output of `ls` to parsing the output of `find`. Both are "bad" according to the same logic. – Kaz Apr 12 '12 at 00:47
The reference does mention `find` but it recommends using the GNU `find` extensions to output null terminated strings. Replacing `ls` with a plain old `find` is completely pointless. – Kaz Apr 12 '12 at 00:47
If I do something like: `find . -name "*.csv" | xargs -n 1 tail -n +2 > output.csv` then my `output.csv` file gets included in the `find . -name "*.csv"` and as a result the output file reads itself and then outputs to itself again. Is there a way to avoid this other than to make the output file not a `.csv` file? – YellowPillow Oct 31 '16 at 11:43
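One way around the self-inclusion problem in the comment above is to exclude the output name from the match with `!` (standard POSIX `find` negation). A sketch with hypothetical demo files:

```shell
dir=$(mktemp -d)
printf 'h\n1\n' > "$dir/a.csv"
printf 'h\n2\n' > "$dir/b.csv"
# The shell creates output.csv (via the redirection) before find runs,
# but "! -name" excludes it from the match, so it never reads itself.
find "$dir" -name "*.csv" ! -name "output.csv" | sort | xargs -n 1 tail -n +2 > "$dir/output.csv"
cat "$dir/output.csv"
```

Writing the output outside the searched directory works just as well.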
Everyone is making this too complicated. It's really easy:
tail -q -n +2 file1 file2 file3
And so on. If you have a large number of files you can load them into an array first:
list=(file1 file2 file3)
tail -q -n +2 "${list[@]}"
All the files with a given extension in the current directory?
list=(*.extension)
tail -q -n +2 "${list[@]}"
Or just
tail -q -n +2 *.extension
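To see why `-q` matters here: given more than one file, GNU `tail` prints a `==> filename <==` header before each file's output, and `-q` suppresses those headers. A quick sketch with hypothetical demo files:

```shell
dir=$(mktemp -d)
printf 'h\n1\n' > "$dir/a.extension"
printf 'h\n2\n' > "$dir/b.extension"
# Without -q this would interleave "==> ... <==" headers
# between the files; -q keeps the merged output clean.
tail -q -n +2 "$dir"/*.extension
```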

I attempted `tail -n +2 *.extension`. The version of tail I'm using returns `tail: Can only process one file at a time.` so that explains the more complicated answers. – zr00 Jul 18 '13 at 22:33
Just append each file after removing the first line.
#!/bin/bash
DEST=/tmp/out
FILES="space separated list of files"
: >"$DEST"   # truncate the destination instead of seeding it with a blank line
for FILE in $FILES   # unquoted on purpose: word splitting yields the list
do
    sed '1d' "$FILE" >>"$DEST"
done
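A self-contained run of the same idea, with hypothetical demo files (note that `: > "$dest"` truncates the destination rather than writing a blank first line into it):

```shell
dir=$(mktemp -d)
printf 'h\na\n' > "$dir/f1"
printf 'h\nb\n' > "$dir/f2"
dest="$dir/out"
: > "$dest"                      # start from an empty file
for f in "$dir"/f1 "$dir"/f2
do
    sed '1d' "$f" >> "$dest"     # drop line 1, append the rest
done
cat "$dest"
```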

`tail` outputs the last lines of a file. You can tell it how many lines to print, or how many lines to omit at the beginning (`-n +N`, where N is the number of the first line to print, counting from 1, so `+2` omits one line). With GNU utilities (i.e. under Linux or Cygwin), FreeBSD or other systems that have the `-q` option:
tail -q -n +2 *.extension
Otherwise, `tail` prints a header before each file, and `-q` is not standard. If your implementation doesn't have it, or to be portable, you need to iterate over the files.
for x in *.extension; do tail -n +2 <"$x"; done
Alternatively, you can call Awk, which has a way to identify the first line of each file. This is likely to be faster if you have a lot of small files and slower if you have many large files.
awk 'FNR != 1' *.extension
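A quick demonstration (with hypothetical demo files) of why `FNR != 1` works: `FNR` is awk's per-file record counter, resetting to 1 at the start of each input file (unlike `NR`, which counts across all files), so the pattern is false exactly on each file's header line:

```shell
dir=$(mktemp -d)
printf 'header\nx\n' > "$dir/a.extension"
printf 'header\ny\n' > "$dir/b.extension"
# FNR restarts at 1 for every file; lines where FNR != 1 are
# printed by awk's default action (no {} block needed).
awk 'FNR != 1' "$dir/a.extension" "$dir/b.extension"
```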
