
I have >100 files that I need to merge, but the first line of each file has to be removed first. What is the most efficient way to do this under Unix? I suspect it's a command combining cat and sed '1d'. All files have the same extension and are in the same folder, so we could probably use *.extension to point to them. Many thanks!

Abdel

5 Answers


Assuming your filenames are sorted in the order you want your files appended, you can use:

ls *.extension | xargs -n 1 tail -n +2

EDIT: After Sorin's and Gilles's comments about the possible dangers of piping ls output, you could use:

find . -name "*.extension" | xargs -n 1 tail -n +2
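
If any filenames contain spaces, a whitespace-safe variant is to pass the names null-delimited (a sketch, assuming GNU or BSD find and xargs, which support -print0 and -0):

find . -name "*.extension" -print0 | xargs -0 -n 1 tail -n +2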
xpapad
  • -1 for piping ls output to something, ls is not designed to do that, use find – Sorin Apr 11 '12 at 10:07
  • In what circumstance would this be bad Sorin? – Abdel Apr 11 '12 at 10:12
  • Can you give a link for possible problems with piping ls output? Thanks – xpapad Apr 11 '12 at 10:39
  • The above reference compares parsing the output of `ls` to doing internal string manipulation like `for x in *.txt`. It does not compare parsing the output of `ls` to parsing the output of `find`. Both are "bad" according to the same logic. – Kaz Apr 12 '12 at 00:47
  • The reference does mention `find` but it recommends using the GNU `find` extensions to output null terminated strings. Replacing `ls` with a plain old `find` is completely pointless. – Kaz Apr 12 '12 at 00:47
  • If I do something like: `find . -name "*.csv" | xargs -n 1 tail -n +2 > output.csv` then my `output.csv` file gets included in the `find . -name "*.csv"` and as a result the output file reads itself and then outputs to itself again. Is there a way to avoid this other than to make the output file not a `.csv` file? (see the sketch after these comments) – YellowPillow Oct 31 '16 at 11:43
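
One way to address the self-inclusion problem raised in the last comment is to exclude the output file by name (a sketch keeping the commenter's output.csv; assumes a find and xargs that support -print0 and -0):

find . -name "*.csv" ! -name "output.csv" -print0 | xargs -0 -n 1 tail -n +2 > output.csv

Writing the output to a different directory, or with a non-matching extension, avoids the problem just as well.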

Everyone is overcomplicating this. It's really easy:

tail -q -n +2 file1 file2 file3

And so on. If you have a large number of files, you can load them into an array first:

list=(file1 file2 file3)
tail -q -n +2 "${list[@]}"

All the files with a given extension in the current directory?

list=(*.extension)
tail -q -n +2 "${list[@]}"

Or just

tail -q -n +2 *.extension
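
Since the goal is to merge everything into one file, redirect the combined output; merged.txt is only an example name, chosen so it does not match *.extension and get swept up as input on a later run:

tail -q -n +2 *.extension > merged.txt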
sorpigal
  • I attempted `tail -n +2 *.extension`. The version of tail I'm using returns `tail: Can only process one file at a time.` so that explains the more complicated answers. – zr00 Jul 18 '13 at 22:33

Just append each file after removing the first line.

#!/bin/bash

DEST=/tmp/out
FILES="file1 file2 file3"   # space-separated list of files to merge

: >"$DEST"                  # start with an empty file (no stray blank line)
for FILE in $FILES
do
    sed -e '1d' "$FILE" >>"$DEST"
done
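
Since the question says all the files share one extension, the list can also come from a glob instead of a hand-maintained variable; a sketch of the same loop under that assumption:

DEST=/tmp/out
: >"$DEST"
for FILE in *.extension
do
    sed -e '1d' "$FILE" >>"$DEST"
done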
Douglas Leeder

tail outputs the last lines of a file. You can tell it how many lines to print, or how many lines to omit at the beginning (-n +N, where N is the number of the first line to print, counting from 1, so +2 omits one line). With GNU utilities (i.e. under Linux or Cygwin), FreeBSD, or other systems whose tail supports the -q option:

tail -q -n +2 *.extension

Without -q, tail prints a header before each file when given multiple files, and -q is not standard. If your implementation doesn't have it, or if you need to stay portable, iterate over the files instead:

for x in *.extension; do tail -n +2 <"$x"; done

Alternatively, you can call awk, whose FNR variable counts lines within the current input file, so FNR != 1 skips the first line of each file. This is likely to be faster if you have a lot of small files and slower if you have many large files.

awk 'FNR != 1' *.extension
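
A bare pattern such as FNR != 1 relies on awk's default action, which is to print the line; written out explicitly and redirected into a single merged file (merged.txt is only an example name), it becomes:

awk 'FNR != 1 { print }' *.extension > merged.txt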
Gilles 'SO- stop being evil'
ls -1 file*.txt | xargs nawk 'FNR!=1'
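
This carries the same caveat about piping ls output that was raised under the accepted answer; since awk takes multiple file operands, a glob avoids the pipe entirely (a sketch assuming the same file*.txt naming; substitute awk if your system has no nawk):

nawk 'FNR != 1' file*.txt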
Vijay