3

I have many text files in one folder, each containing a single line with a float value, and I would like to concatenate them in bash in numerical order, for example: file_1.txt, file_2.txt ... file_N.txt. I would like to have them in one txt file in order from 1 to N. Could someone please help me? Here is the code I have, but it concatenates them in a seemingly random order. Thank you.

for file in *.txt
do 
  cat ${file} >>  output.txt  
done 
jrjc
user3612121
  • Assuming the files sort alphabetically into the order you want that should be working. – Etan Reisner Jul 24 '14 at 13:06
  • possible duplicate of [How to merge files in bash in alphabetical order](http://stackoverflow.com/questions/7176572/how-to-merge-files-in-bash-in-alphabetical-order) – Ken Jul 24 '14 at 13:06

7 Answers

4

As much as I recommend against parsing the output of ls, here we go.

ls has a "version sort" option that will sort numbered files like you want. See below for a demo.

To concatenate, you want:

ls -v file*.txt | xargs cat > output

Demo:

$ touch file{1..20}.txt
$ ls
file1.txt   file12.txt  file15.txt  file18.txt  file20.txt  file5.txt  file8.txt
file10.txt  file13.txt  file16.txt  file19.txt  file3.txt   file6.txt  file9.txt
file11.txt  file14.txt  file17.txt  file2.txt   file4.txt   file7.txt
$ ls -1
file1.txt
file10.txt
file11.txt
file12.txt
file13.txt
file14.txt
file15.txt
file16.txt
file17.txt
file18.txt
file19.txt
file2.txt
file20.txt
file3.txt
file4.txt
file5.txt
file6.txt
file7.txt
file8.txt
file9.txt
$ ls -1v
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
file6.txt
file7.txt
file8.txt
file9.txt
file10.txt
file11.txt
file12.txt
file13.txt
file14.txt
file15.txt
file16.txt
file17.txt
file18.txt
file19.txt
file20.txt
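
For completeness, here is a minimal end-to-end sketch of the same idea with files that actually contain values. It assumes GNU ls (for -v) and filenames without whitespace; the scratch directory, the sample indices, and the file contents are made up for illustration.

```shell
# Scratch directory with sample files (names and contents are illustrative)
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 3 10 11 20; do
  echo "$i.5" > "file$i.txt"
done

# GNU ls -v sorts the embedded numbers numerically ("version sort"),
# so file10.txt comes after file3.txt instead of after file1.txt
ls -v file*.txt | xargs cat > output

cat output
```

The output file is deliberately named so that it does not match file*.txt; otherwise a second run would concatenate the previous result as well.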
glenn jackman
  • This is probably a GNU ls option. To do this without ls: `printf "%s\n" file*.txt | sort -V | xargs cat > output` – glenn jackman Jul 24 '14 at 13:19
  • 1
    same for `sort -V`, must be a GNU option too. – jrjc Jul 24 '14 at 13:20
  • @glennjackman: I have seen it before and i saw it today. You have a knack for coming up with simple and elegant solution. :) +1 – Technext Jul 24 '14 at 13:30
  • To summarize the portability aspects: `ls -v` is a GNU extension, as is `sort -V`. OSX, unlike in most other cases, actually _also_ uses _GNU_ `sort`, but a version that is _too old_ (`5.93`). (`ls -v` exists on OSX, but means something different - curiously it doesn't exist on *BSD systems). Not an issue in _this_ case, but worth noting in general: this solution will break with filenames that have embedded spaces. – mklement0 Jul 24 '14 at 18:19
2
for file in *.txt
do 
  cat ${file} >>  output.txt  
done 

This works for me, as does:

for file in *.txt
do 
  cat $file >>  output.txt  
done

You don't need the braces ({}) here.

But the simplest is still:

cat file*.txt > output.txt

If you have more than 9 files, as pointed out in the comments, plain alphabetical order is no longer numerical order, and you can do one of the following:

files=$(ls file*txt | sort -t"_" -k2g)
files=$(find . -name "file*txt" | sort -t "_" -k2g)
files=$(printf "%s\n" file_*.txt | sort -k1.6n) # Thanks to glenn jackman

and then:

cat $files

or

cat $(find . -name "file*txt" | sort -t "_" -k2g)

Best is still to number your files with zero padding: file_01.txt if you have fewer than 100 files, file_001.txt if fewer than 1000, and so on.
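
If the files already exist with unpadded numbers, they can be renamed in place. The following is only a sketch: it assumes names of the form file_N.txt with N unpadded, a padding width of 3, and a scratch directory created just for the demo.

```shell
# Scratch directory with sample files (illustrative only)
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 10; do echo "$i" > "file_$i.txt"; done

for f in file_*.txt; do
  n=${f#file_}   # strip the "file_" prefix
  n=${n%.txt}    # strip the ".txt" suffix, leaving the number
  # Caveat: printf %d treats a leading 0 as octal, so this assumes
  # the numbers are not already zero-padded.
  new=$(printf 'file_%03d.txt' "$n")
  [ "$f" = "$new" ] || mv -- "$f" "$new"
done

# With padded names, the plain glob order is also the numerical order
result=$(cat file_*.txt)
echo "$result"
```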


Example:

ls file*txt
file_1.txt  file_2.txt  file_3.txt  file_4.txt  file_5.txt  file_10.txt

They contain only their corresponding number.

$ cat $files
1
2
3
4
5
10
jrjc
  • 4
    This is only valid for N in {1..9}. Once N exceeds 9, the files will not be ordered correctly. – Henk Langeveld Jul 24 '14 at 13:06
  • 2
    Another way to use sort: if you know the digits start at the 6th character of the filename: `printf "%s\n" file_*.txt | sort -k1.6n` – glenn jackman Jul 24 '14 at 13:25
  • `printf "%s\n" file_*.txt` is the preferable form: `ls file_*.txt` does the same, but needlessly invokes an external executable. `find . -name "file_*txt"`, aside from also invoking an external executable, potentially does something _different_, because it processes the _entire subtree_ (i.e., files in _subdirectories_ could be picked up, too; add `-maxdepth 1` to avoid that). – mklement0 Jul 24 '14 at 18:45
  • It works in _this_ case, but note that if you want to restrict sorting to a _single_ field, you must specify that field index _twice_, e.g.: `-k2,2g` - otherwise, that field and _the rest of the line_ act as the sort key. Also, unless your numbers are not decimal or have a `+` prefix or are in exponential notation, use `n` not `g` for numerical sorting (avoids rounding errors, is faster - see http://goo.gl/X6KeE). Thus, the sort keys should be: `-k2,2n` or `-k1.6,1n`. Not an issue in _this_ case, but worth noting in general: this solution will break with filenames that have embedded spaces. – mklement0 Jul 24 '14 at 18:51
1

Use this:

find . -type f -name "file*.txt" | sort -V | xargs cat -- >final_file

If the files are numbered, a plain sort does not produce the natural order that humans expect (file_10 sorts before file_2). To get the natural order, use the -V (version sort) option of the sort command.
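
A slightly more defensive variant, sketched under the assumption of GNU find, sort and xargs: -maxdepth 1 keeps find out of subdirectories, and NUL-separated names survive embedded spaces. The directory layout and file contents below are invented for the demo.

```shell
# Scratch directory with sample files (illustrative only)
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir sub
for i in 1 2 10; do echo "$i" > "file_$i.txt"; done
echo "nested" > sub/file_3.txt   # lives in a subdirectory; should be skipped

# -maxdepth 1: current directory only; -print0 / -z / -0: NUL-delimited names
find . -maxdepth 1 -type f -name 'file*.txt' -print0 |
  sort -zV |
  xargs -0 cat -- > final_file

cat final_file
```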

Technext
  • You should probably add `-maxdepth 1` to the `find` command to avoid potentially matching files in _subdirectories_ also. That said, `printf '%s\n' file*.txt` is probably easier. Note that `-V` is a _GNU_ `sort` extension. Not an issue in _this_ case, but worth noting in general: will break with filenames that have embedded spaces. – mklement0 Jul 24 '14 at 18:09
1

As others have pointed out, if you have files file_1, file_2, file_3... file_123283, Bash's own ordering of the glob results will put file_11 before file_2, because the names are sorted as text and not numerically.

You can use sort to get the order you want. Assuming that your files are file_#...

cat $(ls -1 file_* | sort -t_ -k2,2n)
  • ls -1 lists your files one per line.
  • sort -t_ says to split the sorting fields on underscores. This makes the second sorting field the numeric part of the file name.
  • -k2,2n says to sort by the second field numerically.

Finally, cat concatenates all of the files in that order.

One issue is that you may exceed the maximum command-line length if you have a whole lot of files. Before cat can get the file names, the $(...) must first be expanded, and the entire list is then passed to cat as arguments.
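
One way to sidestep that limit, sketched here rather than offered as the definitive fix, is to stream the names through xargs, which splits them into as many cat invocations as needed. This relies on printf being a shell builtin (as it is in bash), so the glob expansion is not handed to an external program, and it still assumes filenames without whitespace. The sample files are made up.

```shell
# Scratch directory with sample files (illustrative only)
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 11 123; do echo "$i" > "file_$i"; done

# printf emits one name per line, sort orders by the number after "_",
# and xargs batches the names so no single cat command line gets too long
printf '%s\n' file_* | sort -t_ -k2,2n | xargs cat > combined

cat combined
```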

David W.
  • +1 for the explanation and the correct and portable `sort` command. Quibbles: `printf '%s\n' file_*` is preferable to `ls -1 file_*`. Not an issue in _this_ case, but worth noting in general: this solution will break with filenames that have embedded spaces. – mklement0 Jul 24 '14 at 18:38
1

This works for me...

for i in $(seq 0 $N); do [[ -f file_$i.txt ]] && cat file_$i.txt; done > newfile

Or, more concisely

for i in $(seq 0 $N); do cat file_$i.txt 2> /dev/null ;done > newfile
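
The same loop can be sketched without seq (which is not part of POSIX) and with the variables quoted. N here is a hypothetical upper bound set by hand, and the sample files deliberately leave gaps in the numbering to show that missing indices are skipped.

```shell
# Scratch directory with sample files (illustrative only)
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 3 10; do echo "$i" > "file_$i.txt"; done

N=10   # hypothetical: the highest index you expect
i=1
while [ "$i" -le "$N" ]; do
  # Only cat indices that actually exist, so gaps are harmless
  [ -f "file_$i.txt" ] && cat "file_$i.txt"
  i=$((i + 1))
done > newfile

cat newfile
```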
Mark Setchell
0

You can use ls for listing files:

for file in `ls *.txt`
do
  cat ${file} >>  output
done

Some sort techniques are discussed here: Unix's 'ls' sort by name

Community
user1830432
  • It works for me too, but what I couldn't do is place the files in the order of their names. My output txt file starts with files 10 to 19, then 1, 20, and 2 to 9, but I wanted them from 1 to 20 in numerical order. Thank you – user3612121 Jul 24 '14 at 13:11
  • You will have to rename your files from name_1.txt to name_01.txt. – user1830432 Jul 24 '14 at 13:26
  • create files something like this: touch file{01..20}.txt – user1830432 Jul 24 '14 at 13:30
  • Aside from not addressing the sorting issue: Using globbing (pathname expansion) _directly_ is simpler, more robust, and also faster: `for file in *.txt` - parsing `ls` output is not a good idea; see http://mywiki.wooledge.org/ParsingLs. As written (without sorting), your command could be simplified to: `cat *.txt > output`. – mklement0 Jul 24 '14 at 18:10
  • But this question was about the sorting issue. IMO the problem was only in the naming of the files, using 1 instead of 01. If the files are named properly, a simple cat *.txt > out will suffice. – user1830432 Jul 25 '14 at 09:20
  • (Because you're the author of this answer, I needn't address you explicitly for you to see my comment, but the inverse is not true: if you want a commenter to see your reply to a comment of theirs, you must include `@`.) If your argument is that the source files should be renamed first, then you should (a) provide a renaming solution (don't just provide a link), and (b) reduce your existing code to `cat *.txt > output` to avoid the inefficient and brittle `for file in \`ls ...` approach. – mklement0 Jul 28 '14 at 22:56
0

Both solutions work well for the specific case at hand, but not generally in that they'll break with filenames with embedded spaces or other metacharacters (characters that, when used unquoted, have special meaning to the shell).

Here are solutions that work with filenames with embedded spaces, etc.:


Preferable solution for systems where sort -z and xargs -0 are supported (e.g., Linux, OSX, *BSD):

printf "%s\0" file_*.txt | sort -z -t_ -k2,2n  | xargs -0 cat > out.txt

Uses NUL (null character, 0x0) to separate the filenames and so safely preserves their boundaries.

This is the most robust solution, because it even handles filenames with embedded newlines correctly (although such filenames are very rare in practice). Unfortunately, sort -z and xargs -0 are not POSIX-compliant.


POSIX-compliant solution, using xargs -I:

printf "%s\n" file_*.txt | sort -t_ -k2,2n  | xargs -I % cat % > out.txt

Processing is line-based, and due to use of -I, cat is invoked once per input filename, making this method slower than the one above.
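
To see the NUL-separated variant cope with awkward names, here is a small sketch; the "my file_N.txt" names, their contents, and the scratch directory are invented for the demo, and sort -z and xargs -0 are assumed to be available.

```shell
# Scratch directory with filenames that contain a space (illustrative only)
dir=$(mktemp -d) && cd "$dir" || exit 1
for i in 1 2 10; do echo "$i" > "my file_$i.txt"; done

# Each name is terminated by NUL, so the embedded space is never
# mistaken for a separator by sort or xargs
printf '%s\0' "my file_"*.txt | sort -z -t_ -k2,2n | xargs -0 cat > out.txt

cat out.txt
```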

Community
mklement0