I have 500 files and I want to merge them by adding columns. My first file

3
4
1
5

My second file

7
1
4
2

Output should look like

3 7
4 1
1 4
5 2

But I have 500 files (sum_1.txt, sum_501.txt, and so on up to sum_249501.txt), so I need 500 columns, and it would be very tedious to type 500 file names. Is there an easier way to do this? I tried the following, but instead of 500 columns it produces a lot of rows:

#!/bin/bash
file_name="sum"
tmp=$(mktemp) || exit 1 
touch ${file_name}_calosc.txt
for first in {1..249501..500}
do
paste -d ${file_name}_calosc.txt ${file_name}_$first.txt >> ${file_name}_calosc.txt
done
Jakub
  • It's not clear what the result should be. Are you looking for `paste sum_1.txt sum_501.txt sum_1001.txt ... sum_249501.txt >combined.txt`? – tripleee Apr 26 '21 at 17:23
  • Yes. Each file has one column. I want to put all these columns into one file. – Jakub Apr 26 '21 at 17:25
  • The solution to that would be `paste $(for ((i=1; i<=249501; i+=500)); do echo sum_$i.txt; done)` but the command line might be too long for your kernel. – tripleee Apr 26 '21 at 17:25
  • FWIW, one-liners like this one or `paste sum_{1..249501..500}.txt > sum_calosc.txt` fail with "too many open file handles" for me on macOS. – Benjamin W. Apr 26 '21 at 17:45
  • @BenjaminW Yeah, that too; good catch. https://stackoverflow.com/questions/5377450/maximum-number-of-open-filehandles-per-process-on-osx-and-how-to-increase has some partial remedies for macOS; I'm sure something similar could be found for other platforms. – tripleee Apr 26 '21 at 18:10
  • Both the shell and Awk have limits on how many open file handles they can manage (in the case of Awk it's something tiny like 25 out of the box). Perhaps the simplest fix then is to batch the files and successively collect files with more and more columns in them (say, 25 per file, then paste 20 of _those_ files together). – tripleee Apr 26 '21 at 18:38
  • Your script would have to use something like `paste ${file_name}_calosc.txt ${file_name}_$first.txt > ${file_name}_calosc.tmp; mv ${file_name}_calosc.tmp ${file_name}_calosc.txt` (using a temporary file name for the output in the loop) to avoid disaster. If you run into problems with the number of open files, you could batch sets of files together, maybe using `xargs` to group sets of 50 files, then paste the 10 intermediate result files together. But if you can get `paste` to do the job in one pass, you should do so. – Jonathan Leffler Apr 26 '21 at 18:58

2 Answers

Something like this (untested) should work regardless of how many files you have:

awk '
    BEGIN {
        for (i=1; i<=249501; i+=500) {
            ARGV[ARGC++] = "sum_" i ".txt"
        }
    }
    { vals[FNR] = (NR==FNR ? "" : vals[FNR] OFS) $0 }
    END {
        for (i=1; i<=FNR; i++) {
            print vals[i]
        }
    }
'

It'd only fail if the total content of all the files was too big to fit in memory.
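
As a small, hedged illustration of the approach above, the sketch below runs the same awk logic on three tiny stand-in files. The temporary directory, the contents of the third file, and the loop bound of 1001 are assumptions made for the demo; it also assumes the real files carry the .txt suffix shown in the question.

```shell
#!/bin/bash
# Hypothetical demo data: three one-column files stand in for the 500 real ones.
demo=$(mktemp -d) || exit 1
printf '%s\n' 3 4 1 5 > "$demo/sum_1.txt"
printf '%s\n' 7 1 4 2 > "$demo/sum_501.txt"
printf '%s\n' 9 8 6 0 > "$demo/sum_1001.txt"

# Build the file list inside awk, then append line FNR of each file to row FNR.
merged=$(
    cd "$demo" &&
    awk '
        BEGIN {
            for (i=1; i<=1001; i+=500) {
                ARGV[ARGC++] = "sum_" i ".txt"
            }
        }
        { vals[FNR] = (NR==FNR ? "" : vals[FNR] OFS) $0 }
        END {
            for (i=1; i<=FNR; i++) {
                print vals[i]
            }
        }
    '
)
printf '%s\n' "$merged"
rm -rf "$demo"
```

With these inputs the rows printed should be "3 7 9", "4 1 8", "1 4 6" and "5 2 0". Because awk reads the files one after another, only one input file is open at a time, which is why the open-file limit does not come into play here.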

Ed Morton

Your command says to paste two files together; to paste more files, give more files as arguments to paste.

You can paste a number of files together like

paste sum_{1..249501..500}.txt > sum_calosc.txt

but if the number of files is too large for paste, or the resulting command line is too long, you may still have to resort to temporary files.
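
Before falling back to temporary files, it may be worth checking the two limits mentioned above. The following is only a rough sketch: it assumes `ulimit -n` and `getconf ARG_MAX` report the relevant limits on your system, and the variable names are made up for illustration. Note that `paste` also needs a few descriptors beyond the input files (standard input, output, and error).

```shell
#!/bin/bash
# Soft limit on open file descriptors for processes started from this shell.
max_open=$(ulimit -n)
# Upper bound, in bytes, on the combined size of arguments and environment.
arg_max=$(getconf ARG_MAX)

# Count the files and estimate the argument size: each name plus one NUL byte.
nfiles=0
needed_bytes=0
i=1
while [ "$i" -le 249501 ]; do
    name="sum_$i.txt"
    needed_bytes=$((needed_bytes + ${#name} + 1))
    nfiles=$((nfiles + 1))
    i=$((i + 500))
done

echo "files to paste: $nfiles, open-file limit: $max_open"
echo "argument bytes: $needed_bytes, ARG_MAX: $arg_max"
```

If the open-file limit is reported as a number smaller than the file count plus a handful, a single `paste` call is likely to fail and batching is the safer route.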

Here's an attempt to paste 25 files at a time, then combine the resulting 20 files in a final big paste.

#!/bin/bash

d=$(mktemp -d -t pastemanyXXXXXXXXXXX) || exit

# Clean up when done
trap 'rm -rf "$d"; exit' ERR EXIT

for ((i=1; i<= 249501; i+=500*25)); do
    printf -v dest "paste%06i.txt" "$i"
    for ((j=1, k=i; j<=25; j++, k+=500)); do
        printf "sum_%i.txt\n" "$k"
    done |
    xargs paste >"$d/$dest"
done

paste "$d"/* >sum_calosc.txt

The function of xargs is to combine its inputs into a single command line (or more than one if it would otherwise be too long; but we are specifically trying to avoid that here, because we want to control exactly how many files we pass to paste).
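
To make the behaviour of xargs concrete, here is a minimal sketch that uses `echo` as a stand-in for the real command, so no input files are needed. The three file names are just sample values, and `-n 2` is an arbitrary batch size chosen for the demo.

```shell
#!/bin/bash
# Three names arrive on separate lines; xargs appends them all to one command.
one_call=$(printf '%s\n' sum_1.txt sum_501.txt sum_1001.txt | xargs echo paste)
printf '%s\n' "$one_call"

# With -n 2, xargs starts a new command after every two arguments.
split_calls=$(printf '%s\n' sum_1.txt sum_501.txt sum_1001.txt | xargs -n 2 echo paste)
printf '%s\n' "$split_calls"
```

The first call prints a single line, "paste sum_1.txt sum_501.txt sum_1001.txt". The second prints two lines, "paste sum_1.txt sum_501.txt" and "paste sum_1001.txt", which shows how a batch limit splits the work into separate invocations that would all share one output stream.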

tripleee
  • Use `xargs -n 25` to process up to 25 arguments at a time? – Jonathan Leffler Apr 26 '21 at 19:00
  • Yeah, I was looking at ways to do that, and this can probably be optimized and refactored to be more elegant; the problem with `xargs -n 25` is that I still want to control the output file name in each instance separately, so I ended up with a loop. – tripleee Apr 26 '21 at 19:01
  • That's a problem: where does the output go? Fair enough; a reason not to use `xargs` to group the arguments with `-n 25`. – Jonathan Leffler Apr 26 '21 at 19:04
  • I'll perhaps also note that the `{1..20..5}` syntax of brace expansion is Bash 4+ only; so, not available out of the box e.g. on macOS. – tripleee Apr 27 '21 at 04:50
  • I fixed a bug just now; the earlier version would cause a few of the columns in the final result to be in the wrong order (because the script relies on the shell's sort order for the final wildcard expansion, and files with six digits in the file name would appear in the wrong place in spite of my attempt to add zero padding, only I did not add enough zeros). – tripleee Apr 27 '21 at 04:56
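
The sort-order issue described in the last comment can be reproduced with a few empty files. This is a hedged sketch: the directory names are invented for the demo, and it forces the C locale so that the wildcard expansion sorts by plain byte order, which may differ from the ordering in your own locale.

```shell
#!/bin/bash
demo=$(mktemp -d) || exit 1
mkdir "$demo/plain" "$demo/padded" || exit 1

# Unpadded names: the wildcard sorts as text, so 100001 lands before 12501.
unpadded=$(
    cd "$demo/plain" && LC_ALL=C && export LC_ALL &&
    touch paste1.txt paste12501.txt paste100001.txt &&
    echo paste*.txt
)

# Six-digit zero padding makes text order agree with numeric order.
padded=$(
    cd "$demo/padded" && LC_ALL=C && export LC_ALL &&
    for n in 1 12501 100001; do
        touch "$(printf 'paste%06d.txt' "$n")"
    done &&
    echo paste*.txt
)

printf '%s\n' "$unpadded" "$padded"
rm -rf "$demo"
```

The first line printed should be "paste1.txt paste100001.txt paste12501.txt" and the second "paste000001.txt paste012501.txt paste100001.txt", so only the padded names come out in column order.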