I am using R to cbind about 11,000 files with:
dat <- do.call('bind_cols', lapply(lfiles, read.delim))
This is unbelievably slow. I am using R because my downstream processing, such as creating plots, is in R. What are some fast alternatives for concatenating thousands of files by column?
I have three types of files for which I want this done. They look like this:
[centos@ip data]$ head C021_0011_001786_tumor_RNASeq.abundance.tsv
target_id length eff_length est_counts tpm
ENST00000619216.1 68 26.6432 10.9074 5.69241
ENST00000473358.1 712 525.473 0 0
ENST00000469289.1 535 348.721 0 0
ENST00000607096.1 138 15.8599 0 0
ENST00000417324.1 1187 1000.44 0.0673096 0.000935515
ENST00000461467.1 590 403.565 3.22654 0.11117
ENST00000335137.3 918 731.448 0 0
ENST00000466430.5 2748 2561.44 162.535 0.882322
ENST00000495576.1 1319 1132.44 0 0
[centos@ip data]$ head C021_0011_001786_tumor_RNASeq.rsem.genes.norm_counts.hugo.tab
gene_id C021_0011_001786_tumor_RNASeq
TSPAN6 1979.7185
TNMD 1.321
DPM1 1878.8831
SCYL3 452.0372
C1orf112 203.6125
FGR 494.049
CFH 509.8964
FUCA2 1821.6096
GCLC 1557.4431
[centos@ip data]$ head CPBT_0009_1_tumor_RNASeq.rsem.genes.norm_counts.tab
gene_id CPBT_0009_1_tumor_RNASeq
ENSG00000000003.14 2005.0934
ENSG00000000005.5 5.0934
ENSG00000000419.12 1100.1698
ENSG00000000457.13 2376.9100
ENSG00000000460.16 1536.5025
ENSG00000000938.12 443.1239
ENSG00000000971.15 1186.5365
ENSG00000001036.13 1091.6808
ENSG00000001084.10 1602.7165
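To make the goal concrete, here is a minimal base R sketch of the kind of result I am after. It assumes every file of a given type has the same rows in the same order (which bind_cols also relies on) and that the files are tab-delimited. The function names `read_value_column` and `combine_value_columns` and the argument `col_index` are hypothetical, made up for illustration. The idea is to read only the one numeric column needed from each file and collect the columns into a matrix, rather than building and binding 11,000 full data frames. I suspect data.table::fread with its `select` argument would be faster still, but I have not benchmarked either.

```r
# Hypothetical helper: read a single numeric column (by position) from a
# tab-delimited file with a header line, skipping every other column.
read_value_column <- function(path, col_index) {
  header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
  classes <- rep("NULL", length(header))  # "NULL" tells read.delim to skip a column
  classes[col_index] <- "numeric"
  read.delim(path, colClasses = classes)[[1]]
}

# Hypothetical helper: build an ID x sample matrix from many files.
# Assumes all files share the same rows in the same order.
combine_value_columns <- function(paths, col_index) {
  # Take the row IDs (first column) from the first file only.
  ids <- read.delim(paths[1], stringsAsFactors = FALSE)[[1]]
  # vapply fills a numeric matrix and fails if a file has a different row count.
  values <- vapply(paths, read_value_column,
                   FUN.VALUE = numeric(length(ids)),
                   col_index = col_index)
  rownames(values) <- ids
  # Sample name = file name up to the first dot.
  colnames(values) <- sub("\\..*$", "", basename(paths))
  values
}

# Example usage (paths and pattern are assumptions about my layout):
# lfiles <- list.files("data", pattern = "\\.abundance\\.tsv$", full.names = TRUE)
# tpm <- combine_value_columns(lfiles, col_index = 5)  # tpm is the 5th column
```

For the two rsem file types the value is in column 2, so the call would use `col_index = 2`. The result is a numeric matrix, which is what most of my downstream plotting needs anyway.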
Thanks!