
I have 30 files (imported into R as data frames), and each of them looks like this:

     X..Chr  SNP1  SNP2 Dist Sign  r2
1       1     1     2  1208    - 0.099
2       1     1     3  3440    + 0.414
3       1     1     4 11078    + 0.125
4       1     1     5 13934    + 0.201
5       1     1     6 20957    - 0.008
6       1     1     7 21046    - 0.000

I read each of them in this way:

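# Read the first 5 rows so read.table can guess each column's class
chr1rows <- read.table("/home/Paulal/ld_syn_c01.txt", stringsAsFactors = FALSE, header = TRUE, nrows = 5)
classes <- sapply(chr1rows, class)
# Re-read the whole file with those column classes fixed
chr1 <- read.table("/home/Paulal/output_ld/ld_syn_c01.txt", header = TRUE, colClasses = classes)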
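The same two-pass trick can be wrapped in a function so that all 30 files are read the same way. This is a minimal sketch: the helper names `read_ld_file` and `ld_paths` are hypothetical, and the naming scheme `ld_syn_c01.txt` through `ld_syn_c30.txt` is an assumption extrapolated from the single file name shown above. Guessing classes from only 5 rows can also fail if a later row does not fit the guessed class.

```r
# Hypothetical helper: read one LD file using the two-pass approach above.
read_ld_file <- function(path, n_sample = 5) {
  # Pass 1: read a few rows and let read.table guess each column's class
  sample_rows <- read.table(path, header = TRUE, stringsAsFactors = FALSE,
                            nrows = n_sample)
  classes <- sapply(sample_rows, class)
  # Pass 2: read the whole file with the column classes fixed
  read.table(path, header = TRUE, colClasses = classes)
}

# Hypothetical helper: build the 30 file paths.
# ASSUMPTION: files are named ld_syn_c01.txt ... ld_syn_c30.txt in one directory.
ld_paths <- function(dir, n = 30) {
  file.path(dir, sprintf("ld_syn_c%02d.txt", seq_len(n)))
}
```

For example, `chr_list <- lapply(ld_paths("/home/Paulal/output_ld"), read_ld_file)` would give a list of 30 data frames. Keeping them as a list avoids the `rbind` step that fails here, although the whole list still has to fit in memory.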

I am using R on Linux:

R version 3.0.1 (2013-05-16) -- "Good Sport"
Platform: x86_64-redhat-linux-gnu (64-bit)

Summed over all 30 data frames, the combined size would be about 4.5e11. However, R fails to rbind them, and I get this error:

Error in anyDuplicated.default(rlabs) :
  long vectors not supported yet: unique.c:550
In addition: Warning messages:
1: In Make.row.names(nmi, ri, ni, nrow) : NAs introduced by coercion
2: In nrow + ni : NAs produced by integer overflow
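The warning about `nrow + ni` hints at the cause: `rbind` keeps its running row count as a 32-bit integer, and data frame row names do not support long vectors, so in practice a single `data.frame` is capped at `.Machine$integer.max` (2^31 - 1) rows. A small check, sketched under that assumption (`fits_in_one_df` is a hypothetical helper), shows that even two files the size of chr1 (1673012945 rows, as reported in the comments below) would overflow:

```r
# Largest row count a data.frame can hold: 2^31 - 1
max_rows <- .Machine$integer.max

# Hypothetical helper: would data frames with these row counts fit in one
# data.frame? Summing as double avoids the integer overflow from the warning.
fits_in_one_df <- function(row_counts) {
  sum(as.numeric(row_counts)) <= max_rows
}

chr1_rows <- 1673012945L  # dim(chr1)[1] as reported in the comments

fits_in_one_df(chr1_rows)                # TRUE
fits_in_one_df(c(chr1_rows, chr1_rows))  # FALSE

# Adding the same counts as integers triggers the same kind of
# "NAs produced by integer overflow" warning and returns NA
suppressWarnings(chr1_rows + chr1_rows)
```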

Any suggestions on how I can create this combined data frame? Thank you very much. Cheers, Paula.

Edited by Jaap. Asked by PaulaF.
  • Your data looks like a `data.frame` to me, not 30 files. – Boxuan Sep 11 '15 at 02:36
  • @RichardScriven I have edited my question. – PaulaF Sep 11 '15 at 02:42
  • @Boxuan I have 30 dataframes and each of them looks like that one. – PaulaF Sep 11 '15 at 02:43
  • So you are attempting to make one very long data.frame? What are the dimensions of each data.frame? And are you trying to combine them in R for some operations, or is your intent to write them to a file, as your question body seems to indicate? – blep Sep 11 '15 at 02:46
  • Do you have 3000 GB available? – Rorschach Sep 11 '15 at 02:48
  • @dd3 You are right. I need to create a very long data frame, and each of them has a dimension of: dim(chr1) [1] 1673012945 6. I do not need to write them to a file. I just want to do some calculations using the very long data frame. – PaulaF Sep 11 '15 at 02:56
  • @nongkrong I have 3.76T. Should be enough? – PaulaF Sep 11 '15 at 03:00
  • According to this SO post: http://stackoverflow.com/questions/5233769/practical-limits-of-r-data-frame, a data frame in R supports up to `2^31-1` rows, which is far fewer than `4.5e+11`. You should consider something like `SQL` for data of that size. – Boxuan Sep 11 '15 at 03:15
  • By the way, is it possible to do your calculations on each file separately and summarize them at the end? If so, you may be able to perform your tasks 30 times and combine the results into one file. – Boxuan Sep 11 '15 at 03:16
  • @Boxuan I think it would be possible, but would be more complicated as I would need to weight all statistics that I need to calculate. If I don't find a way to rbind them, I will have to do that. But I would prefer to create a single dataframe instead. – PaulaF Sep 11 '15 at 03:22
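The per-file approach does not necessarily require weighting each statistic by hand. A minimal sketch, assuming the statistics of interest are the mean and variance of `r2` (the helpers `summarise_r2` and `pool_r2` are hypothetical): keep additive summaries (count, sum, sum of squares) for each file and add them up at the end, so every file is automatically weighted by its number of rows.

```r
# Hypothetical per-file summary of r2: count, sum and sum of squares.
# Unlike means or variances, these add up directly across files.
summarise_r2 <- function(df) {
  x <- df$r2[!is.na(df$r2)]
  c(n = length(x), sum = sum(x), sumsq = sum(x^2))
}

# Pool per-file summaries into an overall mean and sample variance.
# Each file contributes in proportion to its number of rows.
pool_r2 <- function(summaries) {
  tot <- Reduce(`+`, summaries)
  n <- tot[["n"]]
  m <- tot[["sum"]] / n
  v <- (tot[["sumsq"]] - n * m^2) / (n - 1)
  c(n = n, mean = m, var = v)
}

# Example with two small chunks taken from the sample rows in the question
a <- data.frame(r2 = c(0.099, 0.414, 0.125))
b <- data.frame(r2 = c(0.201, 0.008, 0.000))
pooled <- pool_r2(list(summarise_r2(a), summarise_r2(b)))

# Matches computing directly on the combined rows
all.equal(pooled[["mean"]], mean(c(a$r2, b$r2)))
all.equal(pooled[["var"]], var(c(a$r2, b$r2)))
```

In practice each file could be read, summarised and dropped in turn, e.g. `lapply(paths, function(p) summarise_r2(read.table(p, header = TRUE)))` with `paths` being a vector of the 30 file names, so only one chromosome needs to be in memory at a time. The sum-of-squares formula can lose precision when `n` is very large; a streaming update such as Welford's algorithm is a more robust alternative.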

0 Answers