
I have an R program that combines 10 files, each 296 MB in size, and I have increased the memory limit to 8 GB (the size of my RAM):

--max-mem-size=8192M

When I ran the program, I got an error saying:

In type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
  Reached total allocation of 7646Mb: see help(memory.size) 

Here is my R program:

S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt");
S2 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_401_800.txt");
S3 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_801_1200.txt");
S4 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1201_1600.txt");
S5 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1601_2000.txt");
S6 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2001_2400.txt");
S7 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2401_2800.txt");
S8 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2801_3200.txt");
S9 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3201_3600.txt");
S10 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3601_4000.txt");
options(max.print=154.8E10);
combine_result <- rbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10)
write.table(combine_result,file="C:/sim_omega3_1_4000.txt",sep=";",
             row.names=FALSE,col.names=TRUE, quote = FALSE);

Can anyone help me with this?

Thanks,

Shruti.

  • Where, specifically, did the error occur? – Joshua Ulrich Apr 21 '11 at 19:56
  • You know that the semicolon is not required at the end of a line, right? – Benjamin Apr 21 '11 at 20:27
  • If all you are doing is aggregating files, you may want to try doing it directly in bash or DOS. It is easy to search for on Google, and this SO question may be helpful: http://stackoverflow.com/questions/4827453/merge-all-files-in-a-directory-into-one-using-bash – Chase Apr 21 '11 at 20:27
  • How much free RAM do you have before R is started? Maybe some other process is consuming it all. – Marek Apr 21 '11 at 21:29

4 Answers


I suggest following the advice in the Memory usage section of ?read.csv2:

Memory usage:

 These functions can use a surprising amount of memory when reading
 large files.  There is extensive discussion in the ‘R Data
 Import/Export’ manual, supplementing the notes here.

 Less memory will be used if ‘colClasses’ is specified as one of
 the six atomic vector classes.  This can be particularly so when
 reading a column that takes many distinct numeric values, as
 storing each distinct value as a character string can take up to
 14 times as much memory as storing it as an integer.

 Using ‘nrows’, even as a mild over-estimate, will help memory
 usage.

 Using ‘comment.char = ""’ will be appreciably faster than the
 ‘read.table’ default.

 ‘read.table’ is not the right tool for reading large matrices,
 especially those with many columns: it is designed to read _data
 frames_ which may have columns of very different classes.  Use
 ‘scan’ instead for matrices.
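
For example, a minimal sketch applying that advice to the first file (the nrows value here is a hypothetical mild over-estimate; substitute one based on your actual row counts, and adjust the classes if the sample misreads a column):

# infer column classes from a small sample of the file
classes <- sapply(read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                            nrows = 100), class)
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                colClasses = classes,   # avoid storing numeric columns as character
                nrows = 450000)         # hypothetical over-estimate of the row count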
Joshua Ulrich

Memory allocation requires contiguous blocks. The size a file takes on disk may not be a good indication of how large the object will be once loaded into R. Can you look at one of these S objects with this function?

?object.size
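
For instance, assuming S1 has already been read in:

print(object.size(S1), units = "Mb")   # report S1's in-memory size in megabytes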

Here is a function I use to see what is taking up the most space in R:

getsizes <- function() {
    z <- sapply(ls(envir = globalenv()), function(x) object.size(get(x)))
    as.matrix(rev(sort(z))[1:10])   # the ten largest objects, in bytes
}
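
Called after the reads, for example, it shows which objects to remove first:

getsizes()   # run after loading S1..S10 to see what dominates memory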
IRTFM

If you call remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10) followed by gc() after calculating combine_result, you might free enough memory. I also find that running a script through Rscript seems to allow access to more memory than the GUI does, if you are on Windows.
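
Applied to the script in the question, the tail end would look something like this:

combine_result <- rbind(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)
remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)   # drop the ten pieces once combined
gc()                                              # reclaim the freed memory
write.table(combine_result, file = "C:/sim_omega3_1_4000.txt", sep = ";",
            row.names = FALSE, col.names = TRUE, quote = FALSE)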

Benjamin
  • Looking at the OP's error message, I guess that he doesn't reach the step where `combine_result` is calculated... – Marek Apr 21 '11 at 21:07

If these files are in a standard format and you want to do this in R, why bother with reading/writing CSV at all? Use readLines/writeLines:

files_in <- file.path("C:/Sim_Omega3_results",c(
    "sim_omega3_1_400.txt",
    "sim_omega3_401_800.txt",
    "sim_omega3_801_1200.txt",
    "sim_omega3_1201_1600.txt",
    "sim_omega3_1601_2000.txt",
    "sim_omega3_2001_2400.txt",
    "sim_omega3_2401_2800.txt",
    "sim_omega3_2801_3200.txt",
    "sim_omega3_3201_3600.txt",
    "sim_omega3_3601_4000.txt"))


file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")  # first file verbatim, keeping its header
file_out <- file(out_file_name, "at")   # open the copy for appending, in text mode
for (file_in in files_in[-1]) {
    x <- readLines(file_in)
    writeLines(x[-1], file_out)         # append each remaining file minus its header row
}
close(file_out)
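
This way only one file's lines are held in memory at a time, instead of all ten data frames at once.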
Marek
  • Hi Marek, I'm getting an error while executing the program. The error is at `file_out <- file(out_file_name, "at")`: `Error in file(out_file_name, "at") : cannot open the connection`, plus a warning message from the same call. Thanks – Shruti Apr 21 '11 at 22:13
  • @Shruti Do you have write permission on `C:`? You could check `file.exists(out_file_name)`, should be `TRUE` or check if after `file.copy` `C:/sim_omega3_1_4000.txt` exists on your disk. Also check if `file.exists(files_in)` is all `TRUE`. – Marek Apr 22 '11 at 07:56
  • Thank you. I checked that the file exists and ran the program; it didn't work. I specifically created a file called out_file_name, but no luck... – Shruti Apr 22 '11 at 16:35
  • @Shruti: out_file_name is a variable; it needs to be set to a specific path and filename... – Benjamin Apr 23 '11 at 12:35