
I have a large list of objects (say 100k elements). Each element has to be processed by a function "process", but I would like to do the processing in chunks (say, 20 passes), because I want to save the processing results to a file on the hard drive and keep working memory free.

I'm new to R and I know that it should involve some apply magic but I don't know how to do it (yet).

Any guidance would be much appreciated.

A small example:

objects <- list()
for (i in 1:100) {
    objects <- append(objects, 500)
}
objects

processOneElement <- function(x) {
    x/20 + 23
}

I would like to process the first 20 elements in one go and save the results, then process the second 20 elements and save those results, and so on. Here is the working code I ended up with, based on the answer below:

objects <- list()
for (i in 1:100) {
    objects <- append(objects, 500)
}
objects

process <- function(x) {
    x/20 + 23
}

# processing everything in one go works here, but the output is too big for my real data
results <- lapply(objects, FUN=process)



# process in batches of 20 and write each batch to its own file
index <- seq(1, length(objects), by=20)
lapply(index, function(idx1) {
    idx2 <- min(idx1 + 20 - 1, length(objects))
    batch <- lapply(idx1:idx2, function(x) {
        process(objects[[x]])
    })
    write.table(batch, paste("batch", idx1, sep=""))
})
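
Out of curiosity I also tried expressing the same batching with split() instead of the index arithmetic. This is just a sketch: the chunk size of 20, the use of saveRDS and the file name pattern are only what I picked for this example (saveRDS handles arbitrary R objects, so it doesn't matter whether the results are tabular):

# build a chunk id for every element: 1,1,...,1,2,2,... (20 per chunk)
chunk.id <- ceiling(seq_along(objects) / 20)
chunks <- split(seq_along(objects), chunk.id)   # list of index vectors

for (i in seq_along(chunks)) {
    # process one chunk of (up to) 20 elements
    batch <- lapply(objects[chunks[[i]]], process)
    # saveRDS stores arbitrary R objects; the file name is just an example
    saveRDS(batch, file = paste("batch_", i, ".rds", sep = ""))
}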
user13467

1 Answer


With what you have given, this is the approach I can suggest. Assuming your list is stored in list.object:

lapply(seq(1, length(list.object), by=20), function(idx) {
    # here idx will be 1, 21, 41 etc...
    idx2 <- min(idx+20-1, length(list.object))
    # do what you want here..
    batch.20.processed <- lapply(idx:idx2, function(x) {
        process(list.object[[x]]) # passes idx:idx2 indices one at a time
    })
    # here you have the processed list of (up to) 20 elements
    # finally write each one to file
    lapply(seq_along(batch.20.processed), function(x) {
        write.table(batch.20.processed[[x]], ...)
        # where "..." is all other allowed arguments to write.table,
        # such as file, row.names, col.names, quote etc.
        # don't literally pass "..." to write.table
    })
})
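
If the processed objects can be written out as tables, a fully spelled-out version might look like the one below. The file name pattern and the row.names/quote settings are only placeholders to adapt; list.object and process are assumed to be defined as above:

lapply(seq(1, length(list.object), by = 20), function(idx) {
    idx2 <- min(idx + 20 - 1, length(list.object))
    batch <- lapply(idx:idx2, function(x) process(list.object[[x]]))
    # write each processed element to its own file;
    # "batch_<idx>_<x>.txt" is only an example naming scheme
    lapply(seq_along(batch), function(x) {
        write.table(batch[[x]],
                    file = paste("batch_", idx, "_", x, ".txt", sep = ""),
                    row.names = FALSE, quote = FALSE)
    })
})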
Arun
  • Hi, I actually have all my data loaded. I will be saving the results from processing batches into files. I also added some reproducible code to the original question – user13467 Jan 23 '13 at 15:21
  • Processing each element results in an object which is, relatively speaking, large, so I can't afford to process all of them in one go... I need to process the first 20, save the results, and repeat until I'm done – user13467 Jan 23 '13 at 15:26
  • I tried lapply(objects, FUN=process) but the output is too big – user13467 Jan 23 '13 at 15:28
  • 1
    I used your code as a guidance and uploaded working code in example. Thanks. – user13467 Jan 23 '13 at 16:01
  • This looks really useful, but I get an error `Error in FUN(1:20[[1L]], ...) : '...' used in an incorrect context` with a list of 500 items. Can you double-check the function? I'm not familiar with proper usage of `...` – Ben May 22 '13 at 05:59
  • I was taking the dots literally, my mistake! Thanks very much for clarifying that, I've got it working now. – Ben May 22 '13 at 08:08