
I am using the `aggregate` function to summarise 15 columns over one grouping variable. The dataset has around 75 million records, and the `aggregate` call fails with a memory error.

What is the most effective way to summarise multiple columns over a group in a large dataset?

The line I used for the aggregation:

Features <- aggregate(. ~ srl_nbr, data = model_data, FUN = sum)
  • Try with `dplyr`: `library(dplyr); model_data %>% group_by(srl_nbr) %>% summarise_all(sum)` or use `data.table`: `library(data.table); setDT(model_data)[, lapply(.SD, sum), srl_nbr]` – akrun May 23 '18 at 07:04
  • Some functions of base `R` are not optimized for memory usage. Try e.g. the `data.table` package to solve your problem. – jogo May 23 '18 at 07:10
  • I tried both the `dplyr` and `data.table` approaches and both worked smoothly. I guess `aggregate` isn't optimised for large datasets! Thank you. – MJ17 May 23 '18 at 07:19
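
For reference, a minimal runnable sketch of both suggestions from the comments above, on a small made-up data frame. Only `srl_nbr` comes from the question; the toy columns `x1`–`x3`, the group values, and the row count are assumptions for illustration:

    library(dplyr)
    library(data.table)

    # Toy stand-in for model_data: one grouping key plus a few numeric columns
    set.seed(1)
    model_data <- data.frame(srl_nbr = sample(1:5, 100, replace = TRUE),
                             x1 = runif(100), x2 = runif(100), x3 = runif(100))

    # dplyr: group by the key, then sum every non-grouping column
    features_dplyr <- model_data %>%
      group_by(srl_nbr) %>%
      summarise_all(sum)

    # data.table: convert by reference, then sum each column of .SD per group
    features_dt <- setDT(model_data)[, lapply(.SD, sum), by = srl_nbr]

On current `dplyr`, `summarise(across(everything(), sum))` is the preferred spelling of `summarise_all(sum)`. Both approaches skip the model-frame machinery behind `aggregate`'s formula interface, and `data.table` in particular operates on the table by reference, which is why they cope better with a 75-million-row dataset.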
