
I am using the aggregate function to do the aggregation:

aggregate(x=df$time,by=list(df$id),FUN=sum)

My table has 100 million records and it takes hours to get the results. How can I reduce the time of this process? Any help is appreciated.

Sotos
RKR

1 Answer


Have you tried loading your initial table with the data.table library? This alone will save a significant amount of time when reading in 100 million rows.

library(data.table)
DT <- fread("path/to/file.csv")

Then you can aggregate fairly quickly with:

DT[ , AggColumn := sum(time), by = id]
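Note that `:=` adds the per-id sum as a new column on every row of `DT` by reference. If you want one row per id, i.e. the same shape of output as the `aggregate()` call in the question, a grouped summary along these lines should do it (a sketch, assuming the columns are named `id` and `time` as in the question; `total_time` is just an illustrative name):

agg <- DT[ , .(total_time = sum(time)), by = id]   # one row per id

If you group by `id` repeatedly, setting a key first with `setkey(DT, id)` may also help.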
Oliver Frost
  • Or you can just do `dt <- as.data.table(df)`. `fread` is great but its automatic classification for columns isn't always pretty. I routinely work with data.tables of 60M+ rows; aggregations usually take _seconds_. Sometimes, worst case, minutes, if the aggregation logic is complex. – BCC Jan 26 '17 at 19:20
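For the in-memory route mentioned in the comment above, a minimal sketch (assuming `df` already exists with the `id` and `time` columns from the question):

library(data.table)
dt <- as.data.table(df)   # copies df into a new data.table
# or, to convert df in place without copying:
# setDT(df)
res <- dt[ , .(total_time = sum(time)), by = id]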