I am using the aggregate function to do the aggregation:
aggregate(x = df$time, by = list(df$id), FUN = sum)
My table has 100 million records, and it takes hours to return the results. How can I reduce the running time of this process? Any help is appreciated.
Have you tried loading your initial table with the data.table library? fread alone will save a significant amount of time just reading 100M rows:
DT <- fread("path/to/file.csv")
Then you can aggregate fairly quickly with:
agg <- DT[, .(time = sum(time)), by = id]
(Note that the := form, e.g. DT[, AggColumn := sum(time), by = id], would instead attach the group sum as a new column on every row; the .( ) form collapses to one row per id, which matches what aggregate() returns.)
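A minimal sketch of the data.table grouped sum on toy data (the column names `id` and `time` follow the question; the toy values are made up for illustration):

```r
library(data.table)

# Toy data: three ids with repeated observations
DT <- data.table(id = c(1, 1, 2, 2, 3),
                 time = c(10, 20, 5, 5, 7))

# Collapse to one row per id, equivalent to
# aggregate(x = df$time, by = list(df$id), FUN = sum)
agg <- DT[, .(time = sum(time)), by = id]

print(agg)
# id 1 -> 30, id 2 -> 10, id 3 -> 7
```

On large inputs the speedup comes from data.table's radix-based grouping, which avoids the per-group overhead that makes `aggregate()` slow at this scale.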