
I am using the aggregate function to do the aggregation:

aggregate(x=df$time,by=list(df$id),FUN=sum)

My table has 100 million records and it takes hours to get the results. How can I reduce the time of this process? Any help is appreciated.

Sotos
RKR

1 Answer


Have you tried loading your initial table with the data.table library? This alone will save a significant amount of time when reading in 100 million rows.

library(data.table)
DT <- fread("path/to/file.csv")

Then you can aggregate fairly quickly with:

DT[ , AggColumn := sum(time), by = id]
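Note that `:=` adds the per-id sum as a new column on every row of `DT` by reference. If you want one row per id, i.e. the same shape of output as the `aggregate()` call in the question, a grouped summary along these lines should do it (a sketch, assuming the columns are named `id` and `time` as in the question; `total_time` is just an illustrative name):

agg <- DT[ , .(total_time = sum(time)), by = id]   # one row per id

If you group by `id` repeatedly, setting a key first with `setkey(DT, id)` may also help.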
Oliver Frost
  • Or you can just do `dt <- as.data.table(df)`. `fread` is great but its automatic classification for columns isn't always pretty. I routinely work with data.tables of 60M+ rows; aggregations usually take _seconds_. Sometimes, worst case, minutes, if the aggregation logic is complex. – BCC Jan 26 '17 at 19:20
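For the in-memory route mentioned in the comment above, a minimal sketch (assuming `df` already exists with the `id` and `time` columns from the question):

library(data.table)
dt <- as.data.table(df)   # copies df into a new data.table
# or, to convert df in place without copying:
# setDT(df)
res <- dt[ , .(total_time = sum(time)), by = id]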