I have recently started exploring R.
I have a csv which has many rows, using which my goal is to calculate average of one column by grouping 3 other columns.
For small data, the code works pretty well, but when it comes to million rows, it takes time (around 10 sec).
Here is what I have tried.
Used fread to read only 4 columns from the large csv [Takes around 5 sec]
mydata <- fread("csv_data.csv", select = c("Col1", "Col2","Col3","Time"))
Used aggregate
aggregate(mydata$Time,by=list(mydata$Col1,mydata$Col2,mydata$Col3),mean,rm.na=TRUE)
The aggregate function takes around 6 sec for calculation and result generation.
I also tried sqldf
, but takes more than 30 sec.
My goal is to find the average of time for a unique combination of Col1, Col2 and Col3
So my question is; how to calculate the average time for a unique combination of other three columns efficiently?