I am trying to run the simple query shown below on a data set with 20 million rows and 10 columns, but it takes a very long time (about 30 minutes) to compute the final output. Is there a better way to achieve this?
```r
(t <- Sys.time())
rd_1 <- as.data.frame(rd_1 %>%
  group_by(customer, location_name, Location_Date, Location_Hour) %>%
  filter(created_time == max(created_time)) %>%
  ungroup())
(t <- Sys.time())
```
Below are the timestamps printed when running the script:
```
[1] "2018-12-19 09:15:47 GMT"
> rd_1<-as.data.frame(rd_1 %>%
+ group_by(customer,location_name,Location_Date,Location_Hour) %>%
+ filter(created_time==max(created_time))%>%
+ ungroup())
> (t<-Sys.time())
[1] "2018-12-19 09:45:25 GMT"
```
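To make the problem easier to reproduce and to benchmark alternatives, a minimal sketch with synthetic data might look like the following. It assumes the dplyr package is installed; the data frame name `rd_demo`, its values, and the group cardinalities are all made up (only the grouping columns and `created_time` mirror `rd_1`). `n` is kept small so it runs quickly and can be scaled up toward 20 million rows. It also uses base R's `system.time()`, which reports the elapsed time directly instead of printing `Sys.time()` before and after.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical synthetic data: same grouping columns as rd_1, made-up values.
set.seed(42)
n <- 1e5  # scale this up towards 2e7 to approach the real data size
rd_demo <- data.frame(
  customer      = sample(sprintf("C%03d", 1:50), n, replace = TRUE),
  location_name = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  Location_Date = as.Date("2018-12-01") + sample(0:6, n, replace = TRUE),
  Location_Hour = sample(0:23, n, replace = TRUE),
  created_time  = as.POSIXct("2018-12-01", tz = "GMT") + runif(n, 0, 7 * 86400),
  stringsAsFactors = FALSE
)

# Same query as in the question, timed with system.time().
timing <- system.time({
  latest <- rd_demo %>%
    group_by(customer, location_name, Location_Date, Location_Hour) %>%
    filter(created_time == max(created_time)) %>%
    ungroup() %>%
    as.data.frame()
})
print(timing)  # the "elapsed" entry is the wall-clock time in seconds
```

Comparing the `elapsed` value as `n` grows should show whether the run time scales roughly linearly with the number of rows or with the number of groups.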