
I have a very large data frame to query against a PostgreSQL instance. The QuerySQLDB call below takes a very long time, and the program just sits idle until the query finishes and only then does the rbind. From the second iteration onward, can the rbind of the previous result run at the same time as the next query, so that I save overall computing time?

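OutputDataFrame <- NULL  # rbind(NULL, df) simply returns df
for (n in 1:200) {
  # Rows (n-1)*1000+1 to n*1000; the parentheses matter because `:` binds tighter than `*`
  dx <- InputDataFrame[((n - 1) * 1000 + 1):(n * 1000), ]
  outdx <- QuerySQLDB(dx)
  # Append to the output, not to InputDataFrame, so the input is not overwritten
  OutputDataFrame <- rbind(OutputDataFrame, outdx)
}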
zhouhufeng
  • Recommendations: 1. Write intermediary results to disk within the loop, 2. Do rbind only once outside the loop, e.g. with `do.call(rbind, list_of_outdx)` or with `rbindlist` from `data.table` – Aurèle Nov 15 '21 at 16:02
  • Thanks very much. But writing out a huge data frame will also involve a huge I/O cost. – zhouhufeng Nov 15 '21 at 16:36
  • You are probably having issues by trying to grow a dataframe using rbind. Maybe this question and answer would provide some hints [How can I prevent rbind() from geting really slow as dataframe grows larger?](https://stackoverflow.com/a/14694108/5491184) – Juan Bosco Nov 18 '21 at 17:55
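
The suggestion in the comments to bind only once outside the loop could look roughly like the sketch below. It is a minimal illustration, not the asker's actual setup: `QuerySQLDB` here is a hypothetical stand-in that just tags each row, the toy `InputDataFrame` has 10 rows, and `chunk_size` is 3 instead of 1000. Each chunk result is stored in a preallocated list, so no data frame is copied and regrown on every iteration.

```r
# Hypothetical stand-in for the real QuerySQLDB(); it only adds a flag column.
QuerySQLDB <- function(dx) {
  transform(dx, queried = TRUE)
}

# Toy input (assumption): 10 rows, processed in chunks of 3 instead of 1000.
InputDataFrame <- data.frame(id = 1:10, value = (1:10)^2)
chunk_size <- 3
n_chunks <- ceiling(nrow(InputDataFrame) / chunk_size)

# Preallocate one list slot per chunk instead of growing a data frame.
results <- vector("list", n_chunks)
for (n in seq_len(n_chunks)) {
  first <- (n - 1) * chunk_size + 1
  last <- min(n * chunk_size, nrow(InputDataFrame))  # last chunk may be short
  results[[n]] <- QuerySQLDB(InputDataFrame[first:last, ])
}

# Bind all chunk results once, after the loop.
OutputDataFrame <- do.call(rbind, results)
```

This does not overlap the query with the binding, but it removes the repeated copying that makes rbind inside a loop slow, which may make the overlap unnecessary. If data.table is available, `data.table::rbindlist(results)` can replace the `do.call(rbind, results)` call.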

0 Answers