0

I'm executing a simple SELECT COUNT query on a large Postgres database through dplyr (the table I'm querying has ~60m rows). I can't provide a MWE, but here's the idea of what I'm doing:

# open DB connection
drv <- DBI::dbDriver('PostgreSQL')
con <- RPostgreSQL::dbConnect(...)

# count rows satisfying x = something in table tab
dplyr::tbl(con, "tab") %>%
  dplyr::filter(x == something) %>%
  dplyr::summarise(n = n()) %>%
  print()

# disconnect DB
RPostgreSQL::dbDisconnect(con) 

The query works fine. The cache memory usage goes up substantially during query execution, which I guess is normal. My issue is that the usage of the OS's cache memory doesn't decrease after the query is done and the database is disconnected -- like there is no garbage collection or something. That's an issue because when run multiple queries in a row, they start failing for what I think is a lack of memory. How can I make sure that the cache gets cleared after the query?

I'm using the following configuration:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1     

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RPostgreSQL_0.6-2 DBI_1.1.0         tmap_2.3-2        sf_0.9-6          forcats_0.5.1    
 [6] stringr_1.4.0     dplyr_1.0.7       purrr_0.3.4       readr_1.4.0       tidyr_1.1.3      
[11] tibble_3.1.3      ggplot2_3.3.3     tidyverse_1.3.1  
Ben
  • 429
  • 4
  • 11
  • Take a look on https://stackoverflow.com/a/1223421/1168212 . But I suspect that the cause for "when run multiple queries in a row, they start failing" is something other – Alex Yu Jul 31 '21 at 10:16
  • `... they start failing for what I think is a lack of memory.` I think you are out of connections. Question: how do you measure `the OS cache` ? How do you know that `R` is not using too much memory? – wildplasser Jul 31 '21 at 11:29
  • @wildplasser I use the [system-monitor](https://extensions.gnome.org/extension/120/system-monitor/) extension of Gnome that shows cache memory usage, among other things. – Ben Aug 02 '21 at 16:11
  • I had this problem before. Is `DBI::dbClearResult()` what you want? –  Aug 02 '21 at 16:11
  • @Adam that might be a solution. But `DBI::dbClearResult()` takes a query result as argument right? How can I use it when I query the DB through dplyr as in my example? – Ben Aug 02 '21 at 16:13
  • @Ben that question is why I commented rather than answered. But I have used it before in a slightly different context with success, so I hope maybe it starts you in the right direction. –  Aug 02 '21 at 16:14

0 Answers0