I'm executing a simple SELECT COUNT query on a large Postgres database through dplyr (the table I'm querying has ~60m rows). I can't provide a MWE, but here's the idea of what I'm doing:
# open DB connection
drv <- DBI::dbDriver('PostgreSQL')
con <- RPostgreSQL::dbConnect(drv, ...)

# count rows satisfying x = something in table tab
dplyr::tbl(con, "tab") %>%
  dplyr::filter(x == something) %>%
  dplyr::summarise(n = n()) %>%
  print()

# disconnect DB
RPostgreSQL::dbDisconnect(con)
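For reference, dbplyr should translate this pipeline into a single aggregate query that runs entirely server-side, so only the one-row result should ever come back to R. A quick way to check is show_query() (a sketch; something stands in for the actual filter value, as above):

# print the SQL that dbplyr generates for the pipeline, instead of running it
dplyr::tbl(con, "tab") %>%
  dplyr::filter(x == something) %>%
  dplyr::summarise(n = n()) %>%
  dplyr::show_query()
# expected shape: SELECT COUNT(*) AS "n" FROM "tab" WHERE ("x" = ...)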
The query works fine. Cache memory usage goes up substantially during query execution, which I guess is normal. My issue is that the OS's cache memory usage doesn't decrease after the query is done and the database is disconnected -- as if there's no garbage collection or something. That's a problem because when I run multiple queries in a row, they start failing for what I think is a lack of memory. How can I make sure that the cache gets cleared after each query?
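To make the failure mode concrete, the repeated-query pattern looks roughly like this (a sketch: query_count() is a hypothetical wrapper for the pipeline above, and values is a hypothetical vector of filter values):

# hypothetical helper wrapping the pipeline above
query_count <- function(con, value) {
  dplyr::tbl(con, "tab") %>%
    dplyr::filter(x == !!value) %>%  # !! inlines the R value into the SQL
    dplyr::summarise(n = n()) %>%
    dplyr::collect()
}

values <- c(...)  # hypothetical filter values, elided as above

for (v in values) {
  con <- RPostgreSQL::dbConnect(drv, ...)  # connection details elided as above
  print(query_count(con, v))
  RPostgreSQL::dbDisconnect(con)
  gc()  # frees R's own heap, but the OS page cache is managed by the kernel, not R
}

After a few iterations, later queries start failing, as described above.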
I'm using the following configuration:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RPostgreSQL_0.6-2 DBI_1.1.0 tmap_2.3-2 sf_0.9-6 forcats_0.5.1
[6] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[11] tibble_3.1.3 ggplot2_3.3.3 tidyverse_1.3.1