
I would like to know whether it is possible to set a limit on the available memory (RAM) for an R program that creates Parquet files with the Arrow package (exporting data frames to Parquet).

In a database such as DuckDB, which can also create Parquet files, the available memory can be set at the database level, for example:

SET memory_limit='10GB';
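For illustration, here is a minimal sketch of that setting in a DuckDB session (CLI or any connection). The table `my_table`, filled here with generated rows, and the output file name are made-up examples. As far as I understand the DuckDB documentation, `memory_limit` governs its buffer manager, so the process may still use somewhat more memory than the configured value.

```sql
-- Limit the memory DuckDB may use in this session
SET memory_limit = '10GB';

-- Check the value now in effect
SELECT current_setting('memory_limit');

-- Hypothetical example table with one million generated rows
CREATE TABLE my_table AS
  SELECT range AS id, random() AS value
  FROM range(1000000);

-- Export the table to a Parquet file while the limit applies
COPY my_table TO 'my_table.parquet' (FORMAT PARQUET);
```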

Can the same be done in R at the program level, so that, no matter what, one can be sure that once a certain memory consumption is reached, the program will not allocate any more RAM?

Another interesting DuckDB feature is the ability to set a temporary directory: once the limit defined by `SET memory_limit` has been reached, the program uses this directory on disk for any further memory it needs.
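A minimal DuckDB sketch combining both settings follows. The directory `duckdb_spill`, the table `big_table`, and its columns are hypothetical. As far as I know, only operators that DuckDB can run out of core (such as large sorts, joins, and aggregations) spill to this directory, so it is not a guarantee that every query stays within the limit.

```sql
-- Keep DuckDB's memory use below 10 GB
SET memory_limit = '10GB';

-- Directory for temporary files written once the limit is reached
-- (hypothetical path, relative to the working directory)
SET temp_directory = 'duckdb_spill';

-- Hypothetical table with ten million generated rows
CREATE TABLE big_table AS
  SELECT range AS id, random() AS value
  FROM range(10000000);

-- If a sort like this needed more memory than memory_limit,
-- DuckDB could spill intermediate data to temp_directory
-- while the result is written to Parquet
COPY (SELECT * FROM big_table ORDER BY value)
  TO 'big_table_sorted.parquet' (FORMAT PARQUET);
```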

Is either of these two features available when using the Arrow package in R programs?

user17911
  • Have you tried [these](https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes) suggestions to set the RAM limit? – SamR Feb 25 '23 at 11:43
  • Thanks for the link. Apparently `memory.limit()` is no longer supported. The first suggested method will not do the job, as I'm not the administrator of the environment. The "unix" package seems rather interesting, but the problem is that our Datalab environment is based on Microsoft Windows. – user17911 Feb 25 '23 at 12:02
  • Interesting. What about [editing your .Renviron](https://stackoverflow.com/questions/73776480/memory-limit-is-no-longer-supported-work-around)? – SamR Feb 25 '23 at 16:01
  • Thanks for the suggestion. I'll try it on Monday at the office. But if I understand correctly, this file contains only environment variables (VARIABLE=VALUE). Which variable should be defined in this file to limit the amount of RAM used by R? Also, the program could be moved to other servers in the future, so I think any limit should rather be defined at the program/session level. – user17911 Feb 25 '23 at 21:14
  • Yes, that's right, it sets an environment variable. I see your point about not wanting to change the memory limit for R globally. I'm pretty sure this can be done for a session using a project-level .Renviron. How exactly depends on your workflow. Are you going to be doing something like `R CMD BATCH` or using R interactively? If the latter, which IDE are you using? Not that it should matter, but some comments in the second question I linked indicate that RStudio can behave in unexpected ways. – SamR Feb 26 '23 at 06:55
  • Currently I work on Windows, but eventually, once deployed on Linux servers, it will be a Bash batch script that simply calls `Rscript` to run the program. I always try to avoid RStudio as much as possible despite its features and comfort. I use ESS in Emacs and run programs either in ESS or, more often, directly in the terminal. – user17911 Feb 26 '23 at 11:48

0 Answers