As mentioned by Hong Ooi, PL/R loads an R interpreter into the PostgreSQL backend process. So your R code is running "in database".
There is no universal way to deal with memory limitations, but there are at least two possible options:
- Define a custom PostgreSQL aggregate and use your PL/R function as the "final" function. That way you process the data one group at a time, which makes memory problems much less likely. See the PostgreSQL documentation on user-defined aggregates and the PL/R documentation for more detail; a minimal sketch follows this list.
- Use the pg.spi.cursor_open and pg.spi.cursor_fetch functions that PL/R installs into the R interpreter to page data into your R function in chunks (second sketch below). The PL/R docs are here: http://www.joeconway.com/plr/doc/index.html
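Here is a minimal sketch of the aggregate approach. The names (`mytable`, `val`, `r_median`, `median_agg`) are purely illustrative, not anything PL/R provides; the point is the pattern of accumulating each group into an array and handing it to a PL/R final function once per group:

```sql
-- Hypothetical PL/R "final" function: it receives one group's values as
-- a float8[] (a numeric vector inside R) and reduces it to one value.
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS $$
    median(arg1)
$$ LANGUAGE plr;

-- Aggregate that accumulates each group's values with the built-in
-- array_append, then calls the PL/R function once per group.
CREATE AGGREGATE median_agg (float8) (
    sfunc     = array_append,
    stype     = float8[],
    finalfunc = r_median,
    initcond  = '{}'
);

-- R now only ever sees one group's worth of data at a time:
-- SELECT grp, median_agg(val) FROM mytable GROUP BY grp;
```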
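And a sketch of the cursor approach, again with illustrative names. I believe pg.spi.cursor_fetch returns the fetched rows as a data frame; the chunk size and the empty-fetch stopping test are assumptions you should adapt to your data:

```sql
-- Hypothetical chunked aggregation: page rows through an SPI cursor so
-- that only one chunk of rows is ever materialized in R at a time.
CREATE OR REPLACE FUNCTION sum_in_chunks() RETURNS float8 AS $$
    plan  <- pg.spi.prepare("SELECT val FROM mytable")
    curs  <- pg.spi.cursor_open("chunk_cursor", plan)
    total <- 0
    repeat {
        # fetch the next 10000 rows, moving forward through the cursor
        chunk <- pg.spi.cursor_fetch(curs, TRUE, as.integer(10000))
        # stop when a fetch comes back empty (or NULL)
        if (is.null(chunk) || nrow(chunk) == 0) break
        total <- total + sum(chunk$val)
    }
    pg.spi.cursor_close(curs)
    total
$$ LANGUAGE plr;
```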
I am guessing what you would really like is a data.frame whose data is paged to and from an underlying database cursor transparently to your R code. This is on my long-term TODO list, but unfortunately I have not found the time to work it out. I have been told that Oracle's R connector has this feature, so it seems it can be done. Patches welcome ;-)