
When I use open-source R, it's not possible to handle data sets bigger than RAM without using a specific package. So I would like to know whether it's possible to handle big data sets by applying PL/R functions inside PostgreSQL.

I couldn't find any documentation about this.

nograpes
Flavio Barros
    Also, consider the `ff` package, which allows you to store large data on disk. – nograpes May 17 '13 at 15:17
  • 2
    There is some way to REALLY run R inside database? (non commercial like R on Oracle) – Flavio Barros May 17 '13 at 15:20
  • 2
    It is running REALLY inside PostgreSQL (R is symbolically linked to Postgres) but that does not remove the R RAM constraints. –  May 17 '13 at 15:28
  • What do you mean by "symbolically linked"? Because if a function can be translated, in some way, to SQL, there wouldn't be any constraints, right? – Flavio Barros May 17 '13 at 16:44
  • BUT if, in the process, the data is passed to an R object, there would be memory constraints, as the R engine will run the function. I know that in the Oracle implementation there are no memory constraints, as the R interpreter acts "really inside" the database. – Flavio Barros May 17 '13 at 16:49

2 Answers


As mentioned by Hong Ooi, PL/R loads an R interpreter into the PostgreSQL backend process. So your R code is running "in database".

There is no universal way to deal with memory limitations, but there are at least two possible options:

  1. Define a custom PostgreSQL aggregate, and use your PL/R function as the "final" function. This way you process the data in groups, and are thus less likely to have memory problems. See the online PostgreSQL documentation and PL/R documentation for more detail (I don't post to Stack Overflow often, so unfortunately it will not allow me to post the actual URLs for you).
  2. Use the `pg.spi.cursor_open` and `pg.spi.cursor_fetch` functions installed by PL/R into the R interpreter in order to page data into your R function in chunks.
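The first approach might look like the following sketch. The table `readings` and its columns are made up for illustration; the aggregate accumulates each group's values into an array, and only the final PL/R function hands that array to R, so only one group's data needs to fit in R's memory at a time:

```sql
-- Hypothetical example: PL/R function used as the FINALFUNC of a
-- custom aggregate (table/column names are invented).
CREATE OR REPLACE FUNCTION r_median_final(float8[])
RETURNS float8 AS $$
  median(arg1)   -- arg1 is the accumulated array, seen in R as a numeric vector
$$ LANGUAGE plr;

CREATE AGGREGATE r_median (float8) (
  sfunc     = array_append,  -- built-in: append each value to the state array
  stype     = float8[],
  initcond  = '{}',
  finalfunc = r_median_final
);

-- R sees one group at a time, not the whole table:
-- SELECT sensor_id, r_median(reading) FROM readings GROUP BY sensor_id;
```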

See PL/R docs here: http://www.joeconway.com/plr/doc/index.html
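The second approach, paging through a cursor, might be sketched like this (again with an invented `readings` table; the exact behavior of `pg.spi.cursor_fetch` at end-of-cursor should be checked against the PL/R docs):

```sql
-- Hypothetical sketch: process rows in chunks via PL/R's SPI cursor API,
-- so only 'chunk' rows are materialized in R at any one time.
CREATE OR REPLACE FUNCTION chunked_sum(chunk integer)
RETURNS float8 AS $$
  plan   <- pg.spi.prepare("SELECT reading FROM readings")
  cursor <- pg.spi.cursor_open("c1", plan)
  total  <- 0
  repeat {
    df <- pg.spi.cursor_fetch(cursor, TRUE, as.integer(chunk))
    if (is.null(df) || nrow(df) == 0) break   -- no more rows
    total <- total + sum(df$reading)
  }
  pg.spi.cursor_close(cursor)
  total
$$ LANGUAGE plr;
```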

I am guessing what you would really like is a data.frame whose data is paged to and from an underlying database cursor, transparently to your R code. This is on my long-term TODO list, but unfortunately I have not been able to find the time to work it out. I have been told that Oracle's R connector has this feature, so it seems it can be done. Patches welcomed ;-)

  • Thanks very much for the answer! I use PostgreSQL and R a lot, and when I learned about PL/R I became excited about the possibility of resolving R's memory constraints while having the power of SQL at the same time. – Flavio Barros May 18 '13 at 20:23

No. PL/R just starts up a separate R process to run your R code. This uses exactly the same binaries and executables as those you'd use from the command line, so all the standard limitations still apply.

Hong Ooi