
When I use open-source R, it's not possible to handle data sets bigger than RAM without using a specific package. So I would like to know whether it's possible to handle big data sets by applying PL/R functions inside PostgreSQL.

I couldn't find any documentation about this.

nograpes
Flavio Barros
    Also, consider the `ff` package, which allows you to store large data on disk. – nograpes May 17 '13 at 15:17
  • 2
    There is some way to REALLY run R inside database? (non commercial like R on Oracle) – Flavio Barros May 17 '13 at 15:20
  • 2
    It is running REALLY inside PostgreSQL (R is symbolically linked to Postgres) but that does not remove the R RAM constraints. –  May 17 '13 at 15:28
  • What do you mean by "symbolically linked"? Because if a function can be translated, in some way, to SQL, there wouldn't be any constraints, right? – Flavio Barros May 17 '13 at 16:44
  • BUT if, in the process, the data is passed to an R object, there would be memory constraints, as the R engine will run the function. I know that in the Oracle implementation there are no memory constraints, as the R interpreter acts "really inside" the database. – Flavio Barros May 17 '13 at 16:49

2 Answers


As mentioned by Hong Ooi, PL/R loads an R interpreter into the PostgreSQL backend process. So your R code is running "in database".

There is no universal way to deal with memory limitations, but there are at least two possible options:

  1. Define a custom PostgreSQL aggregate, and use your PL/R function as the "final" function. This way you process the data in groups, and are thus less likely to have memory problems. See the online PostgreSQL documentation and PL/R documentation for more detail (I don't post to Stack Overflow often, so unfortunately it will not allow me to post the actual URLs for you).
  2. Use the `pg.spi.cursor_open` and `pg.spi.cursor_fetch` functions installed by PL/R into the R interpreter in order to page data into your R function in chunks.
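The first approach might look like the following sketch. The table `readings` and its columns are made up for illustration; the aggregate accumulates each group's values into an array, and only the final PL/R function hands that array to R, so only one group's data needs to fit in R's memory at a time:

```sql
-- Hypothetical example: PL/R function used as the FINALFUNC of a
-- custom aggregate (table/column names are invented).
CREATE OR REPLACE FUNCTION r_median_final(float8[])
RETURNS float8 AS $$
  median(arg1)   -- arg1 is the accumulated array, seen in R as a numeric vector
$$ LANGUAGE plr;

CREATE AGGREGATE r_median (float8) (
  sfunc     = array_append,  -- built-in: append each value to the state array
  stype     = float8[],
  initcond  = '{}',
  finalfunc = r_median_final
);

-- R sees one group at a time, not the whole table:
-- SELECT sensor_id, r_median(reading) FROM readings GROUP BY sensor_id;
```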

See PL/R docs here: http://www.joeconway.com/plr/doc/index.html
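The second approach, paging through a cursor, might be sketched like this (again with an invented `readings` table; the exact behavior of `pg.spi.cursor_fetch` at end-of-cursor should be checked against the PL/R docs):

```sql
-- Hypothetical sketch: process rows in chunks via PL/R's SPI cursor API,
-- so only 'chunk' rows are materialized in R at any one time.
CREATE OR REPLACE FUNCTION chunked_sum(chunk integer)
RETURNS float8 AS $$
  plan   <- pg.spi.prepare("SELECT reading FROM readings")
  cursor <- pg.spi.cursor_open("c1", plan)
  total  <- 0
  repeat {
    df <- pg.spi.cursor_fetch(cursor, TRUE, as.integer(chunk))
    if (is.null(df) || nrow(df) == 0) break   -- no more rows
    total <- total + sum(df$reading)
  }
  pg.spi.cursor_close(cursor)
  total
$$ LANGUAGE plr;
```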

I am guessing what you would really like is a data.frame whose data is paged to and from an underlying database cursor, transparently to your R code. This is on my long-term TODO list, but unfortunately I have not been able to find the time to work it out. I have been told that Oracle's R connector has this feature, so it seems it can be done. Patches welcomed ;-)

  • Thanks very much for the answer! I use PostgreSQL and R a lot, and when I learned about PL/R I became excited about the possibility of resolving R's memory constraints while having the power of SQL at the same time. – Flavio Barros May 18 '13 at 20:23

No. PL/R just starts up a separate R process to run your R code. This uses exactly the same binaries and executables as those you'd use from the command line, so all the standard limitations still apply.

Hong Ooi