19

(By object-relational mapping, I mean what is described here: Wikipedia: Object-relational mapping.)

Here is how I could imagine this work in R : a kind of "virtual data frame" is linked to a database, and returns the results of SQL queries when accessed. For instance, head(virtual_list) would actually return the results of (select * from mapped_table limit 5) on the mapped database.

I have found this post by John Myles White, but there seems to have been no progress in the last 3 years.

Is there a working package that implements this ?

If not,

  1. Would it be useful ?
  2. What would be the best way to implement it (S4 ?) ?
nassimhddd
  • 8,340
  • 1
  • 29
  • 44

7 Answers7

11

The very recent package dplyr is implementing this (amongst other amazing features).

Here are illustrations from the examples of function src_mysql():

# Connection basics ---------------------------------------------------------
# To connect to a database first create a src:
my_db <- src_mysql(host = "blah.com", user = "hadley",
  password = "pass")
# Then reference a tbl within that src
my_tbl <- tbl(my_db, "my_table")

# Methods -------------------------------------------------------------------
batting <- tbl(lahman_mysql(), "Batting")
dim(batting)
colnames(batting)
head(batting)
nassimhddd
  • 8,340
  • 1
  • 29
  • 44
  • dplyr doesn't support persisting objects to the database (you still have to write INSERT INTO statements). There are ways to do it, but they aren't recommended: https://stackoverflow.com/a/26784801/3910619 – zola25 Jan 14 '19 at 15:02
  • 3
    This is not an ORM approach. This simply pulls a table from the database and converts it to a data.frame/tbl. A proper ORM maps entries, including their relationships, of the database to objects/classes in the programming language. – ak17 Jun 19 '20 at 15:10
7

There is an old unsupported package, SQLiteDF, that does that. Build it from source and ignore the numerous error messages.

> # from example(sqlite.data.frame)
>
> library(SQLiteDF)
> iris.sdf <- sqlite.data.frame(iris)
> iris.sdf$Petal.Length[1:10] # $ done via SQL
 [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2

Looks like John Myles White he's given up on it.

There is a bit of a workaround explained here.

Joanne Demmler
  • 1,406
  • 11
  • 31
1

I don't think it would be useful. R is not a real OOP language. The "central" data structure in R is the data frame. No need for Object-Relational Mapping here.What you want is a mapping between SQL tables and data frames and the RMySQL and RODBC provide just that :

dbGetQuery to return the results of a query in a data frame and dbWriteTable to insert data in a table or do a bulk update ( from a data frame).

MadSeb
  • 7,958
  • 21
  • 80
  • 121
  • I disagree with this viewpoint: no useful language is "purely" of any particular type. R claims to be a functional language, but it also allows for many side effects. R should have the means to create an ORM, and I'm glad that `dplyr` stepped up. – Mike Williamson Dec 28 '17 at 22:29
1

As an experienced R user, I would not use this. First off, this 'virtual frame' would be slow to use, since you constantly need to synchronize between R memory and the database. It would also require locking the database table, since otherwise you have unpredictable results due to other edits happening at the same time.

Finally, I do not think R is suited for implementing a different evaluation of promise objects. Doing myFrame$foo[ myFrame$foo > 40 ] will still fetch the full foo column, since you cannot possible implement a full translation scheme from R to SQL.

Therefore, I prefer to load a dataframe() from a query, use it, and write it back to the database if required.

parasietje
  • 1,529
  • 8
  • 36
  • Thanks for your answer. I agree with the memory and locking. That could be a problem depending on the project. Regarding the example of the line selection: a full translation scheme might not be possible, but are least a lot of common manipulations should be feasible (the line selection in the example would be trivial for instance)... – nassimhddd Aug 30 '12 at 12:18
  • The line selection example is trivial for you. Let's push this further: top 10% percentile? Random n=100 sampling? etc... In practice, the limits of this are very quickly reached. – parasietje Aug 30 '12 at 13:53
1

Next to the various driver packages for querying DBs (DBI, RODBC,RJDBC,RMySql,...) and dplyr, there's also sqldf https://cran.r-project.org/web/packages/sqldf/

This will automatically import dataframes into the db & let you query the data via sql. At the end the db is deleted.

0

The most similar could be dbplyr. In R you work with tables, not rows.

Daniel Fischer
  • 947
  • 6
  • 8