0

For example, if I’m presented with a PostgreSQL database, what am I missing out on by using RPostgreSQL to interface with it instead of executing SQL operations using the command line or a database management application? Are there some things that only be done with SQL?

Preemptively addressing a common response I’ve seen to previous iterations of this question (example): I understand that R and SQL are different tools for different purposes.

JohnDoeVsJoeSchmoe
  • 671
  • 2
  • 8
  • 25

1 Answers1

1

In comparing R packages like RPostgreSQL and RPostgres to SQL command-line or database management application, you're not limiting yourself at all. The R packages can provide the same command-line-like interface that the other tools do, such as by enabling dbExecute() through DBI and you can issue whatever statements are needed. Where differences may come in are in higher-level wrapper functions that may have trouble interfacing between R and SQL. Such issues might look like class/data type interpretation (eg, I believe R packages do not know how to resolve postgresql array data types as a specific example). This can affect upsert type commands, for example, or calls to append full data.frames onto database tables, since data types may mismatch when higher-level functions attempt to auto-map these incorrectly; similar issues may also exist in having to map NA to NULL or explicitly cast data types. There may also be issues in writing temporary tables and things of this nature. I'd consider these more as quirks or annoyances, since there are work-arounds that allow the same level of functionality. Such as by using lower-level function calls that require to write-out SQL more comprehensively; or by additional work to map data types that will translate between native SQL and R environments. (I replied to another SO question elsewhere with an example of such a work-around to use intermediate tables between R and the database: How to insert R dataframe into existing table in SQL Server). Stepping away from transaction level commands to fuller DBA needs, like managing users, permissions, underlying configurations and so forth may find some limitations as well. I'll also note that super popular though it may be, dplyr package, which provides very high level convenience functions for interacting with SQL is more likely to impose limitations than writing statements directly and using lower-level package calls. dplyr auto-builds SQL statements using the R function calls, but trying to do anything beyond relatively simple select/insert/update/join style calls is likely to require a fair amount of effort to enable the so-called shortcuts to perform as expected.

Soren
  • 1,792
  • 1
  • 13
  • 16