2

WRDS is a leading provider of research data to academics and other researchers in business and related fields. WRDS provides a PostgreSQL database, but this is a read-only database.

For some tasks, the inability to write data to the database is very constraining. For example, if I want to run an event study using daily stock returns, I will need to merge my (relatively small) local data set events with crsp.dsf, which is about 18GB of data.

One option is to maintain my own database with a copy of crsp.dsf and write events to that database and merge there. But I am looking for an option that allows me to use the WRDS database for this purpose. Unfortunately, there is no way to use copy_to or dbWriteTable, as the WRDS database is read-only.

Ian Gow
  • 3,098
  • 1
  • 25
  • 31
  • Any updates about this? I posted a similar question here: https://stackoverflow.com/questions/74187711/join-a-data-frame-or-other-r-object-to-a-table-in-a-read-only-postgresql-datab – Kelly Thompson Oct 25 '22 at 13:19
  • A few months late, but see updated answer below. – Ian Gow Feb 03 '23 at 15:43

2 Answers2

2

A more up-to-date answer than the one below is to use the copy_inline function from the dbplyr package, which was added following an issue filed on this topic.


One option is to use something like the following function, which turns a local data frame into a remote data frame using SQL even when using a read-only connection.

df_to_pg <- function(df, conn) {

    collapse <- function(x) paste0("(", paste(x, collapse = ", "), ")")

    names <- paste(DBI::dbQuoteIdentifier(conn, names(df)), collapse = ", ")

    values <-
        df %>%
        lapply(DBI::dbQuoteLiteral, conn = conn) %>%
        purrr::transpose() %>%
        lapply(collapse) %>%
        paste(collapse = ",\n")

    the_sql <- paste("SELECT * FROM (VALUES", values, ") AS t (", names, ")")

    temp_df_sql <- dplyr::tbl(conn, dplyr::sql(the_sql))
    
    return(temp_df_sql)
}

Here is an illustration of the function in use. Function has been tested on PostgreSQL and SQL Server, but will not work on SQLite (due to lack of VALUES keyword that works in this way). I believe it should work on MySQL or Oracle, as these have the VALUES keyword.

library(dplyr, warn.conflicts = FALSE)
library(DBI)
   
pg <- dbConnect(RPostgres::Postgres())     

events <- tibble(firm_ids = 10000:10024L,
                 date = seq(from = as.Date("2020-03-14"), 
                            length = length(firm_ids), 
                            by = 1))
events
#> # A tibble: 25 x 2
#>    firm_ids date      
#>       <int> <date>    
#>  1    10000 2020-03-14
#>  2    10001 2020-03-15
#>  3    10002 2020-03-16
#>  4    10003 2020-03-17
#>  5    10004 2020-03-18
#>  6    10005 2020-03-19
#>  7    10006 2020-03-20
#>  8    10007 2020-03-21
#>  9    10008 2020-03-22
#> 10    10009 2020-03-23
#> # … with 15 more rows

events_pg <- df_to_pg(events, pg)
events_pg
#> # Source:   SQL [?? x 2]
#> # Database: postgres [iangow@/tmp:5432/crsp]
#>    firm_ids date      
#>       <int> <date>    
#>  1    10000 2020-03-14
#>  2    10001 2020-03-15
#>  3    10002 2020-03-16
#>  4    10003 2020-03-17
#>  5    10004 2020-03-18
#>  6    10005 2020-03-19
#>  7    10006 2020-03-20
#>  8    10007 2020-03-21
#>  9    10008 2020-03-22
#> 10    10009 2020-03-23
#> # … with more rows

Created on 2021-04-01 by the reprex package (v1.0.0)

Ian Gow
  • 3,098
  • 1
  • 25
  • 31
  • This is great! However I am not able to reproduce your example in Oracle. Would you know how to rewrite the function to make it function in Oracle as well? Here is the error message I am getting (shortened to fit the comment): Error: nanodbc/nanodbc.cpp:1617: 42S02: [Oracle][ODBC][Ora]ORA-00903: invalid table name 'SELECT * FROM (SELECT * FROM (VALUES (10000, '2020-03-14 UTC'), (10001, '2020-03-15 UTC'), ... (10024, '2020-04-07 UTC') ) AS t ( "firm_ids", "date" )) "q01" WHERE (0 = 1)' – CAJ Jan 28 '22 at 10:08
  • @CAJ I don't know. My guess is that it might make sense to tweak the function to return SQL and then (using a small table) tweak the SQL to fit the demands of Oracle's syntax. It looks like the "AS t (" part might be parsed as a table name by Oracle. – Ian Gow Jan 29 '22 at 23:44
0

Posting for future reference: This has been added to dplyr https://dbplyr.tidyverse.org/reference/copy_inline.html

Fedor
  • 17,146
  • 13
  • 40
  • 131
TM091994
  • 5
  • 3