
I have worked a little with DBI in R. My first question is more of a best-practice one, as appending new data to the DB currently takes more time than I hoped. The second is about an error I receive when trying to update existing information in the database. Here is my current workflow for inserting new data into an existing table in the DB:

library(DBI)
library(odbc)
library(dplyr)
library(dbplyr)   # for in_schema()

con <- dbConnect(odbc(), "myDSN")

# Example table 1
tbl1 <- tibble(Key = c("A", "B", "C", "D", "E"),
               Val = c(1, 2, 3, 4, 5))

# Original table in DB
dbWriteTable(con, "tbl1", tbl1, overwrite = TRUE)

# Link to Original table
db_tbl <- tbl(con, in_schema("dbo", "tbl1"))

# New data
tbl2 <- tibble(Key = c("D", "E", "F", "G", "H"),
               val = c(10, 11, 12, 13, 14))

# Write it to Staging
dbWriteTable(con, "tbl1_staging", tbl2, overwrite = TRUE)

# Get a link to staging
db_tblStaging <- tbl(con, in_schema("dbo", "tbl1_staging"))

# Compare Info
not_in_db <- db_tblStaging %>% 
  anti_join(db_tbl, by="Key") %>% 
  collect()

# Append missing info to DB
dbWriteTable(con, "tbl1", not_in_db, append = TRUE)

# Voila!
dbReadTable(con, "tbl1")

That does the trick, but I'm looking for a better solution, as I dislike the collect() part of the code: it pulls the data into R memory (as far as I understand it), which could become a problem later with bigger data. What I hoped would work is something like this, which would append the new data to the DB on the fly, without it ever visiting R's memory:

# What I hoped to have
db_tblStaging %>% 
  anti_join(db_tbl, by="Key") %>% 
  dbWriteTable(con, "tbl1", ., append = TRUE)
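The pipe above fails because dbWriteTable() expects a local data frame, not a lazy remote table. One way to keep the whole operation server-side (a sketch, assuming dbplyr is attached and that the staging columns line up with tbl1's column order) is to render the anti-join to SQL yourself and wrap it in an INSERT ... SELECT, so no rows travel through R:

```r
library(dbplyr)

# build the lazy anti-join; nothing is executed yet
missing_rows <- db_tblStaging %>%
  anti_join(db_tbl, by = "Key")

# wrap the rendered SELECT in an INSERT that runs entirely in the database;
# dbExecute() returns the number of rows inserted
insert_sql <- paste("INSERT INTO dbo.tbl1", sql_render(missing_rows))
DBI::dbExecute(con, insert_sql)
```

Because tbl1 is given without an explicit column list, this relies on the staging SELECT producing columns in the same order as tbl1's definition; add a column list to the INSERT if that assumption doesn't hold.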

The second problem is updating the existing table. Here is what I tried, but an error emerges and I can't figure it out. Here is the answer I tried to copy: How to pass data.frame for UPDATE with R DBI. I would like to update keys D and E with the new values in val.

# Trying to update tbl1
update_values <- db_tblStaging %>% 
  semi_join(db_tbl, by="Key") %>% 
  collect()

update <- dbSendQuery(con, 'UPDATE tbl1 
                            SET "val" = ? 
                            WHERE Key = ?')

dbBind(update, update_values)

Error in result_bind(res@ptr, as.list(params)) : 
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Incorrect syntax near the keyword 'Key'. 

Has the package changed in some way? I can't spot my syntax error.

Jonathan Leffler
Hakki

1 Answer


Consider running pure SQL after your staging table uploads, as it looks like you need NOT EXISTS (to avoid duplicates) and an UPDATE with INNER JOIN (for existing records). This avoids any client-side imports and exports in R.

Also, Key is a reserved word in SQL Server, which is what triggers the "Incorrect syntax near the keyword 'Key'" error. Escape it with square brackets.
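Applied to your original parameterized attempt, that means two fixes (a sketch): bracket-escape [Key], and make sure the bound columns line up with the placeholders. update_values has Key first and val second, but the statement consumes val for the first ? and Key for the second:

```r
update <- dbSendQuery(con, 'UPDATE tbl1
                            SET [val] = ?
                            WHERE [Key] = ?')

# reorder the data frame so column 1 feeds the first ?, column 2 the second
dbBind(update, update_values[c("val", "Key")])

# release the prepared statement when done
dbClearResult(update)
```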

apn_sql <- "INSERT INTO dbo.tbl1 ([Key], [Val])
            SELECT s.[Key], s.[Val] FROM dbo.tbl1_staging s
            WHERE NOT EXISTS
              (SELECT 1 FROM dbo.tbl1 t
               WHERE t.[Key] = s.[Key])"

dbExecute(con, apn_sql)

upd_sql <- "UPDATE t
            SET t.[Val] = s.[Val]
            FROM dbo.tbl1 t
            INNER JOIN dbo.tbl1_staging s
               ON t.[Key] = s.[Key]"

dbExecute(con, upd_sql)


In fact, SQL Server has the MERGE statement to handle both in one call:

MERGE dbo.tbl1 AS Target
USING (SELECT [Key], [Val] FROM dbo.tbl1_staging) AS Source
  ON (Target.[Key] = Source.[Key])
WHEN MATCHED THEN
    UPDATE SET Target.[Val] = Source.[Val]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], [Val])
    VALUES (Source.[Key], Source.[Val]);
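Run from R against the question's tables, the MERGE is a single round trip (a sketch; dbExecute() returns the number of affected rows):

```r
merge_sql <- "MERGE dbo.tbl1 AS Target
              USING (SELECT [Key], [Val] FROM dbo.tbl1_staging) AS Source
                ON (Target.[Key] = Source.[Key])
              WHEN MATCHED THEN
                  UPDATE SET Target.[Val] = Source.[Val]
              WHEN NOT MATCHED BY TARGET THEN
                  INSERT ([Key], [Val])
                  VALUES (Source.[Key], Source.[Val]);"

# updates D and E, inserts F, G and H in one statement
DBI::dbExecute(con, merge_sql)
```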


Parfait