I have worked little bit with DBI in R and first question is more of best practice, as currently appending new data to DB is taking more time than I hoped. Second is error that I'm receiving when trying to update old information in database. Here is my current workflow when inserting new data to existing table in DB:
con <- dbConnect(odbc(), "myDSN")
# Example table 1
tbl1 <- tibble(Key = c("A", "B", "C", "D", "E"),
Val = c(1, 2, 3, 4, 5))
# Original table in DB
dbWriteTable(con, "tbl1", tbl1, overwrite = TRUE)
# Link to Original table
db_tbl <- tbl(con, in_schema("dbo", "tbl1"))
# New data
tbl2 <- tibble(Key = c("D", "E", "F", "G", "H"),
val = c(10, 11, 12, 13, 14))
# Write it to Staging
dbWriteTable(con, "tbl1_staging", tbl2, overwrite = TRUE)
# Get a link to staging
db_tblStaging <- tbl(con, in_schema("dbo", "tbl1_staging"))
# Compare Info
not_in_db <- db_tblStaging %>%
anti_join(db_tbl, by="Key") %>%
collect()
# Append missing info to DB
dbWriteTable(con, "tbl1", not_in_db, append = TRUE)
# Voila!
dbReadTable(con, "tbl1")
That will do the trick, but I'm looking for better solution, as I hate the collect()
part of the code, which means that I'm bringing something to in R memory (as far as I understand it) could be a problem in future, when I have bigger data. What I hoped would work is something like this, that would allow me to append new data to DB in a fly, without it visiting in memory.
# What I hoped to have
db_tblStaging %>%
anti_join(db_tbl, by="Key") %>%
dbWriteTable(con, "tbl1", ., append = TRUE)
Second problem is updating existing table. Here is what I tried, but error will emerge and can't figure it out. Here is link where I tried to copy the answer: How to pass data.frame for UPDATE with R DBI. I would like to update key E and D with new values in val.
# Trying to update tbl1
update_values <- db_tblStaging %>%
semi_join(db_tbl, by="Key") %>%
collect()
update <- dbSendQuery(con, 'UPDATE tbl1
SET "val" = ?
WHERE Key = ?')
dbBind(update, update_values)
Error in result_bind(res@ptr, as.list(params)) :
nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Incorrect syntax near the keyword 'Key'.
Has the package changed in some way? I can't spot my syntax error.