16

With RODBC, there were functions like sqlUpdate(channel, dat, ...) that allowed you pass dat = data.frame(...) instead of having to construct your own SQL string.

However, with R's DBI, all I see are functions like dbSendQuery(conn, statement, ...) which only take a string statement and gives no opportunity to specify a data.frame directly.

So how to UPDATE using a data.frame with DBI?

Richie Cotton
  • 118,240
  • 47
  • 247
  • 360
mchen
  • 9,808
  • 17
  • 72
  • 125

3 Answers3

19

Really late, my answer, but maybe still helpful...

There is no single function (I know) in the DBI/odbc package but you can replicate the update behavior using a prepared update statement (which should work faster than RODBC's sqlUpdate since it sends the parameter values as a batch to the SQL server:

library(DBI)
library(odbc)

con <- dbConnect(odbc::odbc(), driver="{SQL Server Native Client 11.0}", server="dbserver.domain.com\\default,1234", Trusted_Connection = "yes", database = "test")  # assumes Microsoft SQL Server

dbWriteTable(con, "iris", iris, row.names = TRUE)      # create and populate a table (adding the row names as a separate columns used as row ID)

update <- dbSendQuery(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?')

# create a modified version of `iris`
iris2 <- iris
iris2$Sepal.Length <- 5
iris2$Petal.Width[2] <- 1
iris2$row_names <- rownames(iris)  # use the row names as unique row ID

dbBind(update, iris2)  # send the updated data

dbClearResult(update)  # release the prepared statement

# now read the modified data - you will see the updates did work
data1 <- dbReadTable(con, "iris")

dbDisconnect(con)

This works only if you have a primary key which I created in the above example by using the row names which are a unique number increased by one for each row...

For more information about the odbc package I have used in the DBI dbConnect statement see: https://github.com/rstats-db/odbc

R Yoda
  • 8,358
  • 2
  • 50
  • 87
  • Thank you. is there a limit to the number of rows we are trying to update at once? I have created a long sql statement that performs a MERGE and if the statement is too long `dbGetQuery` returns 0 (0 rows affected) – RockScience Mar 06 '18 at 07:35
  • There is no simple answer because the DBI package just defines an interfaces and leaves the implementation to different DBI-compliant packages. In the end all limitations depend on the DBI-compliant driver package (e. g. `odbc`) and the database plus the binary DB driver used by the DBI driver package. Are you using the `odbc` package? Then the best way is to create a reproducible code example and database and open an issue (asking to fix or at least document the limitations) at https://github.com/r-dbi/odbc/issues – R Yoda Mar 07 '18 at 08:11
  • @RockScience Out of similar problems you describe I have started a DBI compliance check "project" where I am running the `DBItest` unit tests against different DBI configurations to estimate the maturity, open issues and limitations of each DBI configuration. The status is still pre-alpha and I have not published or drawn any conclusion from the results without taking to the developers first but you can watch the progress on https://github.com/aryoda/R_DBI_compliance_reports (and of course try to run the tests against your own configuration) – R Yoda Mar 07 '18 at 09:44
  • Thank you, very impressive work!. I will definitely look at it. From what I see so far using odbc and SQL Server, a large (>30k rows) MERGE specifying the source values fails and I had to send to a temporary table then dbExecute the merge between tables. I will check how to translates in your tests. – RockScience Mar 07 '18 at 10:10
  • If you post a new question here with an example to reproduce the error I will look into it (please add the link here then so that I can find your question) – R Yoda Mar 07 '18 at 13:25
  • I get the error Error: Failed to prepare query: ERROR: syntax error at or near "=?" LINE 1: update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Le... What could be causing this? – Ruan Mar 10 '21 at 12:09
  • @Ruan Please open a new question (incl. a minimal reproducible example) at SO to separate the issues. You can add a link to this answer of course. THX :-) – R Yoda Mar 14 '21 at 09:16
5

Building on R Yoda's answer, I made myself the helper function below. This allows using a dataframe to specify update conditions.

While I built this to run transaction updates (i.e. single rows), it can in theory update multiple rows passing a condition. However, that's not the same as updating multiple rows using an input dataframe. Maybe somebody else can build on this...


dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {
  
  if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")
  if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")
  
  # Build the update string --------------------------------------------------

  df_key     <- dplyr::select(x,  one_of(key_cols))
  df_upt     <- dplyr::select(x, -one_of(key_cols))
  
  set_str    <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))
  set_str    <- paste(set_str, collapse = ", ")
  
  where_str  <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
  where_str  <- paste(where_str, collapse = " AND ")
  
  update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')
  
  # Execute ------------------------------------------------------------------
  
  query_res <- DBI::dbSendQuery(con, update_str)
  DBI::dbClearResult(query_res)

  return (invisible(TRUE))
}

Where

  • x: 1-row dataframe that contains 1+ key columns, and 1+ update columns.
  • key_cols: character vector, of 1 or more column names that are the keys (i.e. used in the WHERE clause)
mkirzon
  • 413
  • 5
  • 9
  • This looks very interesting for my case. Im seeing some errors when my dataframe has multiple rows which i want to update. `Error in dbUpdateCustom: Input dataframe must be exactly 1 row` – Andres Mora Nov 30 '21 at 23:43
0

Here is a little helper function I put together using REPLACE INTO to update a table using DBI, replacing old duplicate entries with the new values. It's basic and for my own needs, but should be easy to modify. All you need to pass to the function is the connection, table name, and dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.

row_to_list <- function(Y)  suppressWarnings(split(Y, f = row(Y)))

sql_val <- function(y){
  if(!is.numeric(y)){
    return(paste0("'",y,"'"))
  }else{
    if(is.na(y)){
      return("NULL")
    }else{
      return(as.character(y))
    }
  }
}

to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")

bracket <- function(x) paste0("`",x,"`")

to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")

replace_into_table <- function(con, table_name, new_data){
  #new_data <- data.table(new_data)
  cols <- to_sql_string(names(new_data))
  vals <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")
  query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)
  rs <- dbExecute(con, query)
  return(rs)
}

tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample data
dbWriteTable(con, "test_table", tb) # create table
dbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary key

new_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new data
new_data[4,2] <- NA # add some NA values
new_data[5,3] <- NA
table_name <- "test_table"
replace_into_table(con, "test_table", new_data)

result <- dbReadTable(con, "test_table")
Seth
  • 19
  • 4