1

I have a table with the below structure.

ID Entity UserName UserRole UserStatus UpdatedDate

Clustered key is defined on columns Entity, UserName, UserRole.

UserStatus for an Entity-UserName-UserRole could be either Active or Inactive.

I could implement the INSERT and UPDATE statements for a dataframe df as follows:

Insert:

values <- paste("('",df$Entity,"','", df$UserName,"','",
                    ,df$UserRole,"','" ,df$UserStatus,"','" ,
                    df$UpdatedDate"')", sep="", collapse=",")
   
sqlStr_Insert<- sqlQuery(con, sprintf(paste("Insert into TBL
                                     (Entity, UserName,UserRole,UserStatus,UpdatedDate) values ",values)))

Update:

sql_string <- "UPDATE TBL SET UserStatus = '%s', UpdatedDate = '%s' WHERE Entity='%s' AND UserName ='%s' AND UserRole = '%s';"
  
  sqlStr_update <-sqlQuery(con, paste(sprintf(sql_string, UserStatus, UpdatedDate,df$Entity,df$UserName,df$UserRole), collapse=""))

The code works fine but I would like it to check if for every row in the df for a combination of Entity-UserName-UserRole already exists in the TBL. If found, run the UPDATE statement else run the INSERT statement.

For example - SQL Server table data:

ID Entity UserName UserRole UserStatus UpdatedDate
---------------------------------------------------
1  A      Ford    Analyst  Active     1/1/2020
2  B      Acura   Analyst  Active     1/5/2020
3  A      Tesla   Engineer Inactive   1/6/2020
4  A      Kia     Analyst  Active     1/1/2020

df to be inserted or updated:

 Entity UserName UserRole UserStatus UpdatedDate
---------------------------------------------------
 A      Accord   Analyst  Active     1/10/2020
 B      Acura    Analyst  Active     1/10/2020
 C      BMW      Engineer Active     1/10/2020

What should happen:

Only rows 1st and 3rd should get 'INSERTED' to the database TBL. UPDATE should only happen if the 'UserStatus' changes for an existing Entity-UserName-UserRole Combination.

I guess I could run a 'FOR' loop for every row of the dataframe df but I think that would be slow and not an optimum solution.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
CodeMaster
  • 431
  • 4
  • 14
  • 1
    You need `MERGE`. Or two separate statements: an `INSERT...WHERE NOT EXISTS` and a joined `UPDATE`. You may want to look at passing through a Table-Valued Parameter and do it all in one batch. And your current code is **wide open to SQL injection and syntax errors, use parameterized queries instead** – Charlieface Mar 17 '21 at 06:20
  • okay good point. I do perform numerous data checks before Insert/Update, so the df gets populated with 'valid' data. The above code passes the entire df in one go to the SQL server. How would I write the MERGE statement in this scenario? – CodeMaster Mar 17 '21 at 13:30
  • 1
    Does this answer your question? [SQL MERGE statement to update data](https://stackoverflow.com/questions/14806768/sql-merge-statement-to-update-data) – Charlieface Mar 17 '21 at 13:47
  • partly as it does not mention how to perform SQL merge statement in R using the data frame. I will try it anyways. – CodeMaster Mar 17 '21 at 14:40
  • CodeMaster, I suspect that the sql "merge" command is what you need. In many circles, this operation is referred to as an "UPSERT", where you update if it exists (based on one or more keys) and insert if not. I know of no R function that does it natively (though the [`dbx`](https://github.com/ankane/dbx) package supports other DBMSes, not sqlserver, perhaps they'll add it at some point). Waldi's two-step approach is what I've been using for a few years. – r2evans Mar 18 '21 at 18:02

1 Answers1

2

You could create a temporary table (use # prefix in SQL Server) and send a MERGE query as suggested by @CharlieFace:

data.new <- read.table(text='
Entity UserName UserRole UserStatus UpdatedDate
A      Accord   Analyst  Active     1/10/2020
B      Acura    Analyst  Active     1/10/2020
C      BMW      Engineer Active     1/10/2020', header = T)



library(DBI)
conn <- dbConnect(...)

# Create temporary table
dbWriteTable(conn,'#NEWDATA',data.new)

dbExecute(conn,"
MERGE INTO TBL WITH (HOLDLOCK) AS target
USING #NEWDATA AS source
    ON target.Entity = source.Entity
    AND target.UserName = source.UserName
    AND target.UserRole = source.UserRole
WHEN MATCHED THEN 
    UPDATE SET target.UserStatus = source.UserStatus
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Entity,UserName,UserRole,UserStatus,UpdatedDate)
    VALUES (Entity,UserName,UserRole,UserStatus,UpdatedDate);
          " )

# Remove temporary table
dbExecute(conn,"DROP TABLE #NEWDATA" )

# Check results
result <- dbGetQuery(conn,"SELECT * from TBL")
result

  ID Entity UserName UserRole UserStatus UpdatedDate
1  1      A     Ford  Analyst     Active    1/1/2020
2  2      B    Acura  Analyst     Active    1/5/2020
3  3      A    Tesla Engineer   Inactive    1/6/2020
4  4      A      Kia  Analyst     Active    1/1/2020
5 NA      A   Accord  Analyst     Active   1/10/2020
6 NA      C      BMW Engineer     Active   1/10/2020

Not sure how ID is managed on the server : autonumber?

Waldi
  • 39,242
  • 6
  • 30
  • 78
  • Thnks! yes. ID is autonumber. Will this process suffer if for some reason multiple people login to make changes at the same time; table creation and deletion... – CodeMaster Mar 18 '21 at 16:18
  • 1
    According to [the post](https://stackoverflow.com/a/14806962/13513328) shared by @CharlieFace, `HOLDLOCK` should protect against concurrency problems. – Waldi Mar 18 '21 at 16:22
  • I implemented the above method and it worked! I also did try with the 'for loop' that checks if a certain row already exists and it worked as well. Of course, the MERGE option is way faster, in my opinion and that will be my approach. thnx for the suggestions @waldi – CodeMaster Mar 19 '21 at 14:49