5

I have a dataframe in R that has a list of records. I would like to delete all of the matching records from my ODBC datasource.

    data_to_delete ##This is my dataframe

    delete from RODBC datasource where record_id IN (select record_id 
                                                    from data_to_delete)

Any ideas on how to achieve this? I am currently using sqlQuery to pull data down and sqlSave to create new records, but I'm not sure how to delete records.

rascale
  • 79
  • 1
  • 6
  • relevant posts: https://stackoverflow.com/questions/55874019/delete-rows-from-sql-server-table-using-r-dbi-package/77019378#77019378, https://stackoverflow.com/questions/70395226/dbplyr-delete-row-from-a-table-in-database/77019429#77019429 – Aaron C Aug 31 '23 at 22:19

1 Answers1

4

As the answers to this related question explain, you will need to build up the string that holds theDELETE statement programmatically, and then you can execute a DELETE statement just as any other statement within sqlQuery.

So at the end of your DELETE, don't try to "select" from the R variable, rather just dump its contents into a comma-separated string. Below, I assume that whatever data_to_delete is, we can use paste with as.character to spit it into a comma-separated string. But you can modify that as needed.

I also assume that you mean for datasource to refer to the database, and I use some_table_name as a placeholder for the table's name within that database. You should adjust the values for your specific case.

del_query <- sprintf("DELETE from some_table_name
                      where record_id IN (%s)",
                     paste(as.character(data_to_delete), 
                           collapse=", ")
             )

sqlQuery(datasource, del_query)

Using this method, you need to be careful if the content of your data is large. There is often a symbol limit for the number of distinct symbols that can appear in the IN clause -- something like 2000 symbols as default in SQL Server.

If you need more values than that, you could first issue a query that writes the values into a temp table wholly on the database, and then issue a query with sqlQuery that contains the extra IN ( SELECT * FROM ... ) but you'll be querying from that temp table, not "from" the R variable.

Community
  • 1
  • 1
ely
  • 74,674
  • 34
  • 147
  • 228