2

This seems like it should be simple but as a complete beginner to SQL, I've spent a large amount of time trying to figure this out without any luck

Say I have a dataframe in R like this:

df1 <- data.frame(value = c(1, 2, 3, 4),
                  ID = c("foo", "bar", "baz", "waffle"))
# value     ID
# 1     1    foo
# 2     2    bar
# 3     3    baz
# 4     4    waffle

I can load it into a SQLite database easily:

table_name <- "mytable"

library("RSQLite")
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, table_name, df1, row.names = FALSE)
dbListTables(con)
# [1] "mytable"

Now say I create another dataframe, that has a mix of

  • entries which are exactly the same as the first df

  • entries which are partially present in the first df but have updated values

  • entries which are not present in the first df

-

df2 <- data.frame(value = c(1, 3, 5, 6), 
                  ID = c("foo", "bar", "abc", "zzz"))
# value  ID
# 1     1 foo # present in df1
# 2     3 bar # updated from df1
# 3     5 abc # not in df1
# 4     6 zzz # not in df1

Now, I want to update my SQLite table with this new dataframe;

  • currently existing entries which are exactly the same should be skipped

  • currently existing entries which are different should be updated

  • new entries should be appended

-

My best guess is that the code required would be structured like this:

if(! dbExistsTable(con, table_name)){
    # write the df2 to table, if the table doesn't already exist
    dbWriteTable(con, table_name, df2, row.names = FALSE)
} else {
    # update the entries in the table with the entries in df2
    for(i in seq(nrow(df2))){

        irow <- df2[i, ]

        # check if irow is already in the table
        # if its already in the table, update the table's irow entry if its different
        # otherwise append irow to the table
        # or break() if irow is already present and identical

    }
}

But what code would be used to actually check if the row (irow) is already present in the SQLite, and then update it? It seems like it might be some usage of dbBind() but I haven't been able to find a working example of how to use it for this purpose, and the docs are not clear. Is this kind of method going to be efficient for millions of entries and an arbitrary number of columns?

user5359531
  • 3,217
  • 6
  • 30
  • 55
  • Look into using dbplyr and dplyr. The joins for dplyr are explained well enough that you should be able to come up with a solution. – Richard Lusch Sep 27 '17 at 02:01
  • Thanks but I try to avoid using those packages, already have too many dependencies and I'm proficient enough in base R. Unless there's something in `dbplyr` that doesn't exist elsewhere? – user5359531 Sep 27 '17 at 02:04
  • This discussion - https://stackoverflow.com/questions/19337029/insert-if-not-exists-statement-in-sqlite - seems to use the logic you are looking for. (Updated comment to find an SQLite version) – thelatemail Sep 27 '17 at 02:37
  • reminder for myself; I figured this one out, the solution was a little involved so I haven't had time to write up a MWE to post here yet – user5359531 Oct 17 '17 at 16:30

0 Answers0