
I am trying to add a vector that I generated in R to a SQLite table as a new column. For this I wanted to use dplyr (I installed the most recent development version along with the dbplyr package, according to this post). What I tried:

library(dplyr)
library(DBI) 

#creating initial database and table
dbcon <- dbConnect(RSQLite::SQLite(), "cars.db")
dbWriteTable(dbcon, name = "cars", value = cars)
cars_tbl <- dplyr::tbl(dbcon, "cars")

#new values which I want to add as a new column 
new_values <- sample(c("A","B","C"), nrow(cars), replace = TRUE) 

#attempt to add new values as column to the table in the database
cars_tbl %>% mutate(new_col = new_values) #not working

What is an easy way to achieve this (not necessarily with dplyr)?

  • dplyr is designed to avoid modifying its input data. But you could create a new table based on an existing table/data frame and a `mutate()` transformation. – krlmlr May 15 '17 at 14:47
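
A minimal sketch of what that comment suggests (the `collect()` round trip and the `cars2` table name are illustrative assumptions, not from the thread): pull the remote table into R, add the column with `mutate()`, and write the result back as a new table.

#not from the original post: materialize a new table instead of
#modifying the existing one in place
cars_new <- cars_tbl %>%
  collect() %>%                    # bring the remote table into R
  mutate(new_col = new_values)     # add the local vector as a column
dbWriteTable(dbcon, name = "cars2", value = cars_new)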

1 Answer


I'm not aware of a way of doing this with dplyr, but you can do it with RSQLite directly. The problem is not actually with RSQLite, but with the fact that I don't know how to pass a local vector to mutate. Note that, in your code, something like this would work:

cars_tbl %>% mutate(new_col = another_column / 3.14)

Anyway, here is my alternative. I've created a toy cars data frame:

cars <- data.frame(year = c(1999, 2007, 2009, 2017),
                   model = c("Ford", "Toyota", "Toyota", "BMW"))

I open the connection and create the table:

dbcon <- dbConnect(RSQLite::SQLite(), "cars.db")
dbWriteTable(dbcon, name = "cars", value = cars)

Then I add the new column and check:

dbExecute(dbcon, "ALTER TABLE cars ADD COLUMN new_col TEXT")
dbGetQuery(dbcon, "SELECT * FROM cars")
  year  model new_col
1 1999   Ford    <NA>
2 2007 Toyota    <NA>
3 2009 Toyota    <NA>
4 2017    BMW    <NA>

And then you can update the new column. The only tricky thing is that you have to provide a WHERE clause; in this case I use the year.

new_values <- sample(c("A","B","C"), nrow(cars), replace = TRUE) 
new_values
[1] "C" "B" "B" "B"

dbGetPreparedQuery(dbcon, "UPDATE cars SET new_col = ? where year=?",
                   bind.data=data.frame(new_col=new_values,
                                        year=cars$year))

dbGetQuery(dbcon, "SELECT * FROM cars")
  year  model new_col
1 1999   Ford       C
2 2007 Toyota       B
3 2009 Toyota       B
4 2017    BMW       B

As a unique index, you could always use `rownames(cars)`, but you would have to add it as a column in your data frame and then in your table, as sketched below.
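
A hedged sketch of that idea, using `dbExecute()` (see the edit just below); the `cars2` table name and the `id` column are illustrative, not from the original answer.

cars$id <- seq_len(nrow(cars))                     # persist a unique key
dbWriteTable(dbcon, name = "cars2", value = cars)  # the table now carries the key
dbExecute(dbcon, "ALTER TABLE cars2 ADD COLUMN new_col TEXT")
dbExecute(dbcon, "UPDATE cars2 SET new_col = :new_col WHERE id = :id",
          params = data.frame(new_col = new_values, id = cars$id))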

EDIT after suggestion by @krlmlr: it is indeed much better to use dbExecute instead of the deprecated dbGetPreparedQuery:

dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where year = :year",
          params=data.frame(new_col=new_values,
                            year=cars$year))

EDIT after comments: I did not think about this a few days ago, but even though it is SQLite, you can use the rowid. I've tested this and it works.

dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where rowid = :id",
          params=data.frame(new_col=new_values,
                            id=rownames(cars)))

Although you have to make sure that the rowids in the table are the same as your rownames. Anyway, you can always get your rowids like this:

dbGetQuery(dbcon, "SELECT rowid, * FROM cars")
  rowid year  model new_col
1     1 1999   Ford       C
2     2 2007 Toyota       B
3     3 2009 Toyota       B
4     4 2017    BMW       B
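
If the rownames and the rowids might not line up (for example after rows have been deleted), a variant along the same lines (a sketch, not from the original answer) is to drive the update from the rowids the table itself reports:

ids <- dbGetQuery(dbcon, "SELECT rowid FROM cars")$rowid   # the table's own rowids
dbExecute(dbcon, "UPDATE cars SET new_col = :new_col WHERE rowid = :id",
          params = data.frame(new_col = new_values, id = ids))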
  • `dbGetPreparedQuery()` is deprecated; you should be able to use `dbExecute(..., params = data.frame())` instead. – krlmlr May 15 '17 at 14:46
  • Thanks for the answer. I got one problem with your solution. The only field in the database table with unique values is the id, which almost corresponds to `rownames(df)`. The difference is the data type: in the database the type is `ident`, and in R it is `character`. I also tried it with `numeric`, but neither works. How can I fix that? – Alex May 16 '17 at 17:17
  • Hi @Alex I've modified my answer, if you still find issues just let me know. – lrnzcig May 16 '17 at 20:05
  • Thanks! I figured out that it already worked before. I tried it on the diamonds data set (available via `ggplot2`) with >50000 rows. It takes a couple of minutes, and hence I always thought R had crashed. Is it normal that it takes so much time? – Alex May 19 '17 at 08:45
  • Hi. It's always difficult to assess performance; it depends on many things, e.g. your machine. Anyway, if you are updating 50K records, that's quite a lot. For example [this question](http://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite) has some tips on how to improve performance; it's not for R, but some of the tips may apply. I'll try to take a look during the weekend and let you know. Thanks. – lrnzcig May 19 '17 at 19:49
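
One general SQLite performance tip from that linked question, sketched here for R (an assumption on my part, not something tested in this thread): wrap the bulk update in a single transaction so SQLite does not commit after every row.

#hypothetical sketch: one transaction around the whole bulk update
dbWithTransaction(dbcon, {
  dbExecute(dbcon, "UPDATE cars SET new_col = :new_col WHERE rowid = :id",
            params = data.frame(new_col = new_values,
                                id = seq_len(nrow(cars))))
})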