How to insert list of data.frame(s) to database?

Question

I want to make an SQLite database from a list of data.frame(s) using the dplyr package. It looks like dplyr::copy_to is what I need. I think the problem is related to NSE. Also see the vignette on dplyr with databases.

data(iris)
data(cars)

res <- list("iris" = iris, "cars" = cars)

my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                       create = TRUE)

lapply(res, function(x) dplyr::copy_to(my_db, x))

Error: Table x already exists.

Try deleting and re-creating your `sqlite` database. It looks like it already worked successfully. — Michael Griffiths, Oct 10 '16 at 17:55
Still fails. There is no Table x. Only iris and cars. That is why I think it is a NSE issue. — jsta, Oct 10 '16 at 17:58

score 1 · Accepted Answer · answered Oct 10 '16 at 19:38

The reason for this is because the default table name is based off of the name of the data frame in R. When using lapply, it does not take the index name.

The documentation for dplyr::copy_to.src_sql contains:

## S3 method for class 'src_sql'
copy_to(dest, df, name = deparse(substitute(df)),
  types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL,
  analyze = TRUE, ...)

The line name = deparse(substitute(df)) shows where the table name comes from.

We can see what that turns into:

res <- list("iris" = iris, "cars" = cars)
tmp = lapply(res, function(x) print(deparse(substitute(x))))
#> [1] "X[[i]]"
#> [1] "X[[i]]"

The name for the table in the SQLite source is X[[i]]; once the first line has been executed, the table already exists.

We can resolve by either using an explicit for loop and passing the name secondarily, or by using lapply on an index number.

For instance:

res <- list("iris" = iris, "cars" = cars)
my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                           create = TRUE)
lapply(seq_along(res), function(i, l){dplyr::copy_to(my_db, l[[i]], names(l)[[i]])}, l = res)
my_db %>% tbl("iris") %>% head

#> Source:   query [?? x 5]
#> Database: sqlite 3.8.6 
#> 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

my_db %>% tbl("cars") %>% head

#> Source:   query [?? x 2]
#> Database: sqlite 3.8.6 
#> 
#>   speed  dist
#>   <dbl> <dbl>
#> 1     4     2
#> 2     4    10
#> 3     7     4
#> 4     7    22
#> 5     8    16
#> 6     9    10

I found this formatting easier to read: `lapply(seq_along(res), function(i, dt = res) dplyr::copy_to(my_db, dt[[i]], names(dt)[[i]]))` — jsta, Oct 10 '16 at 21:39

Julian · Answer 2 · 2022-11-17T09:40:03.923

0

If someone prefers a {purrr} option:

imap(res, ~dplyr::copy_to(dest = my_db, 
                            df = .x,
                            name = .y,
                            overwrite = TRUE))

or

imap(res, ~dbWriteTable(conn = my_db, .y,
                                        value = .x,
                                       overwrite = TRUE))

edited Nov 17 '22 at 09:40

answered Nov 17 '22 at 09:15

Julian

6,586
2
9
33

How to insert list of data.frame(s) to database?

2 Answers2

Linked