0

I want to make an SQLite database from a list of data.frame(s) using the dplyr package. It looks like dplyr::copy_to is what I need. I think the problem is related to NSE. Also see the vignette on dplyr with databases.

data(iris)
data(cars)

res <- list("iris" = iris, "cars" = cars)

my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                       create = TRUE)

lapply(res, function(x) dplyr::copy_to(my_db, x))

Error: Table x already exists.

user4157124
  • 2,809
  • 13
  • 27
  • 42
jsta
  • 3,216
  • 25
  • 35

2 Answers2

1

The reason for this is because the default table name is based off of the name of the data frame in R. When using lapply, it does not take the index name.

The documentation for dplyr::copy_to.src_sql contains:

## S3 method for class 'src_sql'
copy_to(dest, df, name = deparse(substitute(df)),
  types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL,
  analyze = TRUE, ...)

The line name = deparse(substitute(df)) shows where the table name comes from.

We can see what that turns into:

res <- list("iris" = iris, "cars" = cars)
tmp = lapply(res, function(x) print(deparse(substitute(x))))
#> [1] "X[[i]]"
#> [1] "X[[i]]"

The name for the table in the SQLite source is X[[i]]; once the first line has been executed, the table already exists.

We can resolve by either using an explicit for loop and passing the name secondarily, or by using lapply on an index number.

For instance:

res <- list("iris" = iris, "cars" = cars)
my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                           create = TRUE)
lapply(seq_along(res), function(i, l){dplyr::copy_to(my_db, l[[i]], names(l)[[i]])}, l = res)
my_db %>% tbl("iris") %>% head

#> Source:   query [?? x 5]
#> Database: sqlite 3.8.6 
#> 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

my_db %>% tbl("cars") %>% head

#> Source:   query [?? x 2]
#> Database: sqlite 3.8.6 
#> 
#>   speed  dist
#>   <dbl> <dbl>
#> 1     4     2
#> 2     4    10
#> 3     7     4
#> 4     7    22
#> 5     8    16
#> 6     9    10
Michael Griffiths
  • 1,399
  • 7
  • 14
  • 1
    I found this formatting easier to read: `lapply(seq_along(res), function(i, dt = res) dplyr::copy_to(my_db, dt[[i]], names(dt)[[i]]))` – jsta Oct 10 '16 at 21:39
0

If someone prefers a {purrr} option:

imap(res, ~dplyr::copy_to(dest = my_db, 
                            df = .x,
                            name = .y,
                            overwrite = TRUE))

or

imap(res, ~dbWriteTable(conn = my_db, .y,
                                        value = .x,
                                       overwrite = TRUE))
Julian
  • 6,586
  • 2
  • 9
  • 33