20

I'm programmatically fetching a bunch of datasets, many of them having silly names that begin with numbers and have special characters like minus signs in them. Because none of the datasets are particularly large, and I wanted the benefit R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.

I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:

data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")

This results in the error message:

Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3

This works if I choose a sensible name. However, due to a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from sqlite:

sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3

Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.

EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:

library(dplyr)

data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")

Error in sqliteSendQuery(conn, statement, bind.data) : 
  error in statement: unrecognized token: "14M3"
Peter
  • 4,219
  • 4
  • 28
  • 40
  • 4
    This is just a guess, but it may be because of R's naming conventions. One possible workaround I've found is to use `name=gsub("[.]", "", make.names("[14m3-n4m3]"))`. If you did that you could still pass your original names in `copy_to()`, although they would be slightly different in the data. Not sure if that helps but it's my two cents. Very good question. – Rich Scriven Jan 25 '15 at 03:26
  • 1
    This is a better workaround than what I was starting down the road of implementing. Slowly talking myself into the idea that good names + other hassles > bad names. – Peter Jan 25 '15 at 03:50
  • I agree with @RichardScriven -- better to modify the names with existing tools than to try to roll your own! – Carl Witthoft Jan 25 '15 at 13:07
  • 1
    dplyr should do all the escaping for you. What happens if you do `copy_to(foo.db, df=iris, name="14m3-n4m3")`? – hadley Jan 27 '15 at 01:23
  • 1
    @hadley - Sadly, that doesn't work either. I've added an update to the question; short ver, I get: `unrecognized token: "14M3"` I've tried both the current CRAN release as well as `install_github("hadley/dplyr")` – Peter Jan 27 '15 at 03:49
  • 1
    Can you please file a bug? It suggests that I've either forgotten to quote something, or my quoting is incorrect – hadley Jan 27 '15 at 16:31
  • Will do. I didn't want to fire off a bug report until I ruled out my own mistake. Prior probability strongly suggests bug-like features in hadley packages are far more often user error on my part ;-) – Peter Jan 27 '15 at 19:10

1 Answers1

3

This is a bug in dplyr. It's still there in the current github master. As @hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The current problem you're having arises from lack of escaping in two functions. Table creation works fine when providing the table name unescaped (and is done with dplyr::db_create_table). However, the insertion of data to the table is done using DBI::dbWriteTable which doesn't support odd table names. If the table name is provided to this function escaped, it fails to find it in the list of tables (the first error you report). If it is provided escaped, then the SQL to do the insertion is not synatactically valid.

The second issue comes when the table is updated. The code to get the field names, this time actually in dplyr, again fails to escape the table name because it uses paste0 rather than build_sql.

I've fixed both errors at a fork of dplyr. I've also put in a pull request to @hadley and made a note on the issue https://github.com/hadley/dplyr/issues/926. In the meantime, if you wanted to you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it's been fixed.

Incidentally, the correct SQL-99 way to escape table names (and other identifiers) in SQL is with double quotes (see SQL standard to escape column names?). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.

Finally, the proposal from @RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but is not a syntactically valid table name in SQL. The same would be true for other reserved words.

Community
  • 1
  • 1
Nick Kennedy
  • 12,510
  • 2
  • 30
  • 52
  • Issuing a PR for a long-standing bug to answer the question--bravo! – Peter Jun 15 '15 at 03:17
  • If you were very quick at downloading my fixed version, I've made another commit which adds in an extra line of code to clear up resources after the insert. Not sure how critical that is, but it seems sensible to do and was in the original `DBI::dbWriteTable`. – Nick Kennedy Jun 15 '15 at 10:31
  • 2
    After discussion with @hadley, my fix is now a much simpler one-liner that fixes the second issue described above (replacing `paste0` with `build_sql`). The necessary changes to `DBI::dbWriteTable` have been made in [`RSQLite` in the github version](https://github.com/rstats-db/RSQLite), though you'll also need the [github version of DBI](https://github.com/rstats-db/DBI). – Nick Kennedy Jul 22 '15 at 14:14