I'm trying to use dplyr
/dbplyr
(version 0.7
) with a database (Microsoft SQL Server 2014
). I've been able to connect to this and extract data using the RODBC
, DBI
and odbc
packages. The problem arises when I try to use dplyr verbs directly with the base.
When I attempt to use dplyr
, I get the following error:
Error in new_result(connection@ptr, statement) : std::bad_alloc
I raised this on the dplyr repo, where Hadley told me it was most likely an odbc
error. Jim was super helpful when I raised the issue on the odbc
repo, but wasn't able to solve the problem.
To be clear:
RODBC::sqlQuery()
and DBI::dbGetQuery()
both work: I get a dataframe back as I would expect, and I can use SQL
queries to get back whatever I want. I only run into the memory allocation error when attempting to use dplyr
verbs (actually, even when I try tbl()
). There is a lot of data, but I'm querying for a subset of it, and it fits comfortably in R memory.
Ideally, I would rather keep the data out of R's memory altogether, if possible. The reason for this is that I'm building a Shiny
app that will produce plots etc based on aspects of the data that the user chooses. As you can imagine, connecting to the database, sending queries and receiving data everytime the user does this renders the app somewhat useless because it's so slow. My ideal situation is like the RStudio
example:
my_db <- src_mysql(
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
user = "guest",
password = "guest"
)
> my_db %>% tbl("City") %>% head(5)
# Source: lazy query [?? x 5]
# Database: mysql 10.0.17-MariaDB [guest@shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com:/shinydemo]
ID Name CountryCode District Population
<dbl> <chr> <chr> <chr> <dbl>
1 1 Kabul AFG Kabol 1780000
2 2 Qandahar AFG Qandahar 237500
3 3 Herat AFG Herat 186800
4 4 Mazar-e-Sharif AFG Balkh 127800
5 5 Amsterdam NLD Noord-Holland 731200
However, with my base I am unable to do that. Any help would be much appreciated.