1

I'm trying to use dplyr/dbplyr (version 0.7) with a database (Microsoft SQL Server 2014). I've been able to connect to this and extract data using the RODBC, DBI and odbc packages. The problem arises when I try to use dplyr verbs directly with the base.

When I attempt to use dplyr, I get the following error:

Error in new_result(connection@ptr, statement) : std::bad_alloc

I raised this on the dplyr repo, where Hadley told me it was most likely an odbc error. Jim was super helpful when I raised the issue on the odbc repo, but wasn't able to solve the problem.

To be clear:

RODBC::sqlQuery() and DBI::dbGetQuery() both work: I get a dataframe back as I would expect, and I can use SQL queries to get back whatever I want. I only run into the memory allocation error when attempting to use dplyr verbs (actually, even when I try tbl()). There is a lot of data, but I'm querying for a subset of it, and it fits comfortably in R memory.

Ideally, I would rather keep the data out of R's memory altogether, if possible. The reason for this is that I'm building a Shiny app that will produce plots etc based on aspects of the data that the user chooses. As you can imagine, connecting to the database, sending queries and receiving data everytime the user does this renders the app somewhat useless because it's so slow. My ideal situation is like the RStudio example:

my_db <- src_mysql(
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  user = "guest",
  password = "guest"
)

> my_db %>% tbl("City") %>% head(5)
# Source:   lazy query [?? x 5]
# Database: mysql 10.0.17-MariaDB [guest@shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com:/shinydemo]
     ID           Name CountryCode      District Population
  <dbl>          <chr>       <chr>         <chr>      <dbl>
1     1          Kabul         AFG         Kabol    1780000
2     2       Qandahar         AFG      Qandahar     237500
3     3          Herat         AFG         Herat     186800
4     4 Mazar-e-Sharif         AFG         Balkh     127800
5     5      Amsterdam         NLD Noord-Holland     731200

However, with my base I am unable to do that. Any help would be much appreciated.

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
RobertMyles
  • 2,673
  • 3
  • 30
  • 45

2 Answers2

1

Sigh. (Answer for anybody who runs into something similar.)

Turns out this is a problem with the specific table I'm trying to access.

This table comes back fine without problems with RODBC and DBI/odbc, but as soon as I use dplyr it spits out this error.

Other tables in the database work fine with dplyr, so it must be something to do with this specific table. Worse, I have no idea why.

sm925
  • 2,648
  • 1
  • 16
  • 28
RobertMyles
  • 2,673
  • 3
  • 30
  • 45
  • Dear Robert, can you share the 'dplyr' query you are using? Separate from the query my sense would be to check the column data you are retrieving. It might be an unusual value in the data. Why do I say that? I have seen something similar in the past. Turned out one value in the column was being retrieved and converted which caused the query to bomb. Just an idea. Good luck with debugging. – Technophobe01 Jun 27 '17 at 14:54
  • @Technophobe01 Thanks! Have a look at this thread on the odbc repo: [https://github.com/rstats-db/odbc/issues/71](https://github.com/rstats-db/odbc/issues/71) -- we went into detail trying to figure out what the problem is. I'm pretty sure it has to do with just one column, but any SQL query I use with dplyr's `tbl()` doesn't seem to ignore this column before it processes things, so I get the error even trying `SELECT` with some columns (and not this one). Any SQL suggestions are appreciated! – RobertMyles Jun 27 '17 at 15:00
  • 2
    Robert, I read the thread (odbc repo: https://github.com/rstats-db/odbc/issues/71 ) my hunch is that you have a special character encoding error in the referenced column. Is the column type ASCII or Unicode? Why, do I say this? The column data is not in English, hence it looks (to me at least) like there are special characters embedded in the column data. My approach would be to reformat the column data entries or cast the data before manipulating it. Hope this helps. – Technophobe01 Jun 27 '17 at 15:42
  • Thanks Technophobe, I think it is something along those lines, I'll see how I can get it changed. – RobertMyles Jun 27 '17 at 16:37
1

I had a very similar issue with Sybase. My problem turned out to be totally unrelated to the odbc or DBI packages.

The solution for me was to make sure I was using an ODBC driver that supports an 8-bytes SQLLEN datatype. On Linux running the command odbcinst -j showed me that the SQLLEN Size was 8 for my installed unixODBC version 2.3.1 running on RHEL Server 7.5 64-bit.

By default the Sybase driver was symlinked to the 4-bytes version. Following the instructions below I made sure I was linking to the 8-bytes version and my problems were solved. Oddly I never experienced these issues when issuing commands via isql but... Perhaps this will work for @robertmc with SQL Server.

http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc20155.1600/doc/html/san1361915533650.html

Posted this also in one of the related GitHub issues: https://github.com/r-dbi/odbc/issues/174#issuecomment-403106647

wphampton
  • 504
  • 5
  • 13