46

I'm using R to plot some data I pull out of a database (the Stack Exchange data dump, to be specific):

dplyr::tbl(serverfault, 
           dbplyr::sql("
select year(p.CreationDate) year,
       avg(p.AnswerCount*1.0) answers_per_question,
       sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
"))

The query works fine on SEDE, but I get this error in the R console:

Error: <SQL> 'SELECT *
FROM (
select year(p.CreationDate) year,
       avg(p.AnswerCount*1.0) answers_per_question,
       sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
) "zzz11"
WHERE (0 = 1)'
  nanodbc/nanodbc.cpp:1587: 42000: [FreeTDS][SQL Server]Statement(s) could not be prepared. 

I reckoned "Statement(s) could not be prepared." meant that SQL Server didn't like the query for some reason. Unfortunately, it didn't give any hint about what went wrong. After fiddling with the query for a bit, I noticed it was wrapped in a subselect, according to the error message. Copying and executing the full query as constructed by one of the libraries in the chain, SQL Server gave me this more informative error message:

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Now the solution is obvious: remove (or comment out) the order by clause. But where is the detailed error message in the R console? I'm using Rstudio, should that matter. If I could get the full exception right next to the code I'm working on, it would help me fix bug a lot quicker. (And just to be clear, I get cryptic errors from dplyr::tbl often and typically use binary search debugging to fix them.)

Community
  • 1
  • 1
Jon 'links in bio' Ericson
  • 20,880
  • 12
  • 98
  • 148
  • What if you query with `DBI::dbGetQuery()`? Does that improve anything over `dbplyr::sql`? – Mako212 Aug 10 '18 at 19:29
  • 2
    Since the error message you do see is being thrown by nanodbc directly, I wonder how much dbplyr can really do about this. My hunch would be that this isn't something easily fixed by a tweak at the R level. If it were me I'd go straight to the horse's mouth and file a github issue which will get to the expert (Jim Hester) more directly. – joran Aug 10 '18 at 19:32
  • Could you clarify how you are getting the detailed error message? – Jim Aug 10 '18 at 20:26
  • @Jim: I copied the full query from the error message into a [SEDE query](http://data.stackexchange.com/stackoverflow/query/new). – Jon 'links in bio' Ericson Aug 10 '18 at 21:24
  • 2
    Oh, I have no idea what driver / connection stack SEDE is using for its error messages, but the error in R is coming from the freeTDS driver and is the only information the odbc R package has. There isn't a way to make this error more informative except to use a different driver (maybe try the ones from Microsoft or the RStudio professional drivers)? – Jim Aug 10 '18 at 21:54
  • @MuratYıldız: Since my question was about dplyr and not SQL Server in particular, that doesn't actually help, I'm afraid. – Jon 'links in bio' Ericson May 09 '20 at 20:00
  • More information about which database you're using and which driver you're using to be able to answer this question (ie the code prior to the above code being run) – Roger-123 Dec 14 '21 at 00:36
  • 1
    I think this has since been resolved. When I run the literal code above (SQL Server 2016, DBI-1.1.2, odbc-1.3.3, R-4.1.2), I get the full detailed error message, *`The ORDER BY clause ... is also specified.*` Unfortunately, I'm using mssqlodbc itself, not freetds (never found it to be stable), so I cannot rule out if it's driver-specific or generic in `dbplyr`. @JonEricson, can you confirm if the bug still presents itself with FreeTDS? If not, I suggest this question can either be closed or self-answers as resolved-by-updates. – r2evans Apr 27 '22 at 16:37
  • 2
    @r2evans: It's been a while. ;-) If I have a chance to reproduce the issue on my new machine, I'd be happy to self-answer. – Jon 'links in bio' Ericson Apr 27 '22 at 17:00
  • Even if its is indeed quite a while, I think function `capture.output`- should be handy in such situation. – Eric Lecoutre Feb 08 '23 at 13:23
  • Not all drivers are created equal, and the quotes might be the issue. Did you try ' instead of " in your query? I have had that cause problems in the past. R treats them the same, not all drivers convert them to the SQL dialect standard you are using/ – sconfluentus Apr 18 '23 at 15:32

1 Answers1

-4

To get more detailed error messages from dplyr::tbl, you can modify your code to include the tryCatch() function. This function allows you to catch and handle errors, including extracting detailed error messages. Try to modify your code as below.

result <- tryCatch({
  dplyr::tbl(serverfault, 
             dbplyr::sql("
               select year(p.CreationDate) year,
                      avg(p.AnswerCount*1.0) answers_per_question,
                      sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
               from Posts p
               where PostTypeId = 1
               group by year(p.CreationDate)
               order by year(p.CreationDate)
             "))
}, error = function(e) {
  message("Detailed error message: ", conditionMessage(e))
  stop(e)
})

# Continue with your code using the `result` object if no error occurred

In the modified code, the tryCatch() function wraps around the dplyr::tbl() call. Inside the tryCatch() function, you define an error handler using the error argument. In the error handler, you can access the detailed error message using the conditionMessage() function.

If an error occurs, the error handler will print the detailed error message to the console using message(), and then re-throw the error using stop(e) to halt the execution of further code. If no error occurs, the code will continue using the result object.

This approach allows you to catch and display detailed error messages when working with dplyr::tbl or other functions that might throw errors.