5

I have very simple problem that produces error. Example will clear this one.

library(odbc)
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(odbc(), "myDSN")

tbl_test <- tibble(ID = c("A", "A", "A", "B", "B", "B"),
                   val = c(1, 2, 3, 4, 5, 6),
                   cond = c("H", "H", "A", "A", "A", "H"))

dbWriteTable(con, "tbl_test", tbl_test, overwrite = TRUE)

After writing simple table to DB I add link to table in db and try to use simple conditional sums that work normally. But will face an error.

db_tbl <- tbl(con, in_schema("dbo", "tbl_test"))

db_tbl %>% 
  group_by(ID) %>% 
  summarise(sum = sum(val, na.rm = TRUE),
            count_cond = sum(cond == "H", na.rm=TRUE),
            sum_cond = sum(val == "H", na.rm=TRUE))

Error: <SQL> 'SELECT  TOP 10 "ID", SUM("val") AS "sum", SUM(CONVERT(BIT, IIF("cond" = 'H', 1.0, 0.0))) AS "count_cond", SUM(CONVERT(BIT, IIF("val" = 'H', 1.0, 0.0))) AS "sum_cond"
FROM dbo.tbl_test
GROUP BY "ID"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Operand data type bit is invalid for sum operator.

I'm no expert, but feels like SQL can't understand TRUE as 1 and for that reason can't calculate sums. Is there away around this, as alot of times I face some kind of conditions. Below is just code for normal tibble to show that they should work.

tbl_test %>% 
  group_by(ID) %>% 
  summarise(sum = sum(val),
            count_cond = sum(cond == "H"),
            sum_cond = sum(val[cond == "H"]))

# A tibble: 2 x 4
  ID      sum count_cond sum_cond
  <chr> <dbl>      <int>    <dbl>
1 A        6.          2       3.
2 B       15.          1       6.

I Understand that this might not be reproducible example, as not everyone have DB connection available.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Hakki
  • 1,440
  • 12
  • 26
  • you could use mutate and ifelse before summarize to create a column of 1s and 0s to sum on afterwards – moodymudskipper Apr 01 '18 at 12:39
  • here are the translations you want to get to `https://stackoverflow.com/questions/582637/sql-server-equivalent-of-a-countif-aggregate-function`, so use `ifelse` or `casewhen`, it seem you were right and you can't sum booleans with `SQL Server` – moodymudskipper Apr 01 '18 at 12:44
  • I can't test now but what does this return ? `sum_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE)` – moodymudskipper Apr 01 '18 at 12:45
  • That will give me number of conditions each group, so it works. but conditional sums with other columns still won't. I think it could be solved with filter and then join, but feels like there might be easier way as well. Ooh, just use ifelse to refer to val column. Thanks. – Hakki Apr 01 '18 at 12:48
  • Do you mean that your example case would be solved by this but your real case is more complex that the one posted ? – moodymudskipper Apr 01 '18 at 12:52
  • 1
    db_tbl %>% group_by(ID) %>% summarise(sum = sum(val, na.rm=TRUE), count_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE), sum_cond = sum(ifelse(cond == "H",val,0),na.rm=TRUE)), I think this will do for now, using ifelse. Post this as answer and will approve it. – Hakki Apr 01 '18 at 12:52
  • Normally you can replace any `condition` by `ifelse(condition,1,0)` and i think it should work – moodymudskipper Apr 01 '18 at 12:53
  • what I posted few comment ago works and gives expected result. – Hakki Apr 01 '18 at 12:54

1 Answers1

7

SQL server can't sum booleans (it doesn't coerce TRUE to 1).

So you have to manually convert them, and one way is to use ifelse, your code becomes:

db_tbl %>%
  group_by(ID) %>% 
  summarise(sum = sum(val, na.rm=TRUE), 
            count_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE),
            sum_cond = sum(ifelse(cond == "H",val,0),na.rm=TRUE))
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167