1

I opened an issue for this. But in case this is operator error and not a problem with the package, I'm posting here for coding advice.

The most recent (v2.1) update to dbplyr was supposed to increase support for across and associated functions. However, I'm still getting errors when I try to use it.

Should I be using different syntax? Is there a workaround?

library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

set.seed(02042021)

Sys.setenv(BIGQUERY_TEST_PROJECT = "elite-magpie-257717")
bq_deauth()
bq_auth(email="ariel.balter@gmail.com")

conn = DBI::dbConnect(
  bigrquery::bigquery(),
  project = bq_test_project(),
  dataset = "test_dataset"
)

my_table = data.frame(
  A = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  B = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  C = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  D = runif(10)
)
my_table
#>      A   B   C         D
#> 1  bcb cbb bbb 0.3620390
#> 2  aac aac bba 0.5505868
#> 3  aca abb bcb 0.4028455
#> 4  bca baa bbb 0.3247477
#> 5  bcc cac ccc 0.6861223
#> 6  cac bba baa 0.6970764
#> 7  bcb bbc acc 0.6873332
#> 8  bca acb acb 0.5391651
#> 9  cba ccc abc 0.9442450
#> 10 cca cbc bcc 0.6319561

my_table %>%
  mutate(
    has_ab = if_any(everything(), ~grepl("ab", .))
  )
#>      A   B   C         D has_ab
#> 1  bcb cbb bbb 0.3620390  FALSE
#> 2  aac aac bba 0.5505868  FALSE
#> 3  aca abb bcb 0.4028455   TRUE
#> 4  bca baa bbb 0.3247477  FALSE
#> 5  bcc cac ccc 0.6861223  FALSE
#> 6  cac bba baa 0.6970764  FALSE
#> 7  bcb bbc acc 0.6873332  FALSE
#> 8  bca acb acb 0.5391651  FALSE
#> 9  cba ccc abc 0.9442450   TRUE
#> 10 cca cbc bcc 0.6319561  FALSE

my_table %>%
  mutate(
    has_ab = if_any(where(is.numeric), ~grepl("ab", .))
  )
#>      A   B   C         D has_ab
#> 1  bcb cbb bbb 0.3620390  FALSE
#> 2  aac aac bba 0.5505868  FALSE
#> 3  aca abb bcb 0.4028455  FALSE
#> 4  bca baa bbb 0.3247477  FALSE
#> 5  bcc cac ccc 0.6861223  FALSE
#> 6  cac bba baa 0.6970764  FALSE
#> 7  bcb bbc acc 0.6873332  FALSE
#> 8  bca acb acb 0.5391651  FALSE
#> 9  cba ccc abc 0.9442450  FALSE
#> 10 cca cbc bcc 0.6319561  FALSE

dbRemoveTable(
  conn=conn,
  name="test_dataset.mytable",
  value=my_table,
  overwrite=T
)

dbWriteTable(
  conn=conn,
  name="test_dataset.mytable",
  value=my_table,
  overwrite=T
)

my_table_bq = tbl(conn, "mytable")


my_table_bq %>%
  mutate(
    has_ab = if_any(everything(), ~grepl("ab", .))
  )
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "formula"

my_table_bq %>%
  mutate(
    has_ab = if_any(where(is.numeric), ~grepl("ab", .))
  )
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "function"

Created on 2021-02-05 by the reprex package (v1.0.0)

abalter
  • 9,663
  • 17
  • 90
  • 145
  • I tried without the formual just assign value `TRUE` / `FALSE` - it said that `Function not found: if_any at [1:28] [invalidQuery]` – Sinh Nguyen Feb 05 '21 at 22:11
  • Also, I think it more problem of Bigquery is not support these function - so it would be best to operate it in R and put back to BigQuery - What is the advantage of doing these operation in database instead? – Sinh Nguyen Feb 05 '21 at 22:18
  • @SinhNguyen -- It isn't a bigquery problem per-se as far as I see. Hadley is still working on improving support for `across` in dbplyr, and I thought is was further along. The reason to work in the DB is when the file is far too large to work on locally (10s of GB). That, in fact, is the beauty of `dbplyr`! – abalter Feb 05 '21 at 22:41
  • Incidentally, I just ran the same code using a local RSQLite database instead of BigQuery to make sure it was definitely NOT a BQ issue. – abalter Feb 05 '21 at 22:45
  • If you write a query in BQ - what would be equivalent to that operation in R? – Sinh Nguyen Feb 05 '21 at 23:23
  • If you mean "What SQL would be equivalent to what I'm doing in R", then I can't say. It seems like a difficult SQL operation. Test all columns for a matching string and report TRUE in a new column if the match is found in any column. BUT you don't know all the columns ahead of time. Just need to search all present. – abalter Feb 05 '21 at 23:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/228329/discussion-between-sinh-nguyen-and-abalter). – Sinh Nguyen Feb 05 '21 at 23:37

2 Answers2

2

I do not think this is possible with dbplyr (version 2.1.0) at present. Here are my test cases:

(1) working case

# shared setup
library(dplyr)
library(dbplyr)
data(iris)
df = tbl_lazy(iris, con = simulate_mssql()) %>%
  select(Sepal.Length)

df %>%
  mutate(new = Sepal.Length + 1) %>%
  show_query()

Returns the expected SQL:

<SQL>
SELECT `Sepal.Length`, `Sepal.Length` + 1.0 AS `new`
FROM `df`

(2) introducing a simple everything

df %>%
  mutate(new = if_any(everything(), TRUE)) %>%
  show_query()

Returns invalid sql because no translation for if_any or everything exists:

<SQL>
SELECT `Sepal.Length`, if_any(everything(), 1) AS `new`
FROM `df`

(3) simple is.numeric

df %>%
  mutate(new = if_any(where(is.numeric), TRUE)) %>%
  show_query()

Errors because is.numeric is passed as a function

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"

(4) wrapping TRUE in a function that returns TRUE

df %>%
  mutate(new = if_any(everything(), ~{TRUE})) %>%
  show_query()

Errors because no translation for the implicit function ~{...} exists:

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "formula"

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • `dbplyr` v2.1 has been [released](https://github.com/tidyverse/dbplyr/issues/588) (it is now available both through CRAN and conda). This intended to address [issues with operations across columns](https://github.com/tidyverse/dbplyr/pull/578). However I'm still having issues. Perhaps I'm out of scope of these fixes. Not sure, and would love workarounds. – abalter Feb 06 '21 at 01:02
  • Rerun for dbplyr 2.1 only change in output was `TRUE` becomes `1` in example **(2)**. – Simon.S.A. Feb 06 '21 at 03:48
1

Here is a potential work around based on this answer about dynamic case_when.

list_of_columns = colnames(df)

text_to_match = "ab"

cases = paste0("`", list_of_columns, "` %LIKE% '%", text_to_match, "%' ~ 1")
cases = c(cases, "1 == 1 ~ 0")

output = df %>%
    mutate(new_col = case_when(
      !!!rlang::parse_exprs(cases)
    ))

For another example of this technique, or as part of a reusable function, take a look at the collapse_indicator_columns function of my dbplyr_helpers repo.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41