3

I'm getting the following error message when I try some code on a distant Postgres database.

The following peusdo-reproducing code works well when dataframes are local but not when they are distant.

library(tidyverse)
library(dbplyr)
library(RPostgres)


event <- tibble(id = c("00_1", "00_2", "00_3", "00_4", "00_5", "00_6", "00_7"),
               type_id = c("A", "B", "C", "B", "A", "B", "C"))


detail <- tibble(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L),
                event_id = c("00_1", "00_1", "00_2", "00_2", "00_3", "00_4", "00_4", "00_5", "00_6", "00_6", "00_7", "00_8"),
                type_id = c(3L, 4L, 6L, 7L, 2L, 6L, 3L, 2L, 6L, 5L, 2L, 1L))


event_f <- event %>%
 mutate(new_variable = id %in% (detail %>%
          filter(type_id == 6) %>%
          pull(event_id))) %>%
 collect()

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "c('tbl_PqConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

MattnDo
  • 444
  • 1
  • 5
  • 17
  • Are you really using a package called `Rpostgres`? The one on CRAN is called `RPostgres`. When I use that, I don't get any errors. – user2554330 Apr 23 '19 at 08:41
  • Sorry, it's RPostgres. Have you try to reproduce the above example with distant dataframes for `event` and `detail`? Because when it's local it works out well indeed. – MattnDo Apr 23 '19 at 08:50
  • No, I used your example as posted. – user2554330 Apr 23 '19 at 09:18

1 Answers1

6

The problem is most likely caused by the nested dplyr query. There are two possibilities here:

  1. dbplyr can not translate your query into SQL,

  2. dbplyr translates your query but it is not valid SQL.

How to tell

dbplyr attempts to translate each set of commands into SQL. One way to verify this is to use the function show_query().

For example the R command:

my_table %>% mutate(new_col = 2 * old_col + id) %>% select(new_col, id) %>% show_query()

Will return something like the following SQL command:

SELECT 2 * old_col + id AS new_col, id
FROM database.my_table

This will only happen if translation of R to SQL has been possible. Therefore:

  1. If show_query returns SQL, you should review the SQL to determine where it is wrong and adapt your R commands to correct this

  2. If show_query does not return, or gives an error then dbplyr has not be able to translate your query, and it needs to be restructured.

Anticipating that the problem is caused by the nested dplyr commands (detail %>% filter %>% pull), I recommend replacing this by a semi_join as follows:

detail_f <- detail %>%
    filter(type_id == 6)

event_f <- event %>%
    semi_join(detail_f, by = c("id" = "type_id")) %>%
    collect()

If you are unfamiliar with semi-joins you might find this post helpful. R also supports the use of anti-joins.

Edit: misread your initial query.

As you want to add an indicator for the presence/absence of event_id in your output table, you can probably avoid semi- or anti-joins. Perhaps something like the following:

detail_f <- detail %>%
    filter(type_id == 6) %>%
    select(id_to_compare = event_id) %>%
    mutate(new_variable = 1)

event_f <- event %>%
    left_join(detail_f, by = c("id" = "id_to_compare")) %>%
    mutate(new_variable = ifelse(is.na(new_variable), 0, new_variable) %>%
    collect()

Note, I have used 0 & 1 here instead of FALSE & TRUE as some versions of SQL don't handle these as easily as R does.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • Thanks for your detailed response. It's indeed the nested dplyr commands that create this problem as the `show_query()` returns the same error. If I want to replicate the expected output I'll have to do a `mutate(new_variable == TRUE)` and then `anti_join()` with `mutate(new_variable == FALSE)`. Am I correct in thinking that? – MattnDo Apr 24 '19 at 08:20
  • Not quite. I've edited my answer to better replicate your expected output. – Simon.S.A. Apr 25 '19 at 04:30