5

I'm using dplyr's automatic SQL backend to query subtable from a database table. E.g.

my_tbl <- tbl(my_db, "my_table")

where my_table in the database looks like

batch_name    value
batch_A_1     1
batch_A_2     2
batch_A_2     3
batch_B_1     8
batch_B_2     9
...

I just want the data from batch_A_#, regardless of the number.

If I were writing this in SQL, I could use

select * where batch_name like 'batch_A_%'

If I were writing this in R, I could use a few ways to get this: grepl(), %in%, or str_detect()

# option 1
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(grepl('batch_A_', batch_name, fixed = T))
# option 2
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(str_detect(batch_name, 'batch_A_'))

All of these give the following Postgres error: HINT: No function matches the given name and argument types. You might need to add explicit type casts

So, how do I pass in SQL string functions or matching functions to help make the generated dplyr SQL query able to use a more flexible range of functions in filter?

(FYI the %in% function does work, but requires listing out all possible values. This would be okay combined with paste to make a list, but does not work in a more general regex case)

Zhe Zhang
  • 2,879
  • 2
  • 14
  • 12
  • 1
    Doesn't `tbl(my_db, sql(select * where batch_name %like% 'batch_A_%'))` work – shayaa Aug 15 '16 at 20:53
  • @shayaa Sorry for the delay, of course that's right! For more complicated queries, I'm not sure this would always work, but I can't think of a counterexample right now. Thanks! – Zhe Zhang Aug 29 '16 at 16:08

2 Answers2

13


A "dplyr-only" solution would be this

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()

Full reprex:

suppressPackageStartupMessages({
  library(dplyr)
  library(dbplyr)
  library(RPostgreSQL)
})

my_con <- 
  dbConnect(
    PostgreSQL(),
    user     = "my_user",
    password = "my_password",
    host     = "my_host",
    dbname   = "my_db"
  )

my_table <- tribble(
  ~batch_name,    ~value,
  "batch_A_1",     1,
  "batch_A_2",     2,
  "batch_A_2",     3,
  "batch_B_1",     8,
  "batch_B_2",     9
)

copy_to(my_con, my_table)

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()
#> # A tibble: 3 x 2
#>   batch_name value
#> *      <chr> <dbl>
#> 1  batch_A_1     1
#> 2  batch_A_2     2
#> 3  batch_A_2     3

dbDisconnect(my_con)
#> [1] TRUE

This works because any functions that dplyr doesn't know how to translate will be passed along as is, see ?dbplyr::translate\_sql.

Hat-tip to @PaulRougieux for his recent comment here

dpprdan
  • 1,727
  • 11
  • 24
0

Using dplyr

Get the table batch_name from the database as dataframe and use it for further data analysis.

library("dplyr")
my_db <- src_postgres(dbname = "database-name", 
                      host = "localhost", 
                      port = 5432, 
                      user = "username",
                      password = "password")

df <- tbl(my_db, "my_table")
df %>% filter(batch_name == "batch_A_1")

Using DBI and RPostgreSQL

Get the table by sending sql query

library("DBI")
library("RPostgreSQL")  
m <- dbDriver("PostgreSQL")    
con <- dbConnect(drv = m,
                 dbname = "database-name", 
                 host = "localhost", 
                 port = 5432,
                 user = "username",
                 password = "password")    
df <- dbGetQuery(con, "SELECT * FROM my_table WHERE batch_name %LIKE% 'batch_A_%'")

library("dplyr")
df %>% filter(batch_name == "batch_A_1")
Sathish
  • 12,453
  • 3
  • 41
  • 59
  • Can you edit this to reflect the fact that I wanted to use SQL's regular expression function `like`? Your first code section is incorrect because it uses an explicit batch_name. The second code section is good because it uses an explicit where statement in the query (which I forgot about), however it still uses an explicit batch_name. – Zhe Zhang Aug 29 '16 at 16:12
  • it was missing FROM clause for table_name. You can replace it with the name of one of the tables in your database – Sathish Aug 29 '16 at 17:35
  • The first code gets the entire table 'my_table'. Then dplyr library command is used on the results from the database query – Sathish Aug 29 '16 at 17:38
  • The second code queries the table with column 'batch_name' matching the pattern of 'batch_A_' – Sathish Aug 29 '16 at 17:39