5

I'm trying to use summarize to get the first result for each group, but it returns the column header instead:

(get_table is a custom function that gets a data table from a Postgres db)

require(dplyr)
require(RPostgres)

tbl <- get_table(my_server, my_table) %>%
  select(column_a, column_b) %>%
  group_by(column_a) %>%
  summarize(first_b = first(column_b))

The result looks like

a first_b
1 "column_b"
2 "column_b"
3 "column_b"

If I use dplyr::collect() before summarize() I get the desired result but this really slows down performance. Any ideas how I can summarize without using collect first?

Christian Welsch
  • 424
  • 2
  • 10
  • FYI I've reproduced this on another PostgresDB. What a strange behavior. (And it has a similar behavior if you do `first(anything)`, even if `anything` isn't a column name) – David Robinson Jul 17 '18 at 14:40
  • Very interesting problem. Seems like a bug from dbplyr; you might want to refactor this like so: https://stackoverflow.com/a/7630564/4738478 – zlipp Jul 17 '18 at 14:43
  • 2
    I've reported the bug [here](https://github.com/tidyverse/dbplyr/issues/129). Note that it can be basically reproduced (if you have the Lahman package installed) with `tbl(dbplyr::lahman_sqlite(), "Batting") %>% summarize(first(G))` – David Robinson Jul 17 '18 at 14:59

0 Answers0