Renaming a Summarised Column inside Redshift dplyr operations

Question

I am using dplyr to do certain operations in Redshift so I don't load the data in memory.

data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>% 
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>% 
  summarise(sum(purchases)) %>%  # will create a column called sum(purchases)
  full_join(data, by=c("region", "age", "method"))

Right now when I look at the resulting dataframe, I will see a column called sum(purchases) and I want to rename it to purchases which will create the columns, purchase.x and purchase.y after the merge.

Most of the renaming that I've read so far are dealing with dataframes that are in memory rather than dataframes that are lazily evaluated with dbplyr. I have tried using rename, rename_, rename_at as well as different variations of select. I have also tried strategies laid out here and here but no luck

Is there a way to rename the sum(purchases). The only other option I have is to load the dataframe in memory at a certain step

data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>% 
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>% 
  summarise(sum(purchases)) %>% 
loaded <- as.data.frame(subset)
# do some join here but in memory and not in Redshift
# full_join(data, by=c("region", "age", "method"))

score 6 · Accepted Answer · answered May 14 '18 at 17:30

You can assign names in summarise. I don't have your data so I can't triple-check, but I've used this in my own code before when calling summarise(n()). Something like...

summarise(your_column_name = sum(purchases))

You can also pass it a column name with spaces, you just have to use backticks

summarise(`your column name` = sum(purchases))

Renaming a Summarised Column inside Redshift dplyr operations

1 Answers1