I am using dplyr to do certain operations in Redshift so I don't load the data in memory.
data <- tbl(conn, "customers") %>%
filter(age >= 18)
subset <- data %>%
filter(eye_color != "brown") %>%
group_by(gender, method, age, region) %>%
summarise(sum(purchases)) %>% # will create a column called sum(purchases)
full_join(data, by=c("region", "age", "method"))
Right now when I look at the resulting dataframe, I will see a column called sum(purchases)
and I want to rename it to purchases
which will create the columns, purchase.x
and purchase.y
after the merge.
Most of the renaming that I've read so far are dealing with dataframes that are in memory rather than dataframes that are lazily evaluated with dbplyr. I have tried using rename
, rename_
, rename_at
as well as different variations of select
. I have also tried strategies laid out here and here but no luck
Is there a way to rename the sum(purchases)
. The only other option I have is to load the dataframe in memory at a certain step
data <- tbl(conn, "customers") %>%
filter(age >= 18)
subset <- data %>%
filter(eye_color != "brown") %>%
group_by(gender, method, age, region) %>%
summarise(sum(purchases)) %>%
loaded <- as.data.frame(subset)
# do some join here but in memory and not in Redshift
# full_join(data, by=c("region", "age", "method"))