I use dplyr to pull data from an SQL database, and it usually works pretty well. But sometimes my code runs very slowly and I'm not sure why. Most of the time I connect to tables with about 1 million rows, filter some data, and then call collect(), like this:
flights <- tbl(
  getDbConn("flight_data"),
  in_schema("flights", "usa")
) %>%
  filter(destination == "east_coast") %>%
  filter(!(city %in% c("NYC", "MIA"))) %>%
  filter(passanger_id %in% passangers$id) %>%
  select(city, passanger_id, date) %>%
  collect()
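One thing I already suspect: the filter against my local passangers$id vector presumably gets translated into a long IN (...) list in the WHERE clause. I've been wondering whether copying those ids into a temporary table on the server and using semi_join() instead would be faster. A sketch of what I mean (passangers is my local data frame; the table name passangers_tmp is made up):

library(dplyr)
library(dbplyr)

con <- getDbConn("flight_data")

# Copy the local ids into a temporary table on the server,
# instead of inlining them into the WHERE clause
ids <- copy_to(con, passangers["id"], name = "passangers_tmp",
               temporary = TRUE, overwrite = TRUE)

flights <- tbl(con, in_schema("flights", "usa")) %>%
  filter(destination == "east_coast") %>%
  filter(!(city %in% c("NYC", "MIA"))) %>%
  semi_join(ids, by = c("passanger_id" = "id")) %>%
  select(city, passanger_id, date) %>%
  collect()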
Either way, loading the data usually takes only a few seconds. But sometimes it is very slow and I don't know why. It might be related to the kind of data stored in the database, for example duration columns; in those cases the same code as above takes 10 minutes to load the data.
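If interval-typed columns really are the problem, I could probably cast them to a plain numeric type inside the query before collecting. A sketch of what I have in mind, assuming a Postgres backend and a hypothetical flight_duration column (my real schema differs):

library(dplyr)
library(dbplyr)

flights <- tbl(getDbConn("flight_data"), in_schema("flights", "usa")) %>%
  # Convert the interval to numeric seconds on the server, so collect()
  # transfers plain doubles instead of a driver-specific duration type
  mutate(duration_sec = sql("EXTRACT(EPOCH FROM flight_duration)")) %>%
  select(city, passanger_id, date, duration_sec) %>%
  collect()

I haven't verified that the duration column is actually the culprit, though.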
The only thing that always runs fast is:
flights <- tbl(
  getDbConn("flight_data"),
  in_schema("flights", "other_table")
) %>%
  select(city, passanger_id, date)
The code above runs in a few seconds. But as soon as I call collect() on it, it is slow again: 10 minutes or more. Is there a faster alternative to collect()? I tried tbl_df, but it is similarly slow.
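If I understand dplyr's laziness correctly, the snippet above only builds a query and nothing actually hits the database until collect(), which would explain why it returns instantly. To check what SQL gets sent, and whether the data transfer itself is the bottleneck, something like this should work (show_query() and collect()'s n argument are documented in dplyr/dbplyr, if I read the docs right):

library(dplyr)
library(dbplyr)

query <- tbl(getDbConn("flight_data"), in_schema("flights", "other_table")) %>%
  select(city, passanger_id, date)

# Prints the translated SQL without executing anything
show_query(query)

# Fetch only the first 1000 rows to see if the transfer itself is slow
sample_rows <- collect(query, n = 1000)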