I'm trying to analyze data stored in an SQL database (MS SQL Server) in R, on a Mac. Typical queries might return a few GB of data, and the entire database is a few TB. So far I've been using the R package `odbc`, and it seems to work pretty well.
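For reference, here's roughly what my setup looks like. The driver name, server, database, table, and credentials below are placeholders, not my real values:

```r
library(DBI)
library(odbc)

# Placeholder connection details -- driver, server, database, and credentials
# are illustrative only
con <- dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 17 for SQL Server",
  Server   = "my-server.example.com",
  Database = "my_database",
  UID      = "my_user",
  PWD      = Sys.getenv("DB_PASSWORD"),
  Port     = 1433
)

# Send the query and fetch the whole result set at once
res <- dbSendQuery(con, "SELECT * FROM dbo.some_large_table")
df  <- dbFetch(res)   # this is the step that takes close to an hour for ~4 GB
dbClearResult(res)
```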
However, `dbFetch()` seems really slow. For example, a somewhat complex query returns all of its results in ~6 minutes in SQL Server, but if I run it through `odbc` and then call `dbFetch()`, it takes close to an hour to get the full 4 GB into a `data.frame`. I've tried fetching in chunks (sketched below), which helps modestly: https://stackoverflow.com/a/59220710/8400969. I'm wondering if there is a way to pipe the data to my Mac more quickly, and I like the line of thinking here: Quickly reading very large tables as dataframes.
What are some strategies for speeding up `dbFetch()` when a query returns a few GB of data? If the issue is building a `data.frame` from larger result sets, are there savings available by fetching in a different manner? Are there other packages that might help?
Thanks for your ideas and suggestions!