1

I'm working on a process improvement that will use SQL in r to work with large datasets. Currently the source data is stored in several different MS Access databases. My initial approach was to use RODBC to read all of the source data into r, and then use sqldf() to summarize the data as needed. I'm running out of RAM before I can even begin use sqldf() though.

Is there a more efficient way for me to complete this task using r? I've been looking for a way to run a SQL query that joins the separate databases before reading them into r, but so far I haven't found any packages that support this functionality.

  • `dbplyr` allows you do perform several operations on the data connections/data bases well before you collect output in R. – Ryan Morton Jul 09 '18 at 20:00

2 Answers2

1

Should your data be in a database dplyr (a part of the tidyverse) would be the tool you are looking for.

You can use it to connect to a local / remote database, push your joins / filters / whatever there and collect() the result as a data frame. You will find the process neatly summarized on http://db.rstudio.com/dplyr/

What I am not quite certain of - but it is not a R issue but rather an MS Access issue - is the means for accessing data across multiple MS Access databases.

You may need to write custom SQL code for that & pass it to one of the databases via DBI::dbGetQuery() and have MS Access handle the database link.

Jindra Lacko
  • 7,814
  • 3
  • 22
  • 44
  • Interesting. I might be able to create a single Access database that links to all of the relevant tables in the other databases, so that could be a temporary workaround. I found a similar question at the following link which I may be able to leverage as well - https://stackoverflow.com/questions/50327140/joining-across-databases-with-dbplyr – overflowStack345 Jul 09 '18 at 21:23
  • If you are able to make a single view in MS Access containing all your data then it is a cut-and-dried `dplyr` case – Jindra Lacko Jul 09 '18 at 21:27
1

The link you posted looks promising. If it doesn't yield the intended results, consider linking one Access DB to all the others. Links take almost no memory. Union the links and fetch the data from there.

# Load RODBC package
library(RODBC)

# Connect to Access db
channel <- odbcConnectAccess("C:/Documents/Name_Of_My_Access_Database")

# Get data
data <- sqlQuery(channel , paste ("select * from Name_of_table_in_my_database"))

These URLs may help as well.

https://www.r-bloggers.com/getting-access-data-into-r/

How to connect R with Access database in 64-bit Window?

ASH
  • 20,759
  • 19
  • 87
  • 200