1

I am attempting to analyze some data in RStudio which originates from a MySQL database, so I used dbConnect to connect to said database, and copied the single table I needed for this project. I then used R to clean the data a bit, getting rid of some un-needed columns. So far, so good.

My problems arose when I realized my data had some outliers, and I needed to delete rows which contained obvious outlier data. This is something I have no problem doing in SQL, but lack the R experience to do effectively. So I looked into it, and found out about sqldf, a package which bills itself as a way to use SQL commands to manipulate data.frames. Perfect! But I'm having some trouble with this, as sqldf seems to require a database connection of some kind. Is there a way to simply connect to a data.frame I have in my global environment in RStudio?

Q: Couldn't you just manipulate the data in MySQL before importing it to R?
A: Yes, and that's what I'll do if I have to, but I'd like to understand sqldf better.

lwb
  • 101
  • 1
  • 10
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Don't the examples from `?sqldf` show the function working on local data.frames rather than db connections? What exactly did you try and what didn't work? – MrFlick Feb 07 '20 at 16:31
  • Where is the data you're working with? If its a df in R you should be able to use sqldf without worrying about connections. If you're trying to connect to mysql you will need to establish it. – Hatt Feb 07 '20 at 16:33
  • It's a df in R called "Book". I did the following: `sqldf("SELECT * FROM book")`, and got the following error: `Error in .local(drv, ...) : Failed to connect to database: Error: Can't connect to MySQL server on 'localhost' (0) Error in !dbPreExists : invalid argument type` – lwb Feb 07 '20 at 16:34
  • As far as I understand `sqldf`is a package to write SQL in R to directly manipulate dataframes. So if you have a dataframe loaded in Rstudio (regardless of where it comes from) you should be able to manipulate it. – novica Feb 07 '20 at 16:38
  • 1
    Try `sqldf("select * from book;", drv = 'SQLite')` – Hatt Feb 07 '20 at 16:40

1 Answers1

2

Try:

options(sqldf.driver = "SQLite")
sqldf("select * from book;", drv = 'SQLite')
Hatt
  • 689
  • 4
  • 9
  • 23
  • You don't need both `drv=` and `options`. (Also you don't need the semicolon.) One or the other is sufficient. The way it works is that if RMySQL is loaded it will assume that you want to use the MySQL backend unless you specify `options` OR `drv`. One alternative is to just make sure that RMySQL is not loaded in the first place. `detach("package:RMySQL")` can be used if it already is. Then you don't need to use `options` or `drv`. – G. Grothendieck Feb 07 '20 at 18:28