3

I am using the following code to upload a new table into a mysql database.

library(RMySql)
library(RODBC)

con <- dbConnect(MySQL(),
  user = 'user',
  password = 'pw',
  host = 'amazonaws.com',
  dbname = 'db_name')

dbSendQuery(con, "CREATE TABLE table_1 (
        var_1 VARCHAR(50),
        var_2 VARCHAR(50),
        var_3 DOUBLE,
        var_4 DOUBLE);
        ")

channel <- odbcConnect("db name")
sqlSave(channel, dat = df, tablename = "tb_name", rownames = FALSE, append = 
TRUE)

The full data set is 68 variables and 5 million rows. It is taking over 90 minutes to upload 50 thousand rows to MySql. Is there a more efficient way to upload the data to MySql. I originally tried dbWriteTable() but this would result in an error message saying the connection to the database was lost.

roarkz
  • 811
  • 10
  • 22
  • 1
    Yes, their corresponding command-line loaders are often recommended for MySQL, PostgreSQL, ... as the process bulk-updates better that the (possibly line-by-line) operating `sqlSave()` or `dbWriteTable()`. – Dirk Eddelbuettel May 31 '17 at 15:20

2 Answers2

4

Consider a CSV export from R for an import into MySQL with LOAD DATA INFILE:

...
write.csv(df, "/path/to/filename.csv", row.names=FALSE)

dbSendQuery(con, "LOAD DATA LOCAL INFILE '/path/to/filename.csv'
                  INTO TABLE mytable
                  FIELDS TERMINATED by ','
                  ENCLOSED BY '"'
                  LINES TERMINATED BY '\\n'")
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • When I use this method I get the error `Error in .local(conn, statement,...): could not run statement: Lost connection to MySQL server during query`. Any suggestions? – roarkz May 31 '17 at 21:22
  • Hmmm...put this query in an .sql script and run it via [MySQL cli](https://stackoverflow.com/questions/8940230/how-can-i-run-a-sql-text-file-on-a-mysql-database) (black box command line). You will call it with `source /path/to/filename.sql; ` If this runs successfully, R is the issue. – Parfait May 31 '17 at 22:14
  • So, I tried `SET GLOBAL local_infile = 1;` and I got an "Access Denied" error, so it appears the issue is my permissions. Thanks for all the help! – roarkz Jun 01 '17 at 01:46
  • Try granting your user such rights/permissions or having the db admin set it up. – Parfait Jun 01 '17 at 14:27
0

You could try to disable the mysql query log:

dbSendQuery(con, "SET GLOBAL general_log = 'off'")

I can't tell if your mysql user account has the appropriate permissions to do that, or if it conflicts with your business needs.

Off the top of my head: Otherwise you could try to send the data in say 1000-row batches, using a for- loop in your Rscript, and maybe option verbose = true in your call to sqlSave

If you send the data in a single batch, Mysql might try to run the INSERT as a single transaction ("all-or-nothing") and if it fails it goes into recovery or just fails after inserting some random number of rows.

knb
  • 9,138
  • 4
  • 58
  • 85