8

Is it possible to perform a bulk insert into an MS-SQL Server (2000, 2005, 2008) using the RODBC package?

I know that I can do this using freebcp, but I'm curious if the RODBC package implements this portion of the Microsoft SQL API and if not, how difficult it would be to implement it.

Sam
ephpostfacto

6 Answers

6

Check out the newer odbc and DBI packages. DBI::dbWriteTable writes around 20,000 records per second, much faster than the row-by-row inserts from RODBC::sqlSave().
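A minimal sketch of that approach; the driver name, server, database, and table name below are placeholders you'd adjust for your environment:

```r
library(DBI)
library(odbc)

# Connection details are assumptions -- substitute your own driver/server/DB.
con <- dbConnect(odbc::odbc(),
                 Driver   = "ODBC Driver 17 for SQL Server",
                 Server   = "myserver",
                 Database = "mydb",
                 Trusted_Connection = "Yes")

# dbWriteTable sends the whole data.frame in bound batches rather than
# issuing one INSERT per row, which is where the speedup comes from.
dbWriteTable(con, "my_table", mtcars, overwrite = TRUE)

dbDisconnect(con)
```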

quickreaction
2

You're probably looking for ?sqlSave, which uses a parametrized INSERT INTO query (taking place in one operation) when you set fast = TRUE.
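For reference, a hedged sketch of that call; the DSN and table name are assumptions:

```r
library(RODBC)

ch <- odbcConnect("myDSN")   # assumed DSN pointing at the SQL Server instance

# fast = TRUE sends the data via a parametrized INSERT INTO in one operation
# instead of writing a row at a time.
sqlSave(ch, mtcars, tablename = "my_table",
        fast = TRUE, rownames = FALSE)

odbcClose(ch)
```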

DMK
Tyler
  • Nah, sqlSave will do multiple INSERTs. I want a BULK INSERT, which is a single transaction. – ephpostfacto Nov 06 '09 at 21:42
  • Does fast=true not do it as a single transaction? – Tyler Nov 09 '09 at 23:40
  • From the RODBC docs: "logical. If false, write data a row at a time. If true, use a parametrized INSERT INTO or UPDATE query to write all the data in one operation." However, it doesn't seem to make any difference (when writing to Netezza in my case). – jpmorris Oct 26 '16 at 20:27
  • I am afraid that in my experience setting fast = TRUE or fast = FALSE doesn't change a thing. Records are inserted one by one no matter what. At least in my case with SQL Server 2012. – Matteo Castagna Mar 22 '19 at 15:23
  • For me, fast = TRUE makes the sqlSave function ~3 times faster. Also on SQL Server 2012. – LeslieKish Apr 16 '19 at 21:30
2

Now you can use dbBulkCopy from the new rsqlserver package:

A typical scenario:

  1. You create a matrix.
  2. You save it as a CSV file.
  3. You call dbBulkCopy to read the file and insert it, internally using the bcp tool of MS SQL Server.

This assumes that your table already exists in the database:

nr <- 1000; nc <- 5                       # example dimensions
dat <- matrix(rnorm(nr * nc), nr, nc)
id.file <- "temp_file.csv"
write.csv(dat, file = id.file, row.names = FALSE)
dbBulkCopy(conn, 'NEW_BP_TABLE', value = id.file)
agstudy
  • any reason why rsqlserver is not on cran? – jangorecki Feb 06 '14 at 16:56
  • @MusX because it is under development (especially the documentation and tests) and it uses the `rClr` package, which is not on CRAN either. But you are encouraged to use it from GitHub and will be happy with any feedback. – agstudy Feb 06 '14 at 17:00
2

Using RODBC, the fastest insert we've been able to create (a 260 million row insert) looks like the following (in R pseudocode):

ourDataFrame <- sqlQuery(OurConnection, "SELECT myDataThing1, myDataThing2
                                         FROM myData")
ourDF <- doStuff(ourDataFrame)
write.csv(ourDF,ourFile)  
sqlQuery(OurConnection, "CREATE TABLE myTable ( la [La], laLa [LaLa]);
                         BULK INSERT myTable FROM 'ourFile'
                              WITH (yourParams);")

If you're running this between servers, you need a network drive that the R server can write to and that SQL Server can read from (e.g. one server with permission to write to the DB uses Rscript to productionalize the code).
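To make the pseudocode concrete, the BULK INSERT step might look like the sketch below. The UNC path and the WITH options are assumptions; check the T-SQL BULK INSERT documentation for the options that match your file format and server version:

```r
library(RODBC)

# The UNC path is a placeholder: it must be writable by the R process
# and readable by the SQL Server service account.
# FIRSTROW = 2 skips the header line that write.csv produces.
sqlQuery(OurConnection, "
  BULK INSERT myTable
  FROM '\\\\fileserver\\share\\ourFile.csv'
  WITH (FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\\n',
        FIRSTROW = 2);")
```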

quickreaction
1

From everything I can find, there is NO bulk-insert solution for MS SQL from R, and nothing that works with SSIS, which is why Microsoft is including in-database analytics with SQL Server 2016 after buying Revolution R Analytics.

I tried to comment on the previous answer but don't have the reputation to do it.

The rsqlserver package needs to run with rClr, and neither of those packages is well-behaved, especially because rsqlserver's INSERT functions have poor data type handling. So if you use it, you'll have no idea what you're looking at in the SQL table, as much of the information in your data.frame will have been transformed.

Considering the RODBC package has been around for 15 years, I'm pretty disappointed that no one has created a bulk insert function...

quickreaction
  • important point on rsqlserver, but for many of us we don't need to 'look at' the data (from an R standpoint). If it's been modeled and shaped and processed in R, we just need the result back in the database, and it doesn't matter what R transforms the types to in the database (as long as they are reasonable and can be read by other systems) – jpmorris Oct 26 '16 at 20:02
  • @Joe I don't think I'd ever hire someone who says "we don't need to 'look at' the data". – quickreaction Jul 17 '19 at 23:12
1

Our n2khelper package can use bcp (bulk copy) when it is available. When it is not available, it falls back to multiple INSERT statements.

You can find the package at https://github.com/INBO-Natura2000/n2khelper

Install it with devtools::install_github("INBO-Natura2000/n2khelper") and look for the odbc_insert() function.

Thierry