0

How can I insert a complete R data.frame in one go into a SQL table, with the help of a stored procedure but without using a loop, using R language?

I am using Microsoft Server 2012.

This is my R code:

library(RODBC)
dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySERVERName;database=Testing;trusted_connection=true;')
f= data.frame(
    age= c(21,22,23,24),
  name= c("fifi", "jojo", "jj", "arbi"), stringsAsFactors = FALSE)
sqlQuery(dbhandle,paste("EXEC BOinsert @age=", f$age, ", @name=", f$name)) 

This is my SP:

ALTER proc [dbo].[BOinsert]

@age int,
@name nvarchar(50)
as 
Insert into Student(Age, Name) values (@age, @name)
R Yoda
  • 8,358
  • 2
  • 50
  • 87
Iqra.
  • 685
  • 1
  • 7
  • 18
  • How many rows will your `data.frame` contain typically? Inserting ten thousands of rows or more this way will be very slow (search for "bulk insert"). – R Yoda Feb 21 '17 at 06:48
  • Do you have to use a stored procedure to insert your data? – R Yoda Feb 21 '17 at 06:48
  • @RYoda yes using SP is compulsory, and I have shared my insert SP too. – Iqra. Feb 21 '17 at 07:36
  • Yes I have 21- 400+ records in that dataframe which I need to insert into SQL table using SP. – Iqra. Feb 21 '17 at 07:38
  • Which database are you using? Microsoft SQL Server version xxxx (as your connection string indicates)? – R Yoda Feb 21 '17 at 17:19
  • I am using SQL server 2012 and I don't want to use loop as I have said I have multiple rows 400+ – Iqra. Feb 22 '17 at 04:40
  • Are you allowed to create your own tables and stored procedures? Does the answer have to use RODBC or are you free to choose the driver stack (e. g. JDBC with Java)? – R Yoda Feb 22 '17 at 06:54

4 Answers4

1

I would recommend using RODBCext.

library(RODBCext)

f= data.frame(
  age= c(21,22,23,24),
  name= c("fifi", "jojo", "jj", "arbi"), stringsAsFactors = FALSE)

sqlExecute(dbhandle,
           "EXEC BOinsert @age = ?, @name = ?",
           data = f)

Strictly speaking, this is running a for loop over your data and executing the stored procedure once for every row. But I think it is done in C code, and you don't have to worry about making it work. It's one line of code for you. It will also deal with all of the quoting for you so that you don't have to worry about SQL injection attacks.

EDIT:

Just ran a test on my system with a data frame of 1,000 rows. sqlExecute was able to load it all in about 1.52 seconds. It took sqlQuery about 1.76 seconds. So not a huge improvement performance, but at least sqlExecute doesn't look like a loop.

Benjamin
  • 16,897
  • 6
  • 45
  • 65
  • Thanks for the performance indication! `RODBCext` uses a loop internally so the performance gain is "only" due to the prepared statement which avoid reparsing the SQL code. See the source code at line 149: https://github.com/zozlak/RODBCext/blob/master/R/sqlExecute.R -> `for(row in seq_len(nrow(data))){ ...` – R Yoda Feb 24 '17 at 17:38
  • With that being the case, the only other thing I can imagine that might potentially improve performance would be generating a script to write everything in one `INSERT`. Doing this is easy in R, but wouldn't be the stored procedure that us required. Writing a stored procedure to parse the vectors is possible, but not trivial. I'd have to believe there would be a very significan't performance boost before I would try it. – Benjamin Feb 25 '17 at 02:31
  • Is there any other version of RODBCext package that would work for R version 3.6.3? – Ami Apr 11 '20 at 11:14
0

It can be done by MySql using RMySQL

install.packages("RMySQL")
library(RMySQL)
mydb = dbConnect(MySQL(), user='your_user_name', password='your_password', dbname='your_database_name', host='localhost') 

dbListTables(mydb)


#create data frame

ds <- data.frame(a=1:10, b= 21:30)

#adding ds data frame to your database
dbWriteTable(mydb, name='db', value=ds)

#then check your database this will be added as table in name of db
Sarath_Mj
  • 323
  • 1
  • 2
  • 14
  • The OP seems to use a Microsoft SQL server (see connection string) and has to use stored procedure. So I guess your answer is good but not what the OP needs. – R Yoda Feb 21 '17 at 17:21
  • @Mj_7 I am getting many errors after multiples trials too. Loading required package: DBI > mydb = dbConnect(MySQL(), dbname='Testing', host='DGS') Error in .local(drv, ...) : Failed to connect to database: Error: Can't connect to MySQL server on 'DGS' (0) > > dbListTables(mydb) Error in dbListTables(mydb) : object 'mydb' not found > > > > – Iqra. Feb 22 '17 at 04:47
  • @Eqra some mistakes in MySQL() in your code : try with MySQL()and insert user name and password to connect your db with your new connection – Sarath_Mj Feb 22 '17 at 05:11
0

Added 02/22/2017:

There is no easy way avoid the loop if you cannot change the stored procedure ("SP") which can only insert one row per call. This means a network roundtrip per stored procedure call (= per data.frame row).

Your alternatives are (all too complicated to show it here in detail):

  1. Bulk insert the data.frame into a temporary or staging table using bcp.exe or the bulk insert statement. Then execute a SQL statement the loops over all rows in the temp/staging table and calls your insert SP. This effectively shifts the loop from the client side (R) to the server side (SQL server) which is much faster since no network roundtrips are required per row to be inserted. For details see: MS-SQL Bulk Insert with RODBC

  2. Still loop over all rows in the R code but create a block of SQL statements that insert many rows at once (e. g. 100 at time). This reduced the network roundtrips by the number of rows you send at once.

  3. Create a wrapper SP that receives the data.frame content as e. g. CSV text string, parses it and calls your insert SP per data row.

  4. [Added 02/23/2017] You could use the package RODBCext which supports prepared statements. This "only" hides the loop but still loops internally at the client side (with one network/server roundtrip per row insert) but saves some time since the SQL statement must not be parsed and "prepared" again (execution planning etc.).

Solutions 2 is the easiest one to implement, but you have to care about the maximum statement length of the ODBC driver (about 1000, 2000, 4000 or 8000 character I guess).

Original answer:

Since you have to use stored procedure to insert each single row you end up with a slow looped solution (which is OK if you do not have to many rows to insert):

library(RODBC)

f <- data.frame(age = c(21,22,23,24), name = c("fifi", "jojo", "jj", "arbi"), stringsAsFactors = FALSE)

dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySERVERName;database=Testing;trusted_connection=true;')

for (i in 1:nrow(f)) {
  sqlQuery(dbhandle, paste0("EXEC BOinsert @age=", f[i,]$age, ", @name='", f[i,]$name, "'"))
  # print(paste0("EXEC BOinsert @age=", f[i,]$age, ", @name='", f[i,]$name, "'"))
}

odbcClose(dbhandle)
Community
  • 1
  • 1
R Yoda
  • 8,358
  • 2
  • 50
  • 87
  • I have 400+ rows to insert, currently i am using loop which is what i dont want to go with. – Iqra. Feb 22 '17 at 04:41
  • @Eqra Sorry, I forgot the point with the loop cause I think for 400 rows it is not performance critical. What do you mean wit "400+"? More than 400 rows? Or more than 400 thousand? – R Yoda Feb 22 '17 at 06:51
  • It is performing damn critical, the count can vary it may lead upto 1000. my manager asking to do it without loop. – Iqra. Feb 22 '17 at 07:46
0

Alternative 2 of my first answer (create a block of SQL statements that insert many rows at once) could be implemented like this:

library(RODBC)

f <- data.frame(age = c(21,22,23,24), name = c("fifi", "jojo", "jj", "arbi"), stringsAsFactors = FALSE)

dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySERVERName;database=Testing;trusted_connection=true;')

block.size.in.rows <- 2  # you should use a much bigger value like 100 (2 is only for demonstration purposes here)
sql <- ''

for (i in 1:nrow(f)) {
  # sqlQuery(dbhandle, paste0("EXEC BOinsert @age=", f[i,]$age, ", @name='", f[i,]$name, "'"))
  sql <- paste0(sql, "EXEC BOinsert @age=", f[i,]$age, ", @name='", f[i,]$name, "'", "; \n")
  if ((i %% block.size.in.rows) == 0) {
    cat(paste0("Sending block:\n"))
    cat(sql)      # debug
    sqlQuery(dbhandle, sql)
    sql <- ''
  }
}

if (nchar(sql) > 0) {
  cat(paste0("Sending block:\n"))
  cat(sql)      # debug
  sqlQuery(dbhandle, sql)
}

odbcClose(dbhandle)

It will send multiple SQL statements in a block via one server roundtrip:

Sending block:
EXEC BOinsert @age=21, @name='fifi'; 
EXEC BOinsert @age=22, @name='jojo'; 
Sending block:
EXEC BOinsert @age=23, @name='jj'; 
EXEC BOinsert @age=24, @name='arbi'; 
R Yoda
  • 8,358
  • 2
  • 50
  • 87