In Oracle, appending millions of records with INSERT...VALUES
is suboptimal compared to a single insert select as well-known blogs stress:
dba-oracle.com:
Don't use standard SQL inserts - They are far slower than other approaches.
asktom.oracle.com:
The fastest would be to disable the indexes (mark them unusable) and do this in a SINGLE insert
stackoverflow.com @Lokesh:
A select insert is the fastest approach as everything stays in RAM.
Therefore, consider first building a staging (temp) table with RODBC's sqlSave or DBI's dbWriteTable, which also helps align variable types between R and Oracle. See this SO answer by @Linda.
RODBC::sqlSave(Target1Conn, InputData, "myTempTable", fast=TRUE, append=FALSE, rownames=FALSE)
DBI::dbWriteTable(Target1Conn, "myTempTable", InputData, append=FALSE, row.names=FALSE)
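To control the Oracle column types explicitly rather than relying on the defaults, both functions accept a type map: varTypes in RODBC::sqlSave and field.types in DBI::dbWriteTable (supported by most backends). The column names and types below are illustrative only, a sketch assuming a numeric col1 and a date col2:

```r
# Hypothetical type map -- adjust names/types to your actual data
RODBC::sqlSave(Target1Conn, InputData, "myTempTable", fast = TRUE,
               append = FALSE, rownames = FALSE,
               varTypes = c(col1 = "NUMBER(10)", col2 = "DATE"))

DBI::dbWriteTable(Target1Conn, "myTempTable", InputData,
                  append = FALSE, row.names = FALSE,
                  field.types = c(col1 = "NUMBER(10)", col2 = "DATE"))
```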
Then, run a single insert-select (and try it with Oracle's direct-path APPEND hint as shown below; note that after a direct-path insert the transaction must be committed before the same session can query the target table):
sql <- "INSERT /*+ append */ INTO myFinalTable (col1, col2, col3, col4, col5,
col6, col7, col8, col9, col10,
col11, col12, col13, col14, col15)
SELECT col1, col2, col3, col4, col5,
col6, col7, col8, col9, col10,
col11, col12, col13, col14, col15
FROM myTempTable"
RODBC::sqlQuery(Target1Conn, sql)
DBI::dbExecute(Target1Conn, sql)
Further optimization techniques. Be sure to discuss with your DBA before proceeding.
Use BULK COLLECT and FORALL
(if available) inside a stored procedure, still loading from the temp table above.
Oracle (run only once)
CREATE OR REPLACE PROCEDURE load_data
IS
    TYPE TObjectTable IS TABLE OF myFinalTable%ROWTYPE;
    ObjectTable$ TObjectTable;
BEGIN
    SELECT * BULK COLLECT INTO ObjectTable$
      FROM myTempTable;

    FORALL x IN ObjectTable$.First .. ObjectTable$.Last
        INSERT INTO myFinalTable VALUES ObjectTable$(x);
END;
R
RODBC::sqlQuery(Target1Conn, "BEGIN load_data; END;")
DBI::dbExecute(Target1Conn, "BEGIN load_data; END;")
Have R export a csv or other delimited text file
utils::write.csv(InputData, "/output/path/inputdata.csv", row.names=FALSE, na="")
# data.table::fwrite(InputData, "/output/path/inputdata.csv") # FASTER ALTERNATIVE
Then use Oracle's SQL*Loader,
a utility designed for exactly this kind of bulk load, as advised by @Ben.
Save the control file below as a text file with a .ctl extension, e.g. myload.ctl:
OPTIONS (SKIP=1)
LOAD DATA
INFILE "/output/path/inputdata.csv"
INTO TABLE "myschema"."myFinalTable"
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(col1 INTEGER,
col2 DATE 'YYYY-MM-DD',
col3 VARCHAR(50),
col4 NUMBER(10,2),
col5 CHAR(255),
...)
Run sqlldr
from the command line (the log and bad files are written to the same directory):
sqlldr USERID=user/pwd CONTROL=myload.ctl LOG=myload.log BAD=myload.bad
Of course, R can automatically create the above .ctl file with writeLines() and call sqlldr from the command line with system().
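A minimal sketch of that automation, using the same illustrative paths, table, and columns as the control file above (the credentials are placeholders, and sqlldr must be on the PATH for the final step to succeed):

```r
# Build the SQL*Loader control file from R
ctl_lines <- c(
  "OPTIONS (SKIP=1)",
  "LOAD DATA",
  'INFILE "/output/path/inputdata.csv"',
  'INTO TABLE "myschema"."myFinalTable"',
  "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'",
  "(col1 INTEGER,",
  " col2 DATE 'YYYY-MM-DD',",
  " col3 VARCHAR(50))"
)
writeLines(ctl_lines, "myload.ctl")

# Invoke sqlldr; user/pwd are placeholder credentials
status <- system("sqlldr USERID=user/pwd CONTROL=myload.ctl LOG=myload.log BAD=myload.bad")
```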
Temporarily disable indexes, load the new data using the temp table above, and then rebuild the indexes. See the procedure here. This may also be necessary for SQL*Loader.
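As a sketch, the disable/rebuild cycle looks like the following, run statement by statement via sqlQuery()/dbExecute() as above (the index name myFinalTable_idx is a placeholder; repeat for each index on the target table):

```sql
ALTER INDEX myschema.myFinalTable_idx UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;
-- ... run the insert-select or SQL*Loader load here ...
ALTER INDEX myschema.myFinalTable_idx REBUILD;
```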