24

Which would be a better option for a bulk insert into an Oracle database? A FOR cursor loop like

DECLARE
   CURSOR C1 IS SELECT * FROM FOO;
BEGIN
   FOR C1_REC IN C1 LOOP
      INSERT INTO BAR(A,
                      B,
                      C)
             VALUES(C1_REC.A,
                    C1_REC.B,
                    C1_REC.C);
   END LOOP;
END;

or a simple select, like:

INSERT INTO BAR(A,
                B,
                C)
        (SELECT A,
                B,
                C
        FROM FOO);

Any specific reason either one would be better?

blong
Sathyajith Bhat
  • FYI, you shouldn't refer to this as a 'bulk insert'. While you may be inserting large quantities of data, there's a PL/SQL feature called Bulk Operations (Bulk Inserts, Bulk Collect, etc.). – moleboy Jun 12 '09 at 19:04
  • And to add to the bulk operations reference, you should consider that above the other two for large data operations or ongoing, regular insert operations. Bulk Inserts can yield tremendous performance improvements over simple inserts in the examples given. – DCookie Jun 12 '09 at 22:56

8 Answers

31

I would recommend the SELECT option, because the cursor loop processes rows one at a time and takes longer.
Also, the SELECT is much easier to understand for anyone who has to modify your query.

Josh Mein
24

The general rule of thumb is: if you can do it with a single SQL statement instead of PL/SQL, you should. It will usually be more efficient.

However, if you need to add more procedural logic (for some reason), you might need to use PL/SQL, but you should use bulk operations instead of row-by-row processing. (Note: in Oracle 10g and later, your FOR loop will automatically use BULK COLLECT to fetch 100 rows at a time; however your insert statement will still be done row-by-row).

e.g.

DECLARE
   TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
   TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
   TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
   rA tA;
   rB tB;
   rC tC;
BEGIN
   -- assumes FOO's columns are exactly (A, B, C), in that order
   SELECT * BULK COLLECT INTO rA, rB, rC FROM FOO;
   -- (do some procedural logic on the data?)
   FORALL i IN 1..rA.COUNT  -- 1..COUNT is safe even when FOO returns no rows
      INSERT INTO BAR(A,
                      B,
                      C)
      VALUES(rA(i),
             rB(i),
             rC(i));
END;

The above has the benefit of minimising context switches between SQL and PL/SQL. Oracle 11g also has better support for tables of records so that you don't have to have a separate PL/SQL table for each column.

Also, if the volume of data is very large, it is possible to change the code to process the data in batches, for example by adding a LIMIT clause to the BULK COLLECT (see the sketch below).
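
A rough sketch of that batched variant (not part of the original answer; it assumes the same FOO/BAR columns and an arbitrary batch size of 10,000 rows):

DECLARE
   CURSOR c_foo IS SELECT A, B, C FROM FOO;
   TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
   TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
   TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
   rA tA;
   rB tB;
   rC tC;
BEGIN
   OPEN c_foo;
   LOOP
      -- fetch at most 10,000 rows per round trip
      FETCH c_foo BULK COLLECT INTO rA, rB, rC LIMIT 10000;
      EXIT WHEN rA.COUNT = 0;
      -- (procedural logic on the current batch could go here)
      FORALL i IN 1..rA.COUNT
         INSERT INTO BAR(A, B, C)
         VALUES (rA(i), rB(i), rC(i));
   END LOOP;
   CLOSE c_foo;
END;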

Vadzim
Jeffrey Kemp
5

If your rollback segment/undo segment can accommodate the size of the transaction, then option 2 is better. Option 1 is useful if you do not have the rollback capacity needed and have to break the large insert into smaller commits so you don't get "rollback/undo segment too small" errors.
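
A minimal sketch of what that chunked approach could look like (hypothetical: it assumes FOO is non-empty and has a numeric key column ID to split the work on; adjust to your schema):

DECLARE
   v_min   NUMBER;
   v_max   NUMBER;
   c_chunk CONSTANT NUMBER := 100000;  -- size of key range per transaction
BEGIN
   SELECT MIN(ID), MAX(ID) INTO v_min, v_max FROM FOO;
   FOR i IN 0 .. TRUNC((v_max - v_min) / c_chunk) LOOP
      INSERT INTO BAR(A, B, C)
         SELECT A, B, C
           FROM FOO
          WHERE ID BETWEEN v_min + i * c_chunk
                       AND v_min + (i + 1) * c_chunk - 1;
      COMMIT;  -- keeps each transaction's undo footprint small
   END LOOP;
END;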

MichaelN
5

A simple insert/select like your 2nd option is far preferable. Each insert in the 1st option requires a context switch from PL/SQL to SQL. Run each with trace/tkprof and examine the results.

If, as Michael mentions, your rollback cannot handle the statement, then have your DBA give you more. Disk is cheap, while partial results that come from inserting your data in multiple passes are potentially quite expensive. (There is almost no undo associated with an insert.)
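
As a rough guide, tracing a test run from SQL*Plus might look something like this (a sketch only; DBMS_MONITOR requires the appropriate privileges, and the trace file name and location depend on your server setup):

ALTER SESSION SET tracefile_identifier = 'bar_insert_test';
EXEC DBMS_MONITOR.session_trace_enable(waits => TRUE, binds => FALSE);

-- run either version of the insert here

EXEC DBMS_MONITOR.session_trace_disable;

-- then format the raw trace file on the database server, e.g.:
--   tkprof <trace_file>.trc bar_insert_test.txt sys=no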

Scott Swank
3

I think one important piece of information is missing from this question:

How many records will you insert?

  1. If it's from 1 to roughly 10,000 rows, you should use the plain SQL statement (as others said, it is easy to understand and easy to write).
  2. If it's from roughly 10,000 to roughly 100,000 rows, you should use a cursor, but add logic to commit every 10,000 records or so.
  3. If it's from roughly 100,000 rows up to millions, you should use BULK COLLECT and FORALL for better performance.
sulica
2

As you can see by reading the other answers, there are a lot of options available. If you are just doing < 10k rows you should go with the second option.

In short, the range from roughly 10k up to, say, 100k rows is kind of a gray area. A lot of old geezers will bark at big rollback segments. But honestly, hardware and software have made amazing progress, to the point where you may be able to get away with option 2 for a lot of records if you only run the code a few times. Otherwise you should probably commit every 1k-10k or so rows. Here is a snippet that I use. I like it because it is short and I don't have to declare a cursor. Plus, on 10g and later the FOR loop fetches 100 rows at a time behind the scenes, although the inserts themselves are still done row-by-row.

begin
    for r in (select rownum rn, t.* from foo t) loop
        insert into bar (A, B, C) values (r.A, r.B, r.C);
        if mod(r.rn, 1000) = 0 then
            commit;  -- commit every 1,000 rows
        end if;
    end loop;
    commit;
end;

I found this link on the Oracle site that illustrates the options in more detail.

Arturo Hernandez
0

You can use BULK COLLECT along with FORALL; together these two features are known as bulk binding.

The PL/SQL FORALL operator can make simple table inserts roughly 30x faster than row-by-row processing.

Bulk binds are a PL/SQL technique where, instead of executing multiple individual SELECT, INSERT, UPDATE or DELETE statements to retrieve data from, or store data in, a table, all of the operations are carried out at once, in bulk. This avoids the context switching you get when the PL/SQL engine has to hand control over to the SQL engine, then back to the PL/SQL engine, and so on, as you access rows one at a time. To do bulk binds with INSERT, UPDATE and DELETE statements, you enclose the SQL statement within a PL/SQL FORALL statement. To do bulk binds with SELECT statements, you include the BULK COLLECT clause in the SELECT statement instead of using a plain INTO.

It improves performance.
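
A minimal sketch of that pattern for the tables in the question (not from the answer itself; it assumes BAR has the same column list as FOO, so a whole record can be inserted in one go):

DECLARE
   TYPE t_foo_tab IS TABLE OF FOO%ROWTYPE;
   l_rows t_foo_tab;
BEGIN
   -- one context switch to fetch all the rows...
   SELECT * BULK COLLECT INTO l_rows FROM FOO;
   -- ...and one more to insert them all
   FORALL i IN 1..l_rows.COUNT
      INSERT INTO BAR VALUES l_rows(i);
END;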

Eray Balkanli
user2001117
0

I do neither for a daily complete reload of data. For example, say I am loading my Denver site. (There are other strategies for near-real-time deltas.)

I use a CREATE TABLE ... AS SELECT, as I have found it is almost as fast as a bulk load. For example, below a CREATE TABLE statement is used to stage the data, casting the columns to the data types needed:

CREATE TABLE sales_dataTemp AS
   SELECT CAST(column1 AS DATE)   AS SALES_QUARTER,
          CAST(sales   AS NUMBER) AS SALES_IN_MILLIONS,
          ....
     FROM TABLE1;

This staging table mirrors my target table's structure exactly; the target is list-partitioned by site. I then do a partition exchange with the DENVER partition, and I have a new data set.
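
The exchange step might look roughly like this (table and partition names are illustrative; whether you include indexes or skip validation depends on your setup):

ALTER TABLE sales_data
   EXCHANGE PARTITION denver WITH TABLE sales_dataTemp
   INCLUDING INDEXES WITHOUT VALIDATION;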

Hughsmg