
My .NET application receives a stream of data which I need to dump into a few tables in an Oracle database. I have an internal queue that buffers the data and a few threads that read from the queue and convert each item into the corresponding insert statements. I receive about 10 data items per second, but there can be bursts of over 100 per second.
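
For context, the setup looks roughly like this (just a sketch; DataItem and WriteToDatabase stand in for my real types and logic, and BlockingCollection is the standard .NET producer/consumer primitive):

using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

record DataItem(int Id, string Data);              // stand-in for the real payload

class Pipeline
{
    // Bounded queue so a burst of incoming data cannot exhaust memory.
    static readonly BlockingCollection<DataItem> Queue = new(boundedCapacity: 10000);

    static void Main()
    {
        // A few consumer threads drain the queue and write to Oracle.
        var consumers = Enumerable.Range(0, 4)
            .Select(_ => Task.Run(() =>
            {
                foreach (var item in Queue.GetConsumingEnumerable())
                    WriteToDatabase(item);         // convert to insert(s) and execute
            }))
            .ToArray();

        // The producer side calls Queue.Add(item) as data arrives
        // and Queue.CompleteAdding() on shutdown.
        Task.WaitAll(consumers);
    }

    static void WriteToDatabase(DataItem item) { /* insert logic goes here */ }
}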

My first approach was to take each data item, convert it into the corresponding insert statements, and execute them one by one. However, this turned out to be too slow, because each insert required a round trip to the database.

My next approach was to batch the inserts together into groups of up to 50, depending on how many items were in the queue, then wrap them into a begin-end block, and then push it into the database, like this:

begin
  insert into MyTable (col1, col2, col3...) values (123, 'data1', 'data2', ...);
  insert into MyTable (col1, col2, col3...) values (456, 'dataX', 'dataY', ...);
  insert into MyTable (col1, col2, col3...) values (789, 'dataA', 'dataB', ...);
  -- variable number inserts...
end;

This improved performance significantly and I was happy. Then our Oracle guy came to me and told me I was killing his database, because I was sending mass amounts of unique SQL statements, which Oracle had to parse and cache each time. Eventually, Oracle crashed. The suggestion was to always use the same SQL statement with bind variables, so that it does not have to be parsed each time.

However, that would bring me back to the problem I first had: to keep the statements identical, I would have to run each insert statement one at a time, with bind variables.

insert into MyTable (col1, col2, col3...) values (:val1, :val2, :val3, ...);
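
For reference, the one-at-a-time version with bind variables looks something like this in .NET (a sketch using ODP.NET; the connection string and column types are assumed):

using Oracle.DataAccess.Client;   // ODP.NET

using (var connection = new OracleConnection(connectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.BindByName = true;    // ODP.NET binds by position unless told otherwise
    command.CommandText =
        "insert into MyTable (col1, col2, col3) values (:val1, :val2, :val3)";

    command.Parameters.Add("val1", OracleDbType.Int32).Value = 123;
    command.Parameters.Add("val2", OracleDbType.Varchar2).Value = "data1";
    command.Parameters.Add("val3", OracleDbType.Varchar2).Value = "data2";

    command.ExecuteNonQuery();    // one round trip per row, which is the slow part
}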

I could combine multiple inserts into a begin-end block again, but because the number of inserts varies, every block would still be a unique SQL statement.

begin
  insert into MyTable (col1, col2, col3...) values (:val11, :val12, :val13, ...);
  insert into MyTable (col1, col2, col3...) values (:val21, :val22, :val23, ...);
  insert into MyTable (col1, col2, col3...) values (:val31, :val32, :val33, ...);
  ...
end;

What should I be doing? Inserting statements one by one with bind variables, but with a huge number of threads? Should I be writing them to a CSV file and using SQL*Loader? But how would I deal with CLOB columns then? Should the inserts be wrapped in a stored procedure instead, combined with my batching approach?

I feel that this must be a very common problem, and there must be some sort of best practices for this situation.

Mas
  • In SQL Server you can have multiple values clauses in one insert: values (1,2), (2,3), ... This reduces the number of unique statements and the number of locks. There is a limit of 1000 rows and a sweet spot somewhere below that. I suspect the same holds for Oracle. There are also TVPs and Dapper. But my experience is with SQL Server, so this is only a comment. – paparazzo Dec 12 '13 at 17:35
  • Not my area at all but here's a page about making bulk inserts more efficient in ODP.NET using array binding: http://dotnetslackers.com/articles/ado_net/BulkOperationsUsingOracleDataProviderForNETODPNET.aspx – Morbo Dec 12 '13 at 19:11

4 Answers


What I was looking for is called Array Binding, as described here for ODP.NET and here for Devart. Thanks to Morbo for pointing me in the right direction.

Basically, you can bind an array of values to a bind variable, and execute multiple inserts in one call.

For example, for the following insert statement

insert into MyTable (col1, col2, col3) values (:val1, :val2, :val3)

You can insert multiple rows with array binding; the following example inserts 3 rows into the database with a single round trip. (Note that this code is for the Devart Oracle connector; ODP.NET code looks a little different, as sketched below.)

var command = connection.CreateCommand();
command.CommandText = "insert into MyTable (col1, col2, col3) values (:val1, :val2, :val3)";
command.Parameters.Add("val1", OracleDbType.Number, 10);
command.Parameters.Add("val2", OracleDbType.VarChar, 4000);
command.Parameters.Add("val3", OracleDbType.VarChar, 4000);
// Each parameter receives an array with one element per row.
command.Parameters["val1"].Value = new int[] { 1, 2, 3 };
command.Parameters["val2"].Value = new string[] { "a", "b", "c" };
command.Parameters["val3"].Value = new string[] { "x", "y", "z" };
// Executes the statement once per array element, all in a single round trip.
command.ExecuteArray(3);
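
For comparison, ODP.NET has no ExecuteArray; instead you set the ArrayBindCount property on the command and call ExecuteNonQuery (a sketch, which I have not measured myself):

var command = connection.CreateCommand();
command.CommandText = "insert into MyTable (col1, col2, col3) values (:val1, :val2, :val3)";
command.ArrayBindCount = 3;   // number of elements bound in each parameter's array
command.Parameters.Add("val1", OracleDbType.Int32).Value = new int[] { 1, 2, 3 };
command.Parameters.Add("val2", OracleDbType.Varchar2).Value = new string[] { "a", "b", "c" };
command.Parameters.Add("val3", OracleDbType.Varchar2).Value = new string[] { "x", "y", "z" };
command.ExecuteNonQuery();    // all 3 rows are sent in a single round trip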

I did some quick performance measurements using the Devart connector. Results may differ with Oracle's own client.

  • With CLOB data, there was not much improvement compared to inserting rows one by one. Strangely, array binding became slower when inserting more than 200 rows; this could be a quirk of the Devart connector.
  • Without CLOB data, there was a 4x to 8x improvement over inserting rows one by one.

(chart: performance without CLOB data)

(chart: performance with CLOB data)

Mas

I am a PL/SQL developer only, so I can't tell you much about .NET and can help only with the PL/SQL side.

What happens if you always send exactly 50 inserts (or another suitable number)? In that case you would have this block:

begin
  insert into MyTable (col1, col2, col3...) values (:val11, :val12, :val13, ...);
  insert into MyTable (col1, col2, col3...) values (:val21, :val22, :val23, ...);
  insert into MyTable (col1, col2, col3...) values (:val31, :val32, :val33, ...);
  -- + 47 more inserts...

exception
  when <some exception> then
    commit;  -- you can also log the error here
end;

For the exception you can use whatever you like: a duplicate index, a not-null constraint, and so on.

E.g. you've got 37 rows. You bind all of them, then bind 13 more rows with only nulls, except that for col1 (a NUMBER column) you bind some string, e.g. 'a'. On the 38th row Oracle raises an exception (in this case invalid_number), and the handler commits the result.

This way the block is hard-parsed only once, on its first run. Every further run uses the cached version, because the SQL text never changes.
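
From the .NET side, the padding might look roughly like this (a sketch of the idea above; the items list, the parameter naming scheme :v1_1 .. :v50_3, and fixedFiftyInsertBlock are all made up for illustration):

// Always bind all 50 rows; pad the unused slots so that the first padded
// row raises invalid_number and the block's exception handler commits.
const int BatchSize = 50;
var command = connection.CreateCommand();
command.BindByName = true;
command.CommandText = fixedFiftyInsertBlock;   // the begin..exception..end block above,
                                               // with parameters :v1_1 .. :v50_3

for (int row = 1; row <= BatchSize; row++)
{
    bool real = row <= items.Count;
    // col1 is the NUMBER column; binding it as a string lets the padding
    // value 'a' trigger invalid_number on the first padded row.
    command.Parameters.Add($"v{row}_1", OracleDbType.Varchar2).Value =
        real ? items[row - 1].Col1.ToString() : "a";
    command.Parameters.Add($"v{row}_2", OracleDbType.Varchar2).Value =
        real ? (object)items[row - 1].Col2 : DBNull.Value;
    command.Parameters.Add($"v{row}_3", OracleDbType.Varchar2).Value =
        real ? (object)items[row - 1].Col3 : DBNull.Value;
}
command.ExecuteNonQuery();   // rows 1..items.Count end up committed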

You can also enhance this script with FORALL syntax to get even better performance.

smnbbrv

I recommend something like this:

DECLARE
   /* If you'll be populating all the columns of the target table: */
   TYPE t_table_col IS TABLE OF table1%ROWTYPE;

   /* OR, if you'll be populating just some columns of the table
      (keep only one of the two t_table_col declarations): */
   TYPE r_table IS RECORD (
      column1 table1.column1%TYPE,
      column2 table1.column2%TYPE
   );
   TYPE t_table_col IS TABLE OF r_table;

   co_table_col t_table_col := t_table_col();
BEGIN
   /* I don't know the way you're getting the input values, so this is just an example */
   FOR i IN 1 .. inputValue.COUNT LOOP
      co_table_col.EXTEND;
      co_table_col(i).column1 := inputValue(i).column1;
      co_table_col(i).column2 := inputValue(i).column2;
   END LOOP;

   /* Then, if your structure is the same as the table's: */
   FORALL i IN INDICES OF co_table_col
      INSERT INTO table1 VALUES co_table_col(i);

   COMMIT;

   /* OR, if you are populating just some columns (using the record type above): */
   FORALL i IN INDICES OF co_table_col
      INSERT INTO table1 (column1, column2)
      VALUES (co_table_col(i).column1, co_table_col(i).column2);

   COMMIT;
END;

This brings you big advantages: the code is more readable, and FORALL makes the inserts much faster.

Because you control the number of rows inserted per batch, you shouldn't run into problems with undo space or the like. And with FORALL you can use SAVE EXCEPTIONS, so you can tell whether some of the inserts failed.
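
From .NET, one possible way to feed such a FORALL is a stored procedure that takes PL/SQL associative arrays; ODP.NET supports these via OracleCollectionType.PLSQLAssociativeArray (a sketch; my_pkg.bulk_insert is a hypothetical procedure wrapping the FORALL above):

using System.Data;
using Oracle.DataAccess.Client;

var command = connection.CreateCommand();
command.CommandType = CommandType.StoredProcedure;
command.CommandText = "my_pkg.bulk_insert";    // hypothetical wrapper around the FORALL

var p1 = command.Parameters.Add("p_col1", OracleDbType.Int32);
p1.CollectionType = OracleCollectionType.PLSQLAssociativeArray;
p1.Value = new int[] { 1, 2, 3 };
p1.Size = 3;                                   // number of elements passed in

var p2 = command.Parameters.Add("p_col2", OracleDbType.Varchar2);
p2.CollectionType = OracleCollectionType.PLSQLAssociativeArray;
p2.Value = new string[] { "a", "b", "c" };
p2.Size = 3;

command.ExecuteNonQuery();                     // one round trip; the FORALL runs server-side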

Omar Centi
  • This is the way to go. Take a look at this link that has some good information on this approach. http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html From the link: "On my laptop running Oracle Database 11g Release 2, it took 4.94 seconds to insert 100,000 rows, one at a time. With FORALL, those 100,000 were inserted in 0.12 seconds." – RMAN Express Dec 12 '13 at 18:42
  • lol, have you read the question at all? According to the author, the problem is exactly in passing the parameters/SQL to the database, not in fast inserting. – smnbbrv Dec 13 '13 at 09:23
  • The question is: efficiently inserting a variable number of rows into Oracle. Omar's solution is the most efficient one. – RMAN Express Dec 13 '13 at 15:27
  • Thanks for the answer. The title to my question was not clear enough. My problem is specifically with .NET applications. – Mas Dec 17 '13 at 13:19

There is probably some sort of .NET batching, similar to JDBC batching.

If not, a small adjustment to the code will help:

insert into MyTable(col1, col2, col3...)
select :val11, :val12, :val13, ... from dual union all
select :val21, :val22, :val23, ... from dual union all
...
select :val31, :val32, :val33, ... from dual

Now there is only one SQL statement to parse. Be careful not to batch too many rows. More than a few hundred rows can lead to slow parsing performance.
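
On the .NET side, the statement can be generated once per batch size (a sketch; the bind-variable naming is made up):

using System.Text;

// Builds: insert into MyTable (col1, col2, col3)
//         select :v0_1, :v0_2, :v0_3 from dual union all
//         select :v1_1, :v1_2, :v1_3 from dual ...
static string BuildUnionAllInsert(int rows)
{
    var sql = new StringBuilder("insert into MyTable (col1, col2, col3)\n");
    for (int i = 0; i < rows; i++)
    {
        sql.Append($"select :v{i}_1, :v{i}_2, :v{i}_3 from dual");
        sql.AppendLine(i < rows - 1 ? " union all" : "");
    }
    return sql.ToString();
}

If the batch size is held fixed (say, always 50), each distinct size produces one reusable SQL text, so only a handful of statements ever need a hard parse.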

Jon Heller
  • this gives no advantage: the query still will not be cached – smnbbrv Dec 13 '13 at 09:24
  • @Simon It is true the query will not be cached, but there is only one query instead of 50. It may take slightly longer to parse than your query, but it also has 49 fewer context switches. Another option is to combine our two answers: a single query that always has 50 `select ... from dual`, and maybe an inline view that filters out rows with a magic value indicating the row is not a real one. – Jon Heller Dec 13 '13 at 14:03
  • Yep, a combination is possible, of course, but the main problem here, referring to the author, is caching. – smnbbrv Dec 13 '13 at 14:14
  • @Simon Yes, and this solution will reduce the amount of SQL that is cached (parsed) by about 98%. I haven't run a test, but I'm willing to bet that in general 1 hard-parsed statement runs better than 50 soft-parsed (or cached) statements. – Jon Heller Dec 14 '13 at 00:40
  • this is not about the statement, but about the whole piece of code – smnbbrv Dec 14 '13 at 08:18
  • @Simon What I'm advocating is that the entire piece of code should be a single SQL statement. (Now that I think about it, there's not even a need for a `begin` and `end` block.) I'm not sure if this is the best way to do it, but I've used this trick many times to significantly improve performance. – Jon Heller Dec 14 '13 at 18:02