25

I have a PL/SQL procedure that does a lot of SUBSTRs on a VARCHAR2 parameter. I would like to remove the length limit, so I tried to change it to CLOB.

Works fine, but performance suffers, so I did some tests (based on these tests from 2005).


UPDATE: I can reproduce this on several different instances with different Oracle versions and different hardware, dbms_lob.substr is always noticeable slower than substr(CLOB), and a lot slower than SUBSTR(VARCHAR2).

Bob's results and the tests in the link above tell a different story.

Can anyone explain this, or at least reproduce either Bob's or my results? Thanks!


Test results:

+000000000 00:00:00.004000000 (VARCHAR2)
+000000000 00:00:00.298000000 (CLOB SUBSTR)
+000000000 00:00:00.356000000 (DBMS_LOB.SUBSTR)

Test code:

DECLARE
  l_text   VARCHAR2(30) := 'This is a test record';
  l_clob   CLOB := l_text;
  l_substr VARCHAR2(30);
  t TIMESTAMP;
BEGIN
  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := SUBSTR(l_text,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (VARCHAR2)');

  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := SUBSTR(l_clob,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (CLOB SUBSTR)');

  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14,1);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR)');
END;
Peter Lang
  • 54,264
  • 27
  • 148
  • 161
  • Note: Test three is `14,1` where the others are `1,14`. I'd also test something like `10000, 5000` as the point is that you're looking to break the 4k limit of VARCHAR. Also, as the results are about 75x slower for non VARCHAR, are you able to look at an algorithm that deals with a multiple VARCHARs? *[Such as a normalised table where one field is the 'sequence_id' showing the relative position of this string, and the other is the VARCHAR]*. Finally, although there is large *relative* difference, the *absolute* difference is low. So, does it matter? *[Pre-optimisation]* – MatBailie Apr 26 '12 at 11:01
  • 1
    @Dems: Thanks for your input! The `14,1` and `1,14` is correct (thanks Oracle for the consistent APIs). I'm trying to break the `32767` Byte limit (PL/SQL, not SQL), and results are more or less the same when using text with that length `(LPAD('X', 32767, 'X'))`. I have thought of that multiple-varchar-table solution, but I'd like to avoid it :) And it does matter, since the procedure is called really often, but most of all I'm curious if there are alternatives... – Peter Lang Apr 27 '12 at 09:58
  • 1
    On my machines DBMS_LOB.SUBSTR is a little slower than CLOB_SUBSTR(20%). And both mooore slower than varchar2(70 times slower). I run on 11gR2. – Florin Ghita May 04 '12 at 13:31
  • @FlorinGhita: Thanks! Which OS are you using, in case this makes a difference? – Peter Lang May 09 '12 at 06:54
  • AIX 6(don't know the minor...2 or 3). It's a 128 cores IBM machine. what OS and machine do you have? – Florin Ghita May 09 '12 at 07:54

4 Answers4

17

(Lies, damn lies, and benchmarks...)

I re-ran your test 10 times, expanding the string so it was a full 30 characters long, and got the following averaged results:

+000000000 00:00:00.011694200 (VARCHAR2)
+000000000 00:00:00.901000600 (CLOB SUBSTR)
+000000000 00:00:00.013169200 (DBMS_LOB.SUBSTR)

I then changed the substring range to 5,14 (14,5 for DBMS_LOB.SUBSTR) and got:

+000000000 00:00:00.011731000 (VARCHAR2)
+000000000 00:00:01.010840000 (CLOB SUBSTR)
+000000000 00:00:00.011427000 (DBMS_LOB.SUBSTR)

I then changed the range to 17,14 (14,17 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.013578900 (VARCHAR2)
+000000000 00:00:00.964527400 (CLOB SUBSTR)
+000000000 00:00:00.011416800 (DBMS_LOB.SUBSTR)

Finally, I changed the range to 25,14 (14,25 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.011210200 (VARCHAR2)
+000000000 00:00:00.916439800 (CLOB SUBSTR)
+000000000 00:00:00.013781300 (DBMS_LOB.SUBSTR)

My conclusion is that when working against CLOB's it's best to use DBMS_LOB.SUBSTR as it appears to have effectively no performance penalty compared to using SUBSTR against a "normal" VARCHAR2. SUBSTR against a CLOB seems to suffer from a significant performance penalty. For the record - OS = HP/UX (Unix variant), Oracle version=11.1, processor=HP Itanium 2-plex. YMMV.

Share and enjoy.


And because if it's worth doing it's worth over-doing, here's some more results with the strings expanded to 32767 characters. Substring ranges given with each set of results:

1, 25000
+000000000 00:00:00.198466400 (VARCHAR2)
+000000000 00:00:02.870958700 (CLOB SUBSTR)
+000000000 00:00:00.174490100 (DBMS_LOB.SUBSTR)

1000, 25000
+000000000 00:00:00.253447900 (VARCHAR2)
+000000000 00:00:02.491790500 (CLOB SUBSTR)
+000000000 00:00:00.193560100 (DBMS_LOB.SUBSTR)

10000, 25000
+000000000 00:00:00.217812000 (VARCHAR2)
+000000000 00:00:02.268794800 (CLOB SUBSTR)
+000000000 00:00:00.222200200 (DBMS_LOB.SUBSTR)

Same day, same conclusion.

Cthulhu fhtagn.


(Once more unto the breach, dear friends, once more...)

Re-ran the benchmarks, changing the size of the CLOB to 3276700, and taking the substring from the middle starting at 2475000 for length 25000 I get:

+000000000 00:00:00.176883200 (VARCHAR2)
+000000000 00:00:02.069482600 (CLOB SUBSTR)
+000000000 00:00:00.175341500 (DBMS_LOB.SUBSTR)

(Note that changes only affect the last two tests).

AND...same results, different day.

YMMV.

  • Thanks! I tried your last test (extending `l_text` to 50 characters, since 30 makes no sense any more), but the results were still comparable to my original test (after calling it several times): `0.006`, `0.679`, `1.064`. Which version of Oracle are you using? Can you reproduce your results when calling several times? – Peter Lang Apr 26 '12 at 11:41
  • Oracle 11.1. And yes, the tests results are repeatable. I changed the loops to iterate 1 million times each, then divided the resultant time interval by 10 to make them roughly comparable to the original tests. I re-ran the tests multiple times and each result was comparable. – Bob Jarvis - Слава Україні Apr 26 '12 at 11:47
  • That's weird. I tried my tests on three different instances (11.2 on Windows, 11.2 on Linux and 10.2 on Linux), and results are more or less the same as in my post. Would be great to have some more results from other people... – Peter Lang Apr 26 '12 at 11:50
  • What processor architecture? How much L1 and L2 cache? How many victims^H^H^H^H^H^H^H offerings sacrificed to the Elder Gods? Little things mean a lot... :-> – Bob Jarvis - Слава Україні Apr 26 '12 at 12:05
  • 4
    "Itanium... YMMV" - your money may vanish? – Jon Heller Apr 27 '12 at 04:46
  • I doubt that this is only hardware-related. As I said in my first comment, my first two tests are faster than yours, but the third one is significantly slower... – Peter Lang Apr 30 '12 at 06:19
  • @PeterLang - added tests for large (3 meg) CLOB. Same results. – Bob Jarvis - Слава Україні Apr 30 '12 at 11:53
  • Thanks again. Your `dbms_lob.substr` seems to work different than mine. I'd love to know why... – Peter Lang Apr 30 '12 at 12:07
  • @PeterLang - I did note that in the OCI docs there are two versions of OCILobCopy, those being OCILobCopy and OCILobCopy2, with the rule being to use the "2" version of any OCI function if such a function exists because the "2" versions are better/faster/improved/more capable. **Perhaps** "commodity OS" (e.g. Windows, Linux, etc) Oracle uses one version while the "commercial OS" versions (e.g. those for z/OS, HP-UX, etc) use a different version - or one uses C code while the other uses hand-optimized assembler. ??? YMMV – Bob Jarvis - Слава Україні Apr 30 '12 at 13:44
  • @BobJarvis: I would be surprised if this would make such a big difference, but who knows... Do you have a reference for these findings? I only see that `OCILobCopy` is deprecated, and that `OCILobCopy2` needs to be used for LOBs of size greater than 4 GB. – Peter Lang May 09 '12 at 06:35
  • My comments on possible differences between OCILobCopy and OCILobCopy2 were just suppositions on my part, not facts. I see that @tp9 posted some results that are roughly the same as what I'd found. He also used 11.1. I wish I had an 11.2 DB to test on. – Bob Jarvis - Слава Україні May 09 '12 at 11:24
  • And I wish I had an 11.1 to test :) Well, we still don't know why this happens, but your effort definitely deserves the bounty. – Peter Lang May 11 '12 at 08:44
2

Ran your script three times on the following system:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

Here are the results:

+000000000 00:00:00.007787000 (VARCHAR2)
+000000000 00:00:03.093258000 (CLOB SUBSTR)
+000000000 00:00:00.340017000 (DBMS_LOB.SUBSTR)

+000000000 00:00:00.019460000 (VARCHAR2)
+000000000 00:00:03.302425000 (CLOB SUBSTR)
+000000000 00:00:00.336915000 (DBMS_LOB.SUBSTR)

+000000000 00:00:00.007773000 (VARCHAR2)
+000000000 00:00:03.210619000 (CLOB SUBSTR)
+000000000 00:00:00.336689000 (DBMS_LOB.SUBSTR)
tp9
  • 341
  • 3
  • 14
2

I see that on 11gR1 tests ran smooth for DBMS_LOB.substr, but for 11gR2 the function is slow.

Below my test on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production on AIX6.

+000000000 00:00:00.009440000 (VARCHAR2)
+000000000 00:00:00.749113000 (CLOB SUBSTR)
+000000000 00:00:01.177685000 (DBMS_LOB.SUBSTR)
Florin Ghita
  • 17,525
  • 6
  • 57
  • 76
2

I know this is very old, but may still be relevant to people on older systems. This looks like a data type conversion problem. Based on something I noticed looking at the effect @bernhard.weingartner saw, the data type of the offset and amount arguments seems to make a huge difference.

This is run on 11.2.0.3 on Linux (OEL 5.6), and increased to a million iterations just to make the differences even more obvious:

DECLARE
  l_text   VARCHAR2(30) := 'This is a test record';
  l_clob   CLOB := l_text;
  l_substr VARCHAR2(30);
  t TIMESTAMP;
BEGIN
  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := SUBSTR(l_text,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (VARCHAR2)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := SUBSTR(l_clob,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (CLOB SUBSTR)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14,1);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with 14,1)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14.0,1.0);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with 14.0,1.0)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,cast(14 as number), cast(1 as number));
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with casts)');
END;
/
+000000000 00:00:00.043019000 (VARCHAR2)
+000000000 00:00:03.671621000 (CLOB SUBSTR)
+000000000 00:00:05.704337000 (DBMS_LOB.SUBSTR with 14,1)
+000000000 00:00:00.040097000 (DBMS_LOB.SUBSTR with 14.0,1.0)
+000000000 00:00:00.040907000 (DBMS_LOB.SUBSTR with casts)

The 11gR2 docs show the formal parameters as type INTEGER, but actually passing an integer (or pls_integer, or binary_double) is slow, while explicitly passing a number is quick.

From your original question and Bob's results this looks like something that changed between 11.1 and 11.2. I don't have a 12c instance to test on so don't know if it's changed again. Whether it's due to a change in dbms_lob or a wider change to how PL/SQL handles numeric values by default isn't clear. I haven't found anything on MOS that looks relevant.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318