oracle function call is being cached during bulk insert

Question

when inserting bulk rows and using a function call as one of the column values i'm getting exactly the same values for every 10-11 rows from the function. the function is actually generating UUID values and returns unique results. If i replace the function call in the insert statement with the actual code of the function it never repeats.

So what i conclude from this is that oracle actually caches the result of the function and calls it only once for every 10-11 rows it's inserting. how can i change this behavior?

the function i'm calling i've taken from http://www.oracle-base.com/articles/9i/UUID9i.php :

create or replace
FUNCTION        new_uuid RETURN VARCHAR2 AS
  l_seed        BINARY_INTEGER;
  l_random_num  NUMBER(5);
  l_date        VARCHAR2(25);
  l_random      VARCHAR2(4);
  l_ip_address  VARCHAR2(12);
BEGIN
  l_seed := TO_NUMBER(TO_CHAR(SYSDATE,'YYYYDDMMSS'));
  DBMS_RANDOM.initialize (val => l_seed);
  l_random_num := TRUNC(DBMS_RANDOM.value(low => 1, high => 65535));
  DBMS_RANDOM.terminate;

  l_date       := conversion_api.to_hex(TO_NUMBER(TO_CHAR(SYSTIMESTAMP,'FFSSMIHH24DDMMYYYY')));
  l_random     := RPAD(conversion_api.to_hex(l_random_num), 4, '0');
  l_ip_address := conversion_api.to_hex(TO_NUMBER(REPLACE(NVL(SYS_CONTEXT('USERENV','IP_ADDRESS'), '123.123.123.123'), '.', '')));

  RETURN SUBSTR(l_date, 1, 8)                     || '-' ||
         SUBSTR(l_date, 9, 4)                     || '-' ||
         SUBSTR(l_date, 13, 4)                    || '-' ||
         RPAD(SUBSTR(l_date, 17), 4, '0')         || '-' ||
         RPAD(L_RANDOM || L_IP_ADDRESS, 12, '0');
END;

and here's the insert statement i'm using:

INSERT INTO My_TABLE(ID, NAME,)
SELECT NEW_UUID(), NAME
FROM MY_TABLE2;
COMMIT;

the select inside this statement produces lots of repeating UUIDs. while this statement produces unique ones:

SELECT RPAD(RPAD(my_schema.conversion_api.to_hex(TRUNC(DBMS_RANDOM.VALUE( 1, 65535))), 4, '0') || my_schema.conversion_api.to_hex(TO_NUMBER(REPLACE(NVL(SYS_CONTEXT('USERENV','IP_ADDRESS'), '123.123.123.123'), '.', ''))), 12, '0') sss
FROM my_schema.MY_TABLE

Can you post the code for the insert statement you are ruuing including the procedure call? — Stephen ODonnell, Nov 02 '11 at 09:39
Which version of the database are you using? The code from Tim Hall's site is intended for Oracle 9i. — APC, Nov 02 '11 at 17:06

Benoit · Answer 1 · 2011-11-02T14:03:14.900

6

APC's diagnostic is correct. You need to have entropy in your random generator seed.

Though, Oracle already has a unique id generator which is SYS_GUID().

SELECT sys_guid(), name FROM my_table2;

You can try this which produces 9 GUIDs:

SELECT sys_guid() from dual connect by level < 10;

Tora screenshot

Don't try to reinvent the wheel when it already exists.

edited Nov 02 '11 at 14:03

answered Nov 02 '11 at 13:23

Benoit

76,634
23
210
236

did u try that code? select sys_guid() from dual connect by ... will return the same value for all rows. This doesn't solve the problem. – tbone Nov 02 '11 at 13:32
@tbone: I tried it and it returned 9 different rows. See screenshot. Using Oracle 10.2 – Benoit Nov 02 '11 at 14:02
Well, I saw what I thought was the same value, but was changing slightly! (1 number increment of 32 in the hex representation) +1 added back – tbone Nov 02 '11 at 14:12
@tbone: Checked the twelveth character on the four guids on your [link](http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions153.htm)? Oh, those guids are different. Human eyes are not good at spotting those small differences in the middle of long text strings. – Benoit Nov 02 '11 at 14:14
i don't want to use guids cause i'm inserting in the same table from several places and have id generation not only in the database but also in java using hibernate. i searched a damned lot and even asked a question here about possible collisions between the 2 algorithms. http://stackoverflow.com/questions/7918233/generating-ids-in-oracle-with-hibernate-backed-orm java and hibernate are using RFC 4122 to generate their uuids, sys_guid on the other hand isn't as well documented as java's uuid generator. i have no idea how it works. in the end we decided to use RFC 4122 generation in oracle. – Yervand Aghababyan Nov 02 '11 at 14:17
@tbone the result of sys_guid function is not being cached the way our custom function is. We were using sys_guid at first and had no problems with it. I don't know what's the reason of it not being cached or why i'm unable to make my function work the way sys_guid is working. – Yervand Aghababyan Nov 02 '11 at 14:30
@Yervand Aghababyan: See [this question](http://stackoverflow.com/questions/246930/is-there-any-difference-between-a-guid-and-a-uuid). It is really improbable that you have any collision. – Benoit Nov 02 '11 at 14:34
I'm developing an application with really with strict requirements. Even if there's an improbable chance of those 2 algorithms producing a collision i can't use them. the bugs here may result in really bad stuff ) also in the answers to the question you gave a link to nowhere it is written that oracle's guid and MS's guid are actually the same thing. I wouldn't be surprised at all if those 2 algorithms were different. – Yervand Aghababyan Nov 02 '11 at 14:42
1

@Yervand Aghababyan: Even using a single algorithm there's still a chance of a collision. It's simply not possible to eliminate the possibility while using randomly generated values. It may be a better use of time to develop a strategy to mitigate the effects of a collision (i.e. detect the collision and generate a new value), rather than to spend a lot of time reducing the odds from extremely unlikely to incredibly unlikely. – Allan Nov 02 '11 at 15:37
+1 I agree with Benoit that the best approach is to use Oracle's built-in functionality. Tim wrote that function for Oracle 9i which (I think) didn't support SYS_GUID. – APC Nov 02 '11 at 16:58

APC · Accepted Answer · 2011-11-02T13:46:36.227

The thing is, "random" isn't actually random. Given the same seed for DBMS_RANDOM.INITIALISE(), subsequent calls to DBMS_RANDOM.VALUE() will return the same result. Check it out:

SQL> exec DBMS_RANDOM.initialize (val => 1)

PL/SQL procedure successfully completed.

SQL> select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual
  2  /

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       49214

SQL> r
  1* select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       56385

SQL> r
  1* select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       23941

SQL> exec DBMS_RANDOM.initialize (val => 1)

PL/SQL procedure successfully completed.

SQL> select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual;

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       49214

SQL> r
  1* select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       56385

SQL> r
  1* select TRUNC(DBMS_RANDOM.value(low => 1, high => 65535)) from dual

TRUNC(DBMS_RANDOM.VALUE(LOW=>1,HIGH=>65535))
--------------------------------------------
                                       23941

SQL>

If we look at the code you got from Tim's site we see this line:

l_seed := TO_NUMBER(TO_CHAR(SYSDATE,'YYYYDDMMSS'));

From which we can surmise that your process is inserting 10-11 rows per second :)

If you replace SYSDATE with SYSTIMESTAMP and change the mask to go to millisecs (or smaller) then you should get a different seed each time, and hence a different value each time. Note that you still need to force the re-evaluation of the function to guarantee getting a different result for each row (see the demo below).

Um, did I say "guarantee". Uh-oh. It is in the nature of the random that it can produce the same result two goes running. So perhaps that should be "to minimize teh chances of getting the same result for each row".

Alternatively, remove the initialisation from the function and call it before you start your bulk inserts. Whether this is feasible depends entirely on your business logic.

Demonstration

Here is a function which generates a "random" number:

create or replace function get_random_number 
    (p_seed in number := 0)
    return pls_integer
is
begin
    if p_seed = 0
    then
        DBMS_RANDOM.initialize (val => TO_NUMBER(TO_CHAR(SYSDATE,'YYYYDDMMSS')));
    else    
        DBMS_RANDOM.initialize (val => p_seed);
    end if;
    return TRUNC(DBMS_RANDOM.value(low => 1, high => 65535));
end;
/

If we call it twenty times with the default parameter it returns the same number each time:

SQL> select rownum
       , get_random_number
from   dual
connect by level <= 20
/
  2    3    4    5  
    ROWNUM GET_RANDOM_NUMBER
---------- -----------------
         1             10239
         2             10239
         3             10239
         4             10239
         5             10239
         6             10239
         7             10239
         8             10239
         9             10239
        10             10239
        11             10239
        12             10239
        13             10239
        14             10239
        15             10239
        16             10239
        17             10239
        18             10239
        19             10239
        20             10239

20 rows selected.

SQL>

Whereas if we pass a value it uses a different seed each time and lo! we get a different result:

SQL> select rownum
       , get_random_number(rownum)
from   dual
connect by level <= 20
/
  2    3    4    5  
    ROWNUM GET_RANDOM_NUMBER(ROWNUM)
---------- -------------------------
         1                     49214
         2                      6476
         3                     42426
         4                      2370
         5                     48546
         6                     52483
         7                      6964
         8                     46764
         9                     27569
        10                      7673
        11                     52446
        12                     50229
        13                     27861
        14                     31413
        15                     11518
        16                     13471
        17                     38766
        18                      9949
        19                     61656
        20                     25797

20 rows selected.

SQL>

This works because passing in ROWNUM forces the evaluation of the function for each row. You should not use ROWNUM as the seed in a production system: timestamps are better. Or concatenate the datetime with the rownum to provide a unique seed for each row.

I didn't notice the fact that the random generator was initialized with a bad seed but your estimation of us inserting 10-11 rows a second doesn't sound to me as a correct one. we were inserting some 200 rows which was taking a second at most. if it was taking 20 seconds as you're saying i would have noticed that for sure :) and yes, we concluded to the same results as you, that we would have to use a seed parameter for our ID generation function to be re-evaluated for each row. which is a damned ugly solution IMO. that's why we decided to stick with implementing the generator in java. — Yervand Aghababyan, Nov 02 '11 at 14:23
according to our tests the seed generation function was called about 50 times for 190 rows. i don't remember the overall time the insert took but i think it should have been several milliseconds. — Yervand Aghababyan, Nov 02 '11 at 14:25
@On reflection I think the problem is definitely caused by funbction caching. So to make it work you need to force the re-evaluation of the function as well as addressing the bad seed. — APC, Nov 02 '11 at 17:04

score 1 · Answer 3 · answered Nov 02 '11 at 11:09

1

Haven't tried it, but I believe Oracle is calculating the value for your new_uuid() function once, and outputting for each returned row (same as if you did select systimestamp, whatever from whatever... it would output the same timestamp for all rows.

So, you can modify your function to take some input from each row (for the seed perhaps?), or just use sequences.

answered Nov 02 '11 at 11:09

tbone

15,107
3
33
40

yeah. I also think it's doing that, but can't I force it to reevaluate the function for each row? – Yervand Aghababyan Nov 02 '11 at 11:41
No, but try passing in the nextval of a sequence as the seed value (select new_uuid(my_seq.nextval), whatever from whatever). The sequence should increment for every row, thus changing the seed and the uuid value. Haven't tried it, but that should work. – tbone Nov 02 '11 at 11:46
Check out this conversation on AskTom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3181424400346795479 : since the function is non deterministic " you cannot rely on how many times or when or if SQL will call your function" – Harrison Nov 02 '11 at 12:53
ok, but you might want to make that java function callable from the db so that non-java apps can use it (unless you can guarantee that your java app(s) will always be the only ones to insert). – tbone Nov 02 '11 at 12:56
-1. No, Oracle calucates it once for `DETERMINISTIC` functions. – Benoit Nov 02 '11 at 13:23

oracle function call is being cached during bulk insert

3 Answers3