0
create or replace procedure prcdr_Clustering is
 v_sampleCount  number;
 v_sampleFlag   number;
 v_matchPercent number;
 v_SpendAmount  Number(18, 2);
 cursor cur_PDCSample is
 SELECT *
  FROM TBL_BIL
 WHERE UDF_CHK = 'N';       
rec_Pdcsample TBL_BIL%rowtype;
BEGIN
OPEN cur_PDCSample;
LOOP
FETCH cur_PDCSample
  into rec_Pdcsample;
EXIT WHEN cur_PDCSample%NOTFOUND;
SELECT COUNT(*)
  INTO v_sampleCount
  FROM TBL_BIL
 WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount <> 0 THEN
  UPDATE TBL_BIL
     SET UDF_CHK = 'Y'
   WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
  IF v_sampleCount > 1 THEN
    v_sampleFlag := 1;
  ELSE
    IF v_sampleCount = 1 THEN
      v_sampleFlag := 2;
    ELSE
      v_sampleFlag := 0;
    END IF;
  END IF;
  UPDATE TBL_BIL
     SET UDF_SAMPLECOUNT = v_sampleCount, UDF_SAMPLEFLAG = v_sampleFlag
   WHERE uniqueid = rec_Pdcsample.uniqueid;
  UPDATE TBL_BIL
     SET UDF_PID = rec_Pdcsample.uniqueid
   WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
  UPDATE TBL_BIL
     SET UDF_PIDSPEND = v_SpendAmount
   WHERE uniqueid = rec_Pdcsample.uniqueid;
  UPDATE TBL_BIL
     SET UDF_MATCHPERCENT = 1
   WHERE uniqueid <> rec_Pdcsample.uniqueid
     AND UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
 END IF;
IF cur_PDCSample%ISOPEN THEN
   CLOSE cur_PDCSample;
 END IF;
OPEN cur_PDCSample;
END LOOP;
IF cur_PDCSample%ISOPEN THEN
 CLOSE cur_PDCSample;
END IF;
 end PrcdrClustering;

It takes me days to execute, my table has 225,846 rows of data.

The structure of my table is :-

UNIQUEID    NUMBER  Notnull primary key
VENDORNAME  VARCHAR2(200)               
SHORTTEXT   VARCHAR2(500)               
SPENDAMT    NUMBER(18,2)                
UDF_TOKENIZED   VARCHAR2(999)               
UDF_PID NUMBER(10)              
UDF_SAMPLEFLAG  NUMBER(4)               
UDF_SAMPLECOUNT NUMBER(4)               
UDF_MATCHPERCENT    NUMBER(4)               
UDF_TOKENCNT    NUMBER(4)               
UDF_PIDSPEND    NUMBER(18,2)                
UDF_CHK VARCHAR2(1)
Sathyajith Bhat
  • 21,321
  • 22
  • 95
  • 134
Ali C
  • 1
  • 1
    your code has two tables. what is the structure of the other table? Also what kind of indexes do you have? Did you check them already? Furthermore you can use oracles `explain plan` to check the execution plan for your queries, maybe even your whole procedure (I don't know) ... – ZeissS Feb 18 '12 at 10:11

3 Answers3

4

Where to start? I've a number points to make.

  1. You're doing bulk updates; this implies that bulk collect ... forall would be far more efficient.
  2. You're doing multiple updates of the same table, which doubles the amount of DML.
  3. As you've already selected from the table, re-entering it to do another count is pretty pointless, use an analytic function to get the result you need.
  4. Indentation, indentation, indentation. Makes your code much easier to read.
  5. You can use elsif to reduce the amount of statements to be evaluated ( very, very minor win )
  6. If the uniqueid is unique you can use rowid to update the table.
  7. You're updating udf_pidspend to null, whether this is intentional or not there's no need to do a separate update for it.
  8. You can do a lot more in the cursor, but there's obviously no need to select everything, which'll decrease the amount of data you need to read from the disks.
  9. You may need a couple of commits in there; though this means you can't rollback if it fails midway.
  10. I hope tbl_bil is indexed on uniqueid
  11. As GolzeTrol noted you're opening the cursor multiple times. There's no need for this.

As general rules:

  • If you're going to select / update or delete from a table do it once if possible and as few times as possible if not.
  • If you're doing bulk operations use bulk collect.
  • Never write select *
  • Use rowid where possible it avoids all index problems.

This will only work in 11G, I answered this question recently where I provided my own way of dealing with this implementation restriction in versions prior to 11G and linked to Ollie's, Tom Kyte's and Sathya's

I'm not entirely certain what you're trying to do here so please forgive me if the logic is a little off.

create or replace procedure prcdr_Clustering is

   cursor c_pdcsample is
    select rowid as rid
         , count(*) over ( partition by udf_tokenized ) as samplecount
         , udf_chk
         , max(uniqueid) over ( partition by udf_tokenized ) as udf_pid
      from tbl_bil
     where udf_chk = 'N';       

   type t__pdcsample is table of c_pdcsample%rowtype index by binary_integer;
   t_pdcsample t__pdcsample;

begin

   open c_pdcsample;
   loop

      fetch c_pdcsample bulk collect into t_pdcsample limit 1000;

      exit when t_pdcsample.count = 0;

      if t_pdcsample.samplecount <> 0 then
         t_pdcsample.udf_chk := 'y';

         if t_pdcsample.samplecount > 1 then
            t_pdcsample.samplecount := 1;
         elsif t_pdcsample.samplecount = 1 then
            t_pdcsample.samplecount := 2;
         else
            t_pdcsample.samplecount := 0;
         end if;

      end if;

      forall i in t_pdcsample.first .. t_pdcsample.last
         update tbl_bil
            set udfsamplecount = t_pdcsample.samplecount
              , udf_sampleflag = t_pdcsample.sampleflag
              , udf_pidspend = null
              , udf_pid = t_pdcsample.udf_pid
          where rowid = t_pdcsample(i).rowid
                ;

      for i in t_pdcsample.first .. t_pdcsample.last loop
         update tbl_bil TBL_BIL
            set udfmatchpercent = 1
          where uniqueid <> t_pdcsample.uniqueid
            and udf_tokenized = t_pdcsample.udf_tokenized;
      end loop;

      commit ;

   end loop;
   close c_pdcsample;

end PrcdrClustering;
/

Lastly calling all tables tbl_... is a little bit unnecessary.

Community
  • 1
  • 1
Ben
  • 51,770
  • 36
  • 127
  • 149
  • You raise many good points. But especially point 9 and also not advising to use a single SQL statement prevents me from doing a +1. – Rob van Wijk Feb 19 '12 at 07:41
  • @RobvanWijk, if you look at the logic I think a single SQL statement is impossible. I tried to get it down that far but the last update stops me :-). – Ben Feb 19 '12 at 10:37
  • Please take a look at my answer to see this could be done in a SQL statement. – Rob van Wijk Feb 19 '12 at 14:10
4

Here is a variant using a single SQL statement. I'm not 100% certain that the logic is exactly the same, but for my test set, it is. Also the current procedure is non deterministic when you have more than one record with udf_chk = 'N' and the same udf_tokenized ...

This is the refactored procedure

SQL> create procedure prcdr_clustering_refactored
  2  is
  3  begin
  4    merge into tbl_bil t
  5    using ( select tb1.uniqueid
  6                 , count(*) over (partition by tb1.udf_tokenized) cnt
  7                 , max(decode(udf_chk,'N',uniqueid)) over (partition by tb1.udf_tokenized order by tb1.udf_chk) pid
  8              from tbl_bil tb1
  9             where udf_chk = 'N'
 10                or exists
 11                   ( select 'dummy'
 12                       from tbl_bil tb2
 13                      where tb2.udf_tokenized = tb1.udf_tokenized
 14                   )
 15          ) q
 16       on ( t.uniqueid = q.uniqueid )
 17     when matched then
 18          update
 19             set t.udf_samplecount = decode(t.udf_chk,'N',q.cnt,t.udf_samplecount)
 20               , t.udf_sampleflag = decode(t.udf_chk,'N',decode(q.cnt,1,2,1),t.udf_sampleflag)
 21               , t.udf_pid = q.pid
 22               , t.udf_pidspend = decode(t.udf_chk,'N',null,t.udf_pidspend)
 23               , t.udf_matchpercent = decode(t.udf_chk,'N',t.udf_matchpercent,1)
 24               , t.udf_chk = 'Y'
 25    ;
 26  end;
 27  /

Procedure created.

And here is a test:

SQL> select *
  2    from tbl_bil
  3   order by uniqueid
  4  /

UNIQUEID VENDORNAME SHORTTEXT  SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
       1 a          a                 1 bl               0              0               0                0            0            0 N
       2 a          a                 1 bla              0              0               0                0            0            0 N
       3 a          a                 1 bla              0              0               0                0            0            0 Y
       4 a          a                 1 bla              0              0               0                0            0            0 Y
       5 a          a                 1 bla              0              0               0                0            0            0 Y
       6 a          a                 1 blah             0              0               0                0            0            0 N
       7 a          a                 1 blah             0              0               0                0            0            0 Y
       8 a          a                 1 blah             0              0               0                0            0            0 Y
       9 a          a                 1 blah             0              0               0                0            0            0 Y
      10 a          a                 1 blah             0              0               0                0            0            0 Y
      11 a          a                 1 blah             0              0               0                0            0            0 Y

11 rows selected.

SQL> exec prcdr_clustering

PL/SQL procedure successfully completed.

SQL> select *
  2    from tbl_bil
  3   order by uniqueid
  4  /

UNIQUEID VENDORNAME SHORTTEXT  SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
       1 a          a                 1 bl               1              2               1                0            0              Y
       2 a          a                 1 bla              2              1               4                0            0              Y
       3 a          a                 1 bla              2              0               0                1            0            0 Y
       4 a          a                 1 bla              2              0               0                1            0            0 Y
       5 a          a                 1 bla              2              0               0                1            0            0 Y
       6 a          a                 1 blah             6              1               6                0            0              Y
       7 a          a                 1 blah             6              0               0                1            0            0 Y
       8 a          a                 1 blah             6              0               0                1            0            0 Y
       9 a          a                 1 blah             6              0               0                1            0            0 Y
      10 a          a                 1 blah             6              0               0                1            0            0 Y
      11 a          a                 1 blah             6              0               0                1            0            0 Y

11 rows selected.

SQL> rollback
  2  /

Rollback complete.

SQL> exec prcdr_clustering_refactored

PL/SQL procedure successfully completed.

SQL> select *
  2    from tbl_bil
  3   order by uniqueid
  4  /

UNIQUEID VENDORNAME SHORTTEXT  SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
       1 a          a                 1 bl               1              2               1                0            0              Y
       2 a          a                 1 bla              2              1               4                0            0              Y
       3 a          a                 1 bla              2              0               0                1            0            0 Y
       4 a          a                 1 bla              2              0               0                1            0            0 Y
       5 a          a                 1 bla              2              0               0                1            0            0 Y
       6 a          a                 1 blah             6              1               6                0            0              Y
       7 a          a                 1 blah             6              0               0                1            0            0 Y
       8 a          a                 1 blah             6              0               0                1            0            0 Y
       9 a          a                 1 blah             6              0               0                1            0            0 Y
      10 a          a                 1 blah             6              0               0                1            0            0 Y
      11 a          a                 1 blah             6              0               0                1            0            0 Y

11 rows selected.

Regards,
Rob.

Rob van Wijk
  • 17,555
  • 5
  • 39
  • 55
  • You deserve +1 for even bothering, but I think you've got it with the `or exists ... dummy...` I hadn't considered doing that. – Ben Feb 19 '12 at 14:51
0

I don't know why, but you open the cur_PDCSample, which select (I suspect) thousands of records. And then, in a loop, you close the cursor and reopen it, each time processing only the first record that is returned.

If you open the cursor once, process each record and then close it, your procedure will probably go a lot faster.

Actually, since you do not always update TBL_BIL.UDF_CHK to 'Y', it seems to me that your current procedure may run infinitely.

GolezTrol
  • 114,394
  • 18
  • 182
  • 210