
I am working on a project that requires migrating existing Oracle BLOBs into files on disk. The source environment is a shared Oracle 10gR2 server. Currently I have a script that uses UTL_FILE, but the process is pretty slow: it takes around 3 hours to extract 25 GB of sample data, and the actual data to be moved is on the order of 1 TB. I need help/advice on tuning this significantly.

Here is my process:

  1. Open a cursor to fetch the list of blob ids and names
  2. Start a loop to go through each blob
  3. Extract the blob using BLOB2File, a custom stored procedure (picked up from a website and modified slightly)

Here is the code:

create or replace
PROCEDURE BLOB2File(
    lngBlobID IN NUMBER,
    sFileName IN VARCHAR2,
    sDir      IN VARCHAR2)
AS
  iFileLen INTEGER;
  iLineLen INTEGER := 32000; -- write chunk size (utl_file max is 32767)
  vStart   NUMBER  := 1;
  vBlob BLOB;
  l_output utl_file.file_type;
  my_vr RAW(32000);
  iTmp INTEGER;
BEGIN
  -- get blob details
  LOG_IT('Entered. Blob Id: ' || lngBlobID || ', File Name: ' || sFileName || ', Directory: ' || sDir);
  SELECT blobData, lengthb(blobData)
    INTO vBlob, iFileLen
    FROM blobTable
   WHERE id = lngBlobID;
  LOG_IT('Acquired the blob. Blob size: ' || TO_CHAR(iFileLen));
  l_output := utl_file.fopen(sDir, sFileName,'wb', iLineLen);
  vStart   := 1;
  iTmp     := iFileLen;
  -- if small enough for a single write
  IF iFileLen < iLineLen THEN
    utl_file.put_raw(l_output,vBlob);
    utl_file.fflush(l_output);
  ELSE -- write in pieces
    vStart      := 1;
    WHILE vStart < iFileLen AND iLineLen > 0
    LOOP
      dbms_lob.read(vBlob,iLineLen,vStart,my_vr);
      utl_file.put_raw(l_output,my_vr);
      utl_file.fflush(l_output);
      -- set the start position for the next cut
      vStart := vStart + iLineLen;
      -- shrink the chunk size for the final partial read (less than 32000 bytes)
      iTmp := iTmp - iLineLen;
      IF iTmp < iLineLen THEN
        iLineLen := iTmp;
      END IF;
    END LOOP;
  END IF;
  utl_file.fclose(l_output);
  LOG_IT('Exited');

  EXCEPTION
  WHEN OTHERS THEN
  LOG_IT('**ERROR** ' || SQLERRM, SQLCODE, DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
END;

LOG_IT is a stored procedure that logs to a table; there should not be any significant hit from it. I tried optimizing Step 1 by using a bulk fetch (BULK COLLECT) instead of a normal fetch, but it didn't yield any significant improvement.
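
For reference, the driver for Steps 1 and 2 looks roughly like the sketch below (the fileName column and the directory name are simplified placeholders, not the real names):

DECLARE
  TYPE t_id_tab   IS TABLE OF blobTable.id%TYPE;
  TYPE t_name_tab IS TABLE OF VARCHAR2(255);
  l_ids   t_id_tab;
  l_names t_name_tab;
  CURSOR c_blobs IS
    SELECT id, fileName          -- fileName is a placeholder column
      FROM blobTable;
BEGIN
  OPEN c_blobs;
  LOOP
    -- Step 1: bulk fetch a batch of blob ids and file names
    FETCH c_blobs BULK COLLECT INTO l_ids, l_names LIMIT 500;
    -- Step 2: extract each blob in the batch
    FOR i IN 1 .. l_ids.COUNT
    LOOP
      BLOB2File(l_ids(i), l_names(i), 'BLOB_DIR');  -- 'BLOB_DIR' is a placeholder directory
    END LOOP;
    EXIT WHEN c_blobs%NOTFOUND;
  END LOOP;
  CLOSE c_blobs;
END;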

Can anybody suggest any ideas for improvement or, even better, a more performant way of approaching this?

sammy

1 Answer


Assuming that your hardware is sufficient to handle far more than 8 GB/hour of sustained writes to sDir (and to handle reading a similar amount from blobTable, plus whatever other I/O your system needs), the simplest option would likely be to spawn a few parallel sessions, each of which calls this procedure. For example, if you wanted to run three jobs in parallel, each extracting one LOB, you could do something like this.

DECLARE
  l_jobno INTEGER;
BEGIN
  dbms_job.submit( l_jobno, 'begin BLOB2File( 1, ''1.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File( 2, ''2.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File( 3, ''3.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  commit;
END;

In reality you probably don't want a separate job for every BLOB -- you probably want to generate a smaller number of jobs and give each one a range of lngBlobID values to work on (see the sketch below). The number of jobs Oracle will run at any one time is limited by the JOB_QUEUE_PROCESSES parameter, so you could submit thousands of jobs and just let Oracle limit how many run simultaneously.
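
A rough sketch of that range-based approach might look like the following (the wrapper procedure, the MOD-based bucketing, and the fileName column are illustrative assumptions, not the only way to split the work):

-- Hypothetical wrapper: extracts every LOB whose id falls in one "bucket"
CREATE OR REPLACE PROCEDURE extract_blob_bucket(
    p_bucket   IN NUMBER,
    p_num_jobs IN NUMBER)
AS
BEGIN
  FOR r IN (SELECT id, fileName           -- fileName column is assumed
              FROM blobTable
             WHERE MOD(id, p_num_jobs) = p_bucket)
  LOOP
    BLOB2File(r.id, r.fileName, 'DIRECTORY');
  END LOOP;
END;

-- Submit one job per bucket; Oracle runs at most JOB_QUEUE_PROCESSES of them at a time
DECLARE
  l_jobno    INTEGER;
  l_num_jobs CONSTANT PLS_INTEGER := 50;
BEGIN
  FOR i IN 0 .. l_num_jobs - 1
  LOOP
    dbms_job.submit(
      l_jobno,
      'begin extract_blob_bucket( ' || i || ', ' || l_num_jobs || ' ); end;',
      sysdate + interval '5' second );
  END LOOP;
  COMMIT;
END;

The MOD-based bucketing just spreads the IDs roughly evenly across the jobs without having to precompute explicit ranges; any other partitioning scheme would work as well.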

Justin Cave
  • Thank you for the response. It sounds possible and helpful. Will definitely try this. However, since the db is in a shared environment, I hope submitting a lot of jobs doesn't pull down the server or slow down the work for other users. – sammy May 04 '12 at 21:39
  • @sammy - Well, of course, the more jobs you run simultaneously, the more I/O resources you'll consume and the fewer I/O resources will be available to others. That's why I started off by talking about how much I/O bandwidth your hardware had that was available for you to exploit after accounting for the needs of other users in the system. – Justin Cave May 04 '12 at 21:43
  • Is there a limit to the number of jobs that can be submitted? I have around 500,000 files to be pulled. – sammy May 04 '12 at 21:43
  • @sammy - I don't believe there is a limit on the number of jobs that can be submitted. The number that can be run simultaneously is controlled by `JOB_QUEUE_PROCESSES`. As I said, you probably don't want to submit 500,000 jobs-- you probably want to submit, say, 50 jobs each of which processes a set of 10,000 LOBs or 500 jobs each of which processes a set of 1,000 LOBs. – Justin Cave May 04 '12 at 21:46
  • Do you think trying to use an SSD in place of a regular disk is going to bring down the rate significantly? – sammy May 07 '12 at 14:02
  • @sammy - Bring down the rate of what? SSD should (all else being equal) mean faster writes so I would expect it to increase the rate at which you are extracting BLOBs if the alternative is that you're writing to a single physical hard drive. If you're writing to hundreds of hard drives on a SAN, however, you may well get more throughput than a single SSD. – Justin Cave May 07 '12 at 14:46
  • Thank you for the response, Justin. I am writing to a single NAS drive. Anyway, the question seems moot as there aren't any 1 TB NAS SSDs available. On a side note, are there any session level parameters to turn on/off to extract more performance in this scenario? – sammy May 07 '12 at 16:39
  • @sammy - Session level, no. Playing with the system-level `job_queue_processes` may result in performance gains. But if you are writing to a single physical drive, that drive is going to be the bottleneck. – Justin Cave May 07 '12 at 16:43