
I have encountered very strange behavior of Oracle LOBs. Situation: we have a partitioned IOT that contains a CLOB column. The CLOB has separate LOB storage set up with the LOGGING, RETENTION, and DISABLE STORAGE IN ROW options. CHUNK size is 8192 bytes. PCTFREE is left at the default (null in DBA_TABLES). Now we need to create a test case with a certain number of CLOBs loaded, and we have chosen a 19.5 KB CLOB. After loading this CLOB 40 million times (used for performance testing, the content does not matter), the size on the file system and in DBA_DATA_FILES is 1230 GB.

Question:

We estimated the size of 40 million CLOBs of 19.5 KB each at ~780 GB. Where did the extra 450 GB come from? I would guess it has something to do with the CHUNK size: 19.5 KB would use 3 chunks, i.e. 24 KB per CLOB, but that is still only 960 GB. The LOB index is around 2 GB. Does anybody have an idea? (Sorry for the poor explanation.) (P.S. running Oracle 11g.)
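The chunk-rounding estimate above can be sketched as a quick back-of-the-envelope calculation (a minimal sketch of the arithmetic, not Oracle-specific code; the figures are the ones from the question):

```python
def lob_size_estimate(clob_bytes, chunk_bytes, count):
    """Estimate out-of-line LOB segment size: each LOB value
    is rounded up to a whole number of chunks."""
    chunks = -(-clob_bytes // chunk_bytes)  # ceiling division
    return chunks * chunk_bytes * count

# 19.5 KB CLOB, 8192-byte chunks, 40 million rows
total = lob_size_estimate(int(19.5 * 1024), 8192, 40_000_000)
print(total / 1024**3)  # ~915 GiB -- roughly the "only 960GB" figure, still well short of 1230 GB
```

This shows why chunk rounding alone (3 chunks = 24 KB per CLOB) cannot account for the observed 1230 GB.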

Thank you in advance!

Nick Krasnov
Mike Nagy
  • Consider this extra 28% as taxes paid for the use of Oracle. :-) – Egor Skriptunoff Aug 02 '13 at 14:19
  • There may have been something else in the datafile at some point. What is the size according to DBA_SEGMENTS? – Jon Heller Aug 02 '13 at 15:00
  • What is your block size? If it's 16k then the numbers would look about right (two 16k blocks per row). – Alex Poole Aug 02 '13 at 16:19
  • Hi, DBA_SEGMENTS shows exactly the same size. Block size is 8 KB. – Mike Nagy Aug 03 '13 at 10:53
  • Another question arises: I looked into the length of the CLOB and it is 15760 characters. The database is in UTF8, therefore using 2 bytes per character. Could it be that the CLOB is just stored as an extrapolation of VARCHAR/CHAR? The numbers would fit that way. I might be wrong here. – Mike Nagy Aug 05 '13 at 06:56

1 Answer


Your comment is correct: "Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8". Although I would not say this is just an extrapolation of VARCHAR2: UTF8 is a varying-width character set and does not always require 2 bytes.

15760 characters is 31520 bytes, which requires 4 of your 8 KB blocks, i.e. 32768 bytes per CLOB. 32768 * 40,000,000 / 1024 / 1024 / 1024 = ~1220 GB. That doesn't perfectly match your result, but it is very close. We'd need some more detailed numbers to look for a perfect match.
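The arithmetic above can be checked with a quick sketch (assuming 2 bytes per character for the internal UCS-2-style storage, an 8 KB block size, and the 15760-character length from the comments):

```python
chars = 15_760
bytes_per_char = 2        # CLOBs stored UCS-2-style in a multibyte database
block = 8 * 1024          # 8 KB database block
rows = 40_000_000

clob_bytes = chars * bytes_per_char       # 31520 bytes per CLOB
blocks_needed = -(-clob_bytes // block)   # ceiling division -> 4 blocks
stored = blocks_needed * block            # 32768 bytes actually consumed
total_gb = stored * rows / 1024**3
print(blocks_needed, total_gb)            # 4 blocks, ~1220.7 GB
```

This lands within about 10 GB of the observed 1230 GB, with the remainder plausibly attributable to the LOB index, PCTVERSION, and segment overhead.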

Jon Heller
  • Thank you for your comment! The ~10 GB difference can be related to PCTVERSION and the LOB index. Although there is something that bothers me: our DB is AL32UTF8, but the specification of this charset says that ASCII characters are stored as 1 byte, European characters as 2 bytes, etc. ([Character set encodings and storage size factors](http://stackoverflow.com/questions/14991238/character-set-encodings-and-storage-size-factors)). But I guess this does not hold for LOB columns. – Mike Nagy Aug 06 '13 at 07:18
  • Yeah, it's kind of strange. One big advantage of UTF8 is that it uses less space. But that advantage is lost if large text values are internally stored in a different way. If space is a huge issue you may want to consider using compression, or BFILEs. Although you may run into a [similar problem with BFILEs](http://stackoverflow.com/a/17764216/409172). – Jon Heller Aug 06 '13 at 17:16