5

I want to know what does Oracle's CLOB has to offer over BLOB data type. Both have data storage limits of (4 GB - 1) * DB_BLOCK_SIZE.

A text string which is longer than 4000 bytes can not fit in VARCHAR2 column. Now, I can use CLOB and BLOB as well to store this string.

Everyone says, CLOB is good and meant for character data and BLOB is for binary data such as images, unstructured documents.

But I see I can store character data inside a BLOB as well.

What I want to know:

So, question is on the basics, why CLOB and why not BLOB always? Is there anything to do with encoding?

May be the question title should be, How CLOB handles the character data differently than a BLOB?

Shailesh Pratapwar
  • 4,054
  • 3
  • 35
  • 46
  • 3
    Run a `select * from ...` on a table with a blob column, then tell me quickly what the text inside the BLOB is. And if you can't read it, then tell me which encoding was used to store the bytes which is something you have to know if you want to convert the raw bytes back to a readable string. –  Apr 15 '19 at 08:19
  • 2
    Why would you **not** use a CLOB to store character data? It's like storing dates in DATE columns and numeric values in NUMBER columns: use the most appropriate datatype for the data you want to store. – APC Apr 15 '19 at 08:26
  • @APC Yeah. I agree to use CLOB for character data. But I want to know what exactly a CLOB offers over BLOB ? – Shailesh Pratapwar Apr 15 '19 at 08:28
  • What exactly would the of a BLOB be? https://i.imgur.com/u1aCQ97.png or https://i.imgur.com/HuWDg1x.png - which text does the column `blob_data` contain? –  Apr 15 '19 at 08:31
  • Where did you read the limit is 4GB? The limit is `(4GB - 1) * DB_BLOCK_SIZE`, i.e. 8TB to 128TB depending on DB_BLOCK_SIZE. – Wernfried Domscheit Apr 15 '19 at 08:47
  • @WernfriedDomscheit Thanks for pointing. Rectified the question. – Shailesh Pratapwar Apr 15 '19 at 08:50
  • 1
    All string related functions like `REGEXP_xxx`, `SUBSTR`, `xPAD`, `xTRIM`, etc. are available only for `CLOB` but not for `BLOB`. Why on earth do you consider to store character data in `BLOB`? – Wernfried Domscheit Apr 15 '19 at 08:50
  • 1
    I am not. But I want to know how BLOB treats the character type data. – Shailesh Pratapwar Apr 15 '19 at 08:53
  • @a_horse_with_no_name Ok. I get your point that select * will give numbers in output for a blob column. But as per https://stackoverflow.com/a/27618946/1352127 , I can use substr function to get textual contents. Now, how is encoding handled here ? A clob and a blob eventually gets stored in computer in a binary form. Does clob stores the conding information along with it and uses it while retrieving the data ? – Shailesh Pratapwar Apr 15 '19 at 08:56
  • 3
    @ShaileshPratapwar - no, that answer (which is incomplete anyway, not sure how it's got so many upvotes) will at best give you a selection of bytes from the BLOB, not characters. That selection *might* represent characters, if you know what character set they represent so you can convert them; but if they were from a multibyte character set you might split a single character in half and not be able to convert it anyway. (Or convert it wrongly, and not notice.) – Alex Poole Apr 15 '19 at 09:13
  • 2
    FWIW, Oracle recommends storing JSON (which is text) in BLOB datatype: [search for "Considerations When Using LOB Storage for JSON Data"](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adjsn/overview-of-storage-and-management-of-JSON-data.html#GUID-26AB85D2-3277-451B-BFAA-9DD45355FCC7) and lists some reasons; mostly around character set conversion and size. – Boneist Apr 15 '19 at 09:45

1 Answers1

2

I want to know how BLOB treats the character type data.

It doesn't treat it as character type data, it only see it as a stream of bytes - it doesn't know or care what it represents.

From the documentation:

The BLOB data type stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics.


Does clob stores the conding information along with it and uses it while retrieving the data ?

Not explicitly, but the data is stored in the database character set, as with VARCHAR2 data. From the documentation again:

The CLOB data type stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set.

You might also have noticed that the dbms_lob package has procedures to convert between CLOB and BLOB data types. For both of those you have to specify the character set to use. So if you choose to store character data as a BLOB you have to know the character set when converting it to a BLOB, but perhaps more crucially you have to know the character set to be able convert it back. You can do it, but it doesn't mean you should. You have no way to validate the BLOB data until you come to try to convert it to a string.

As @APC alluded to, this is similar to storing a date as a string - you lose the advantages and type-safety using the correct data type would give you, and instead add extra pain, uncertainty and overhead for no benefit.

The question isn't really what advantages CLOBs have over BLOBs for storing character data; the question is really the reverse: what advantages do BLOBs have over CLOBs for storing character data? And the answer is usually that there are none.

@Boneist mentions the recommendation to store JSON as BLOBs, and there is more about that here.

(The only other reasons I can think of off-hand are that you have to store data from multiple source character sets and want to preserve them exactly as you received them. But then either you are only storing them and will never examine or manipulate the data from within the database itself, and will only return them to some external application untouched; in which case you don't care about the character set - so you're handling purely binary data and shouldn't be thinking of it as character data at all, any more than you'd care that an image you're storing is PNG vs. JPG or whatever. Or you will need to work with the data and so will have to record which character set each BLOB object represents, so you can convert as needed.)

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Thanks. This helps to some extent and points to the role of database character sets in CLOB/varchar vs BLOB/binary storage. This also highlights some special cases when to have BLOB instead of CLOB. – Shailesh Pratapwar Apr 15 '19 at 10:16
  • 1
    In [Overview of Storing and Managing JSON Data](https://docs.oracle.com/en/database/oracle/oracle-database/18/adjsn/overview-of-storage-and-management-of-JSON-data.html#GUID-26AB85D2-3277-451B-BFAA-9DD45355FCC7) it says: "*In AL32UTF8 databases `CLOB` instances are stored using the UCS2 character set*" - is that really true? UCS-2 is able to represent only the [BMP](https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane) characters, thus not "Unicode-capable". UCS-2 is [obsolete](https://unicode.org/faq/utf_bom.html#utf16-11), I wound that Oracle still refers to it. – Wernfried Domscheit Apr 15 '19 at 10:57
  • Yeah,not sure; [this](https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/supporting-multilingual-databases-with-unicode.html#GUID-2A1CA4B9-AE97-4610-B454-AFAF33277BB1) doesn't help much either. I decided not to dig into the JSON stuff any further, but just wanted to mention it as Boneist had pointed it out - I was tempted to stop at 'usually there are none', and I wasn't sure about 'usually', but figured someone would point out legitimate if obscure uses. (Happy to have more added if/when people think of them!) – Alex Poole Apr 15 '19 at 11:14