8

I need to store a page of html data in a field called ProjectDescription in mySQL database using Spring & JPA 2.1 . I have read this question and all other questions with BLOB tag, but need some clarity on why the fields are saving in a certain manner in my db. I have created the fields in the manner below using both String and byte[] field types.

Method1: Saving data as TEXT (After encoding in Base64 format I save my html data as String using the below method)

@Basic(fetch = FetchType.LAZY) 
@Lob
private String projectDescription = "";

Method2: Saving data as binary using BLOB

@Basic(fetch = FetchType.LAZY)
@Lob
@Column(length=5000)
private byte[]  projectDescription1 =new byte[0];

My assumptions: I assume that since a page of html data is not very large, TEXT is ok as comapred to BLOB

I tested both and the fields are saved as below in mySQL database

In Method1:

  1. Type: TEXT
  2. DisplaySize is constantly 1431655765.

This size doesn't change irrespective of my @Column(length=5000) annotaton.

In Method2

  1. Type: BLOB
  2. DisplaySize: -1

Question1: What is the source of this DisplaySize ? This seems quite large in case of TEXT and very small(-1) in case of byte[] field type. Why doesn't @Column length seem to change change the DisplaySize.

Question2: Is it ok to store HTML data as String field type(eventually as TEXT) as opposed to byte[] (eventually as a blob)?

Note: I have read all the questions with BLOB tags and am clear that images/documents need to be saved as BLOB and text as as CLOB/TEXT. However, would like to confirm again for HTML data given how large DisplaySize is allocated in DB for TEXT.

Thanks.

Community
  • 1
  • 1
HopeKing
  • 3,317
  • 7
  • 39
  • 62
  • 1
    Where are you seeing these `DisplaySize` values? Whilst MySQL numeric types can have a "display width" [attribute](https://dev.mysql.com/doc/en/numeric-type-attributes.html), string types either have an explicit or implicit "maximum length"—but this would not normally be referred to as `DisplaySize`. Furthermore, the value of 1431655765, which is 0x55555555, is very strange indeed... it equals 2^32/3, and therefore represents the maximum number of 3-byte-characters that can be stored in a `LONGTEXT` column... not sure what use that is to anyone. – eggyal Jan 20 '18 at 13:49
  • I see DisplaySize in fieldTypes while using MySQL Workbench. Yes, it seems strange indeed. – HopeKing Jan 20 '18 at 13:59
  • MySql WorkBench has a lot of bugs, make your checks with another client like hidesql, sqlyog, phpmyadmin... Because -1 for blob looks like a overflow in workbench, and post the create table statements because they can tell a lot more. And about display size https://dba.stackexchange.com/a/370 . I still – ronchi82 Jan 27 '18 at 02:28
  • And unless you are compressing the data, save text as text to preserve the critical encoding information. – ronchi82 Jan 27 '18 at 02:34

1 Answers1

5

If it is a whole page, why go through the extra layer of fetching from a database table? If it is only part of a page, then I recommend TEXT CHARACTER SET utf8mb4. Any non-UTF-8 text in the page will cause trouble; may as well catch it early.

And the database industry is converging on using UTF-8 for all text.

Base64 is 8/6 times as big. And all it is doing is avoiding trouble with non-UTF-8 characters that should not be there. If anything, compress it in the client and store in a BLOB, thereby shrinking 3/1.

In MySQL, TEXT is 64K bytes. You might be better off with MEDIUMTEXT, which has a limit of 16MB. I say "bytes" because, for example, Chinese needs 3, sometimes 4, bytes per character, so only about 25K characters of Chinese text will fit in TEXT.

"DisplaySize is constantly 1431655765" -- What??? A gigabyte for a web page; no way! Even if that includes images (which it should not), that is quite unreasonable. Edit: eggyal's Comment on 2^32/3 probably explains this odd number.

In MySQL, SELECT length(my_text) ... will get the number of bytes in that column.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I now understand that UTF-8 doesn't accept european characters. So in that case Base64 encoding is the only way out isn't it. Also, why is Base64 encoding 6/8 times bigger? – HopeKing Mar 16 '18 at 08:17
  • @HopeKing - What? UTF-8 handles all European characters, and all characters of all established languages worldwide. Accented European letters will not fit in _Ascii_, which has only 7 bits. Base64 juggles _bits_ to take any 8-bit bytes and display them in ascii. But you can no longer read even plain English. – Rick James Mar 16 '18 at 10:39