0

I have a table with a VARCHAR(10240) column. I have a Java application which store a list of Strings to that table column as a comma separated value.

But now I want to compress that String and store it in the same column to comply with the storage requirements and constraints. Most of the SO questions and answers for similar questions suggest to change the table column type to BLOB or a binary safe type.

Result of the compression is most of the time a Binary data, so we might need a binary oriented type. But can we encode that binary data with Base64 or something and store it.Refer this also.

String -> compress -> Binary data (byte[]) -> encode Base64 -> store in VARCHAR

And can we have something like below for decompress that,

VARCHAR -> decode base64 -> Binary data (byte[]) -> decompress -> String

Trick is I can't change that column into BLOB or anything, I need to use the column as VARCHAR(10240) and store the compressed data there. Is there anyway to achieve this ?

prime
  • 14,464
  • 14
  • 99
  • 131
  • The result of encoding to Base64 is a `byte[]` and you can make a String from that - where is your problem? – laune Feb 17 '18 at 04:35
  • @laune will that compress the data ? Encoding does not guarantee to compress the data right ? I just want to know a good way to compress that String and store it in a varchar field – prime Feb 17 '18 at 04:49
  • Base64 encoding guarantees *increasing* the size by 33%. – chrylis -cautiouslyoptimistic- Feb 17 '18 at 05:07
  • @chrylis so do you have a suggestion on this question. Need to compress and store it in a varchar field – prime Feb 17 '18 at 05:14
  • Don't use a VARCHAR, use whatever MySQL uses for binary data... VARBINARY, BLOB, etc. – Jim Garrison Feb 17 '18 at 06:58
  • You have shown a sequence that compresses and encodes: so you have a byte[]. If compressing and encoding results in more bytes than the original string, omit compression and encoding. If you can store the original STring in a byte array containing some 8-bit-encoding (ISO-8859-1 or similar) you might be better off this way, without compression and encoding. – laune Feb 20 '18 at 12:52

1 Answers1

0

You can convert they String to bytes by calling the getBytes method. Then you can use java.util.Deflater as described in this question: A Java library to compress (e.g. LZW) a string . The Deflater will give you a compressed byte array which you should be able to set in the varchar column in mysql.

ilooner
  • 2,480
  • 15
  • 32
  • Can I just send the byte[] to the db insert query ? – prime Feb 17 '18 at 08:43
  • Depends on what library you are using. If you are using the mysql jdbc library directly you should be able to. See example in the answer to this question: https://stackoverflow.com/questions/1324641/cant-insert-byte-into-mysql-using-java – ilooner Feb 17 '18 at 18:29
  • it does not specify inserting into VARCHAR field. Or can you point to me to the correct segment. – prime Feb 17 '18 at 19:00
  • You are right, that example shows storing a byte array into a blob. I guess the only option is to compress with Deflater and then encode the byte array to a Base64 string as @laune suggested. Hopefully the Deflater would compresses the data more than Base64 encoding inflates it. You could try benchmarking this with a few data samples to verify that you achieve a desirable compression ratio. – ilooner Feb 18 '18 at 06:12