8

I want to compress a string (an XML document) in Java and store it in Cassandra as a varchar. I need to be able to decompress it when reading it back from the database. I looked into GZIP and LZ4, and both return a byte array on compression.

My goal is to obtain a string from the compressed data which can also be used to decompress and get back the original string. What is the best possible approach?

Mritunjay

2 Answers

3

I don't see any good reason for you to compress your data yourself: Cassandra can do it for you transparently (it compresses your data with LZ4 by default). So, if your goal is to reduce your data footprint, you have a non-existent problem, and I'd feed the XML document directly to C*.

By the way, all compression algorithms take an array of bytes and produce an array of bytes. As a solution, you could apply something like Base64 encoding to your compressed byte array. On decompression, reverse the logic: Base64-decode your string, then apply your decompression algorithm.
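The compress-then-Base64 round trip described above could look roughly like the following. This is a minimal sketch, assuming GZIP from `java.util.zip` and `java.util.Base64` (Java 8+); the class name `StringCompressor` and its method names are made up for illustration, and LZ4 or any other byte-oriented codec could be swapped in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
final class StringCompressor {

    // String -> UTF-8 bytes -> GZIP -> Base64 text that fits in a varchar column.
    static String compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so it must be closed
        // before the buffer is read.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse order: Base64 text -> compressed bytes -> GUNZIP -> UTF-8 string.
    static String decompress(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keep in mind that Base64 makes the output about a third larger than the compressed bytes, and GZIP adds a fixed header and trailer, so short documents can come out bigger than the original.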

xmas79
0

I don't have enough reputation to comment, so I'm posting this as an answer. If you want a string back, whether you get significant compression will depend on your data. A very simple solution might be something like Java compressing Strings, but that would only work if your string contains only letters and no numbers. You can modify that solution to work for most characters, but if your string has no repeating characters, you might actually get a larger string than your original one.
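To make the caveats concrete, here is a minimal sketch that assumes the simple approach referred to above is run-length encoding (each character followed by its repeat count). That is my reading of the caveats about digits and repeated characters, not something stated outright; the class `RunLength` and its methods are hypothetical names.

```java
// Hypothetical run-length codec: "aaabbbbc" becomes "a3b4c1".
final class RunLength {

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 1;
            // Count how many times c repeats from position i.
            while (i + run < s.length() && s.charAt(i + run) == c) {
                run++;
            }
            out.append(c).append(run);
            i += run;
        }
        return out.toString();
    }

    // Only reliable when the original text contained no digits, because a
    // digit in the data cannot be told apart from a repeat count.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            int start = i;
            while (i < s.length() && Character.isDigit(s.charAt(i))) {
                i++;
            }
            int run = Integer.parseInt(s.substring(start, i));
            for (int k = 0; k < run; k++) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, `RunLength.encode("aaabbbbc")` gives `a3b4c1`, while `RunLength.encode("abc")` gives `a1b1c1`, which is twice as long as the input. XML rarely has long runs of one character, so this kind of scheme is unlikely to help there.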

clinomaniac