I need to write compressed text into a QR Code, then scan and decompress it.

The source texts are minified JSON strings like this one: {"ref":"WR0001","customsType":"GIFT","insuredValue":20000,"weight":500,"shippingMethod":"EMS","sender":{"notificationEmail":"asd@asd.com","phoneNumber":"+818023459877","address":{"name":"TestName","companyName":"TestCompanyName","address1":"Testaddress1","address2":"Testaddress2","city":"Testcity","postalCode":"111222","region":"Tokyo","countryIso2":"ES"}},"recipient":{"phoneNumber":"+81231231","address":{"name":"John","companyName":"Salchichon","address1":"myhome","address2":"somewhere","city":"somecity","region":"someregion","postalCode":"111","codiceFiscale":"232323","countryIso2":"IT"}},"items":[{"quantity":1,"price":1000,"customs":"Somecustoms1"},{"quantity":2,"price":1500,"customs":"Somecustoms2"}]}

Deflate generates binary output, which looks like this when printed as text:

óMMÉ,I,ÉÌÏSÈÈ/QHÎÈLÎNÍS(K,È/*O,JŠ”%gèf§Vò [...truncated]

Is there a compression algorithm that will render base64 text? I need something I can store in a QR Code.

If I simply apply base64 encoding to the compressed bytes, I get a string that's larger than my source text!

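using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Windows.Forms;

public partial class Main : Form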
{
    public Main()
    {
        InitializeComponent();
        this.KeyPress += new System.Windows.Forms.KeyPressEventHandler(this.Main_KeyPress);
    }
    List<char> _barcode = new List<char>(1000);

    private void Main_KeyPress(object sender, KeyPressEventArgs e)
    {

        _barcode.Add(e.KeyChar);

        // process barcode
        if (e.KeyChar == 13 && _barcode.Count > 0)
        {
            string msg = new String(_barcode.ToArray());
            MessageBox.Show(msg);
            Console.WriteLine(msg);

            //serializer
            MemoryStream stream = new MemoryStream(Encoding.Default.GetBytes(msg));

            byte[] compressedBytes;
            compressedBytes = Compress(stream);
            Console.WriteLine("compressed:" + Encoding.Default.GetString(compressedBytes));

            Decompress(compressedBytes);

            _barcode.Clear();


        }
    }

    private static Stream Decompress(byte[] input)
    {
        var output = new MemoryStream();

        using (var compressStream = new MemoryStream(input))
        using (var decompressor = new DeflateStream(compressStream, CompressionMode.Decompress))
            decompressor.CopyTo(output);

        output.Position = 0;
        Console.WriteLine("decompressed:" + Encoding.Default.GetString(output.ToArray()));
        output.Position = 0;
        return output;
    }


    private static byte[] Compress(Stream input)
    {
        using (var compressStream = new MemoryStream())
        using (var compressor = new DeflateStream(compressStream, CompressionMode.Compress))
        {
            input.CopyTo(compressor);
            compressor.Close();
            return compressStream.ToArray();
        }
    }
}

ChatGPT
  • You will do *very* well to find an algorithm that can actually compress a string that short, unless you've got a pre-shared dictionary of some sort (e.g. see shoco or smaz). Any compression algorithm adds overheads to the output, which then allows the compression to take place. With such a short input, the overheads are going to dominate. – canton7 Mar 22 '19 at 12:27
  • Base64 encoding is not meant to compress your data. It is used to convert data into a code-page-independent form. – LittleBit Mar 22 '19 at 12:30
  • The example text fits in a QR code perfectly fine without compression. You can actually have pretty large QR codes. However, I feel this is not what you're trying to achieve. My guess: this is your Bitcoin password recovery phrase. My guess #2: you want it not _compressed_ but _encrypted_. If so, encrypt with any algorithm -> base64 -> QR. – Mike Makarov Mar 22 '19 at 13:53
  • @canton7 the short string here is just a demo. The point isn't trying to compress such a short string; the point was getting the result in ASCII. My actual text will be 500-1000 characters and I believe should compress 4:1 – ChatGPT Mar 22 '19 at 21:34
  • @LittleBit The code is using deflate to compress the text. I was just using Base64 encoding to try and render into ASCII. Look again. – ChatGPT Mar 22 '19 at 21:36
  • The assumptions of @MikeMakarov are all incorrect. Yes, you can put thousands of characters into a QR code, but the more characters you have, the longer it takes for a 2D HID scanner to type it back into the machine. We have legitimate performance reasons to compress the text. It's for package shipping information in a very high-volume shop. Text compressed 4:1 would take 1.4 seconds to "scan" compared to nearly 6 seconds due to the input speed. – ChatGPT Mar 22 '19 at 21:45
  • It's worthwhile making your examples realistic, as people tend to base their answers on them! If your actual strings are English text (and that wasn't just an unrepresentative example), then do look at the two algorithms I mentioned above. Also https://stackoverflow.com/questions/5220122/library-to-compress-text-data-and-store-it-as-text. You cannot take arbitrary binary and naively convert it to text, as many binary sequences are not valid text in a given encoding. See yEnc and Base122 as more efficient alternatives to Base64. Also: https://en.m.wikipedia.org/wiki/Binary-to-text_encoding – canton7 Mar 22 '19 at 22:14
  • @MaxHodges Then I am afraid you have to come up with a custom algorithm depending on the typical data format. Here's an example: https://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings but you might need to adapt the matrix. – Mike Makarov Mar 22 '19 at 22:23
  • Thanks @canton7, you are right. I edited the question to make it more clear. So basically all compression functions generate binary results? I [read](http://ed-von-schleck.github.io/shoco/) "You wouldn’t want to use shoco for strings larger than, say, a hundred bytes." – ChatGPT Mar 23 '19 at 00:53
  • Now that you've said the messages are json - perhaps look at a more efficient alternative, like msgpack or protobuf? – canton7 Mar 23 '19 at 07:22

1 Answer

If your data compresses 4:1 as you mention in a comment, then encoding with Base64 will only expand it by 33%, leaving you with 3:1 compression.
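
As an illustration of the Deflate-then-Base64 route, here is a minimal C# sketch. The class name `QrPayload` and the methods `Pack` and `Unpack` are made up for this example, and it assumes the text is UTF-8 and that .NET 4.5 or later is available (for `CompressionLevel`). The resulting string uses only the Base64 alphabet, so it is plain ASCII.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the question's code.
public static class QrPayload
{
    // Compress UTF-8 text with Deflate, then encode the bytes as Base64.
    public static string Pack(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            using (var deflate = new DeflateStream(buffer, CompressionLevel.Optimal))
            {
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream writes the final compressed block

            // MemoryStream.ToArray still works after the stream is closed.
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Reverse of Pack: Base64-decode, then inflate back to the original text.
    public static string Unpack(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Calling `QrPayload.Unpack(QrPayload.Pack(json))` should return the original string. Comparing `Pack(json).Length` with `json.Length` on real payloads shows whether the net result is actually smaller; for very short inputs it may not be.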

You could instead use Base85 encoding, choosing 85 printable characters. That will convert four bytes to five characters, instead of three bytes to four characters, as Base64 does.
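
A rough C# sketch of such an encoding follows. It assumes the Ascii85 character range `'!'` through `'u'` as the 85 printable characters, with no delimiters and no `z` shortcut; that choice, and the class name `Base85`, are assumptions for illustration only. Any other set of 85 characters that your scanner transmits reliably would work the same way. With five characters per four bytes the expansion is 25%, so data that compresses 4:1 would end up at about 3.2:1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical Base85 codec using the Ascii85 character range '!'..'u'.
public static class Base85
{
    private const int FirstChar = 33; // '!' represents digit 0, 'u' represents digit 84

    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder((data.Length + 3) / 4 * 5);
        var group = new char[5];
        for (int i = 0; i < data.Length; i += 4)
        {
            int count = Math.Min(4, data.Length - i);

            // Pack up to four bytes into a 32-bit value, zero-padding a short final group.
            uint value = 0;
            for (int j = 0; j < 4; j++)
            {
                value = (value << 8) | (j < count ? data[i + j] : (byte)0);
            }

            // Write the value as five base-85 digits, most significant first.
            for (int k = 4; k >= 0; k--)
            {
                group[k] = (char)(FirstChar + value % 85);
                value /= 85;
            }

            // A short final group of n bytes only needs n + 1 characters.
            sb.Append(group, 0, count + 1);
        }
        return sb.ToString();
    }

    public static byte[] Decode(string text)
    {
        var bytes = new List<byte>(text.Length / 5 * 4 + 4);
        for (int i = 0; i < text.Length; i += 5)
        {
            int count = Math.Min(5, text.Length - i);
            if (count == 1) throw new FormatException("Dangling Base85 character.");

            // Rebuild the 32-bit value, padding a short final group with the highest digit.
            ulong value = 0;
            for (int j = 0; j < 5; j++)
            {
                int digit = j < count ? text[i + j] - FirstChar : 84;
                if (digit < 0 || digit > 84) throw new FormatException("Invalid Base85 character.");
                value = value * 85 + (ulong)digit;
            }
            if (value > uint.MaxValue) throw new FormatException("Base85 group out of range.");

            // A group of m characters carries m - 1 bytes.
            for (int j = 0; j < count - 1; j++)
            {
                bytes.Add((byte)(value >> (24 - 8 * j)));
            }
        }
        return bytes.ToArray();
    }
}
```

To use it, pass the Deflate output to `Base85.Encode` instead of Base64-encoding it, and call `Base85.Decode` on the scanned string before decompressing. Note that this character range includes quotes and backslashes, so check that your scanner and keyboard layout pass them through unchanged.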

Mark Adler