5

I have a string of data a bit over 800 characters long that I'm trying to compress so it fits on a QR code (I'd like to shrink it by at least 50%, but would probably be happy if I got it under 700 characters). Here's an example string I'm trying to compress, containing 841 characters:

+hgoSuJm2ecydQj9mXXzmG6b951L2KIl0k9VGzIEtLztuWO2On9rt7DUlH0lXzG4iJ1yK0fA
97mDyclKSttIZXOxSPBf85LEN4PUUqj65aio5qwZttZSZ64wpnMFg/7Alt1R39IJvTmeYfBm
Tuc1noMMcknlydFocwI8/sk2Sje5MR/nYNX0LPkQhzyi5vFJdrndqAgXYULsYrB3TJDAwvgs
Kw9C5EJnrlqcb21zg17O2gU/C8KY0pz9RPzUl1Sb0rCP8iZCeis4YbQ5tuUppOfnO/X0Mosv
SOQJ/bF9juKW8ocnQvNjsNxGV1gPkWWtiU2Old7Qm7FLDqL6kQKrq356yifs0NiMVGdvAg32
eugewuttCugoZASYOpQdwPu1jMxVO1fzF3zEy5w6tDlcfA2DZwa+un9/k8XZWAO/KVExy68q
UtVRQxsIOKgpl/2tNw5DBAKbykKIkmizbsA2xtzqnYqld4kOdNMJh3YjlqWF9Bt8MZo7a+Q6
jgayr2rjpyIptc599DGtvp68ZNQ64TKNmiMnnyGMo3E+xW34G3RrsYnHGm+xJoLKoOJhacDu
oZke1ycJgQv+Y61WPrvtFOVBxV5rvSzO0+8px5AWN3uCrrw1RmT5N14IVhh6BOtRjsifqIB2
dAKxzBNsvbXm1SzkuyqYiMnp5ivy3m2mPwc9GLsykx0FRIkhCYO8ins9E5ot9QvVnE155MFA
8FVwsP5uNdOF4EzQS2/h2QK3zb5Yq4Nftlo605Dd5vuVN/A7CUN38DaAKBxDKgqDzydfQnZw
R0hTfMHNLgBJKNDSpz2P6almGlUJtXT6IYmzuU2Iaion8ePG

I've already tried the following three libraries:

  1. The built-in .NET GzipStream
  2. DotNetZip, including,
    • GzipStream
    • DeflateStream
  3. The LZMA SDK from 7-zip

I'm running into an issue where the compression is actually making the string longer. My understanding was that DeflateStream had the least overhead, yet it's still adding characters on. Using DotNetZip, I told it to use maximum compression:

Imports System.IO
Imports Ionic.Zlib

Shared Function CompressData(data As Byte()) As Byte()

    Dim msCompressed As New MemoryStream()

    ' I'm not sure if the last parameter on this next function should be
    ' true (for LeaveOpen), but it doesn't seem to affect it either way.
    Using deflated As New DeflateStream(msCompressed, _
        CompressionMode.Compress, CompressionLevel.BestCompression, True)

        ' Write data to compression stream (which is linked to the memorystream)
        deflated.Write(data, 0, data.Length)
    End Using ' Closing the stream flushes the remaining compressed bytes

    Return msCompressed.ToArray()
End Function

I'm only thinking this is going to get worse as I'm going to have even more data. Is there some better compression algorithm for strings of this length? Does compression normally only work on longer strings? Unfortunately, the data is such that I can't use stand-in characters for pieces of data.

Also, am I able to use alphanumeric encoding for the QR code, or do I have to use binary? I don't think I can, per http://www.qrme.co.uk/qr-code-forum.html?func=view&catid=3&id=324, but I'd like to make sure.

Thanks for your help!

Sam Cantrell
  • 1
    I think what you are doing is converting binary->string->compress->string for QR. You should skip the first string conversion part. – LostInComputer Aug 20 '11 at 06:16
  • I tried removing that first string conversion, however I haven't seen any change. That is, I create a string with the raw data. This data is passed to a subroutine where it's returned as a byte array. This byte array is then passed to the compression algorithm, which returns a byte array as well. I'm now converting both the compressed byte array and uncompressed byte arrays to strings to compare the length. Whichever string is shorter would then be passed to the QR code library, as it takes in a string for encoding. Thanks for your suggestion! – Sam Cantrell Aug 20 '11 at 06:30

4 Answers

4

At first glance, it appears that you are trying to take some data and convert it into a QR code with this process:

--> encrypt --> base64 encode --> compress --> make QR code.

I suggest using this process instead:

--> compress --> encrypt --> make QR code.

When you want to both encrypt and compress, pretty much everyone recommends compress-then-encrypt. This is because encryption works just as well on compressed data as on uncompressed data, whereas compression usually makes plaintext shorter but makes encrypted files longer. For more details, see: "Can I compress an encrypted file?" "Compress and then encrypt, or vice-versa?" "Composing Compression and Encryption" "Compress, then encrypt tapes" "Is it better to encrypt a message and then compress it or the other way around? Which provides more security?" "Compressing and Encrypting files on Windows" "Encryption and Compression" "Do encrypted compression containers like zip and 7z compress or encrypt first?" "When compressing and encrypting, should I compress first, or encrypt first?", etc.

"am I able to use alphanumeric encoding for the QR code, or do I have to use binary?"

Most encryption algorithms produce binary output, so it will be simplest to directly convert that to binary-encoded QR code. I suppose you could somehow convert the encrypted data to something that QR alphanumeric coding could handle, but why?

"Is there some better compression algorithm"

For encrypted data, no. It is (almost certainly) impossible to compress well-encrypted data, no matter what algorithm you use.
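
A minimal Python sketch of this effect, not the asker's actual pipeline: `os.urandom` stands in for well-encrypted output, and the repeated `name=...` record is a made-up sample of redundant plaintext. The helper `deflate_size` is a hypothetical name; it measures a raw DEFLATE stream at maximum compression.

```python
import os
import zlib

# Assumption: random bytes behave like well-encrypted data for compression purposes.
random_like = os.urandom(630)
# Assumption: a made-up, highly redundant plaintext of similar size.
plaintext = b"name=Sam;city=Springfield;" * 24


def deflate_size(data: bytes) -> int:
    """Return the length of a raw DEFLATE stream at maximum compression."""
    # Negative wbits asks zlib for raw deflate, without the zlib header or checksum.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())


print("random-like:", len(random_like), "->", deflate_size(random_like))
print("plaintext:  ", len(plaintext), "->", deflate_size(plaintext))
```

The random-like input comes out slightly larger than it went in, because the compressor can only add its own framing, while the redundant plaintext shrinks to a small fraction of its size.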

If you compress-then-encrypt, as recommended, then the effectiveness of various compression algorithms depends on the particular kinds of input data, not on what you do with it after compression.

What kind of data is your input data?

If, hypothetically, your input data is some sort of ASCII text, perhaps you could use one of the compression algorithms mentioned at "Really simple short string compression" "Best compression algorithm for short text strings" "Compression of ASCII strings in C" "Twitter text compression challenge".

If, on the other hand, your input data is some sort of photograph, perhaps you could use one of the many compression algorithms mentioned at "Twitter image encoding challenge".

Community
David Cary
3

This answer is related to Guffa's answer. He said that QR code can accept binary data and it must be a limitation of the library you are using.

I looked at the source code of the library. You call the Encode function, right? These are the contents of the Encode function:

public virtual Bitmap Encode(String content, Encoding encoding)
{
    bool[][] matrix = calQrcode(encoding.GetBytes(content));
    SolidBrush brush = new SolidBrush(qrCodeBackgroundColor);
    Bitmap image = new Bitmap( (matrix.Length * qrCodeScale) + 1, (matrix.Length * qrCodeScale) + 1);
    Graphics g = Graphics.FromImage(image);
    g.FillRectangle(brush, new Rectangle(0, 0, image.Width, image.Height));
    brush.Color = qrCodeForegroundColor ;
    for (int i = 0; i < matrix.Length; i++)
    {
        for (int j = 0; j < matrix.Length; j++)
        {
            if (matrix[j][i])
            {
                g.FillRectangle(brush, j * qrCodeScale, i * qrCodeScale, qrCodeScale, qrCodeScale);
            }
        }
    }
    return image;
}

The first line (encoding.GetBytes(content)) converts the string to bytes.

Get the source code, then modify it to add this overload, which can pass the bytes straight to calQrcode: "public virtual Bitmap Encode(byte[] content)"

LostInComputer
2

The compression works by removing redundancy in the data, but the string seems to contain random/encrypted data, so there is no redundancy to remove.

However, it's data encoded using base-64, so each character only carries six bits of information. If you keep the binary data instead of base-64 encoding it, it's only 631 bytes.
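
The size relationship can be checked with a short Python sketch. The 630-byte payload here is a hypothetical stand-in, not the asker's actual data; the point is only that base-64 emits 4 characters for every 3 bytes.

```python
import base64
import os

# Hypothetical binary payload (e.g. ciphertext); not the data from the question.
raw = os.urandom(630)

# Base-64 turns every 3 bytes into 4 characters, each carrying 6 bits.
text = base64.b64encode(raw)

print(len(raw), len(text))  # 630 840

# Decoding recovers the original bytes exactly.
assert base64.b64decode(text) == raw
```

So keeping the bytes as bytes saves a quarter of the length before any compression is attempted.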

Guffa
  • How would I keep it as binary data? The QR code library I'm using (MessagingToolkit.QRCode) encodes the data as a string. Wouldn't I have to convert the byte array to a string to encode it as the QR code? (Sorry for my ineptitude.) Is there some mechanism that keeps it as binary data while still making it a string? I've been using Convert.ToBase64String on the byte array to create a string that can be encoded. How would I do it the way you suggest? Thanks! – Sam Cantrell Aug 20 '11 at 06:36
  • The QR code supports binary, so that would be a limitation in the library that you use. Anyway, you would be better off doing `binary -> compress -> base64 -> string` instead of `binary -> base64 -> string -> binary -> compress -> binary -> base64 -> string`. – Guffa Aug 20 '11 at 06:58
  • If you don't go with binary, base64 is suboptimal by a fair amount. Base64 includes lowercase letters and those aren't included in the QR character set for alphanumeric so the encoder is still going to use 8 bits per base64 character and thus you're throwing away two bits. – smparkes Aug 20 '11 at 15:37
  • Sorry ... meant to edit that (I hate SO doing a submit when I hit return.) Just wanted to add that alphanumeric in QR is A-Z0-9 $%*+-./: for 45 symbols. But I don't have any references to encoders that can encode to a given symbol set size. I'm sure they exist, but may be obscure. You could encode in base32, but then you're throwing away symbols on the other side and it may be a wash. – smparkes Aug 20 '11 at 15:50
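
To put rough numbers on the trade-off discussed in these comments, here is a Python sketch under stated assumptions: a hypothetical 630-byte payload, payload bits only (the mode indicator, character-count field and error correction are ignored), QR byte mode costing 8 bits per character, and QR alphanumeric mode packing two characters into 11 bits with a trailing single character in 6 bits. The helper names are made up for illustration.

```python
import base64
import os

# The 45 symbols of QR alphanumeric mode.
QR_ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def byte_mode_bits(n_chars: int) -> int:
    """Payload bits in QR byte mode: 8 bits per character."""
    return 8 * n_chars


def alnum_mode_bits(n_chars: int) -> int:
    """Payload bits in QR alphanumeric mode: 11 bits per pair, 6 for a leftover."""
    return 11 * (n_chars // 2) + 6 * (n_chars % 2)


raw = os.urandom(630)  # hypothetical payload, not the asker's data
b64 = base64.b64encode(raw).decode("ascii")
# Base32 uses A-Z and 2-7; the "=" padding is not in the QR set, so strip it.
b32 = base64.b32encode(raw).decode("ascii").rstrip("=")
assert set(b32) <= QR_ALNUM

print("raw bytes, byte mode:    ", byte_mode_bits(len(raw)))   # 5040
print("base64, byte mode:       ", byte_mode_bits(len(b64)))   # 6720
print("base32, alphanumeric mode:", alnum_mode_bits(len(b32)))  # 5544
```

Under these assumptions base32 in alphanumeric mode costs less than base64 in byte mode, but raw bytes in byte mode remain the smallest of the three.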
2

You are comparing different compressors. The Zip family (Deflate) combines LZ77 dictionary matching with Huffman coding, which is a statistical compression, and LZMA likewise belongs to the LZ family (LZ is an acronym for Lempel-Ziv), which uses dictionary compression to remove the redundancy in the input text. So, compression works by removing superfluous information. It works well on text files and images, and less well on audio, video and program files. Lossy compression exists for audio and video, but not for program files. Your example string contains too much entropy to be compressed well. The information content of a character is -log(p)/log(2) bits, where p is the probability of that character occurring in your text; averaging this over all characters in the text gives its entropy. See also information theory and Shannon's theorem.
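
As a rough illustration, here is a Python sketch that estimates the Shannon entropy of a string in bits per character, H = -sum(p * log2(p)), using observed character frequencies as the probabilities. The function name `shannon_entropy` is made up for this example, and a per-character estimate like this ignores patterns that span several characters.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Estimate entropy in bits per character from character frequencies."""
    counts = Counter(text)
    total = len(text)
    # Each distinct character contributes -p * log2(p), with p = count / total.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aaaa"))  # one symbol only: no information per character
print(shannon_entropy("abcd"))  # four equally likely symbols: 2 bits per character
```

A base-64 string of random or encrypted data will score close to the 6 bits per character that the alphabet can carry, which leaves a general-purpose compressor almost nothing to remove.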

Micromega
  • As can be guessed from [one of the asker's previous questions](http://stackoverflow.com/questions/6448226/using-rjindael-and-rsa-to-encrypt-data-stored-in-qr-code-in-visual-basic-net) the data is in an encrypted form, so the data is expected to contain a high entropy and a lossy compression algorithm would be harmful to the encrypted data. – Peter O. Aug 20 '11 at 08:42
  • No upvote? Did you understand what I wrote? I don't think I've spoke to you because I didn't suggest a lossy compression I wrote BUT NOT FOR PROGRAMS FILES. Should I clarify this? – Micromega Aug 20 '11 at 08:52
  • Sorry, I misunderstood "program files" to mean programs, that is, files that contain machine code. The encrypted data given by the asker is neither text, images, audio, video, or "program files" as I understood it. – Peter O. Aug 20 '11 at 08:54
  • Well, maybe my answer is useless but it doesn't contain wrong or harmful information. An upvote would be nice. A downvote and I will delete my answer. – Micromega Aug 20 '11 at 08:57