I want to compress a txt file of 1.7 KB that contains only strings of numbers. The data is just readings of current at different voltages (100 entries). I want to compress it and write it to a smart card that has only 512 bits of memory. Could anyone help with the compression techniques available in C#? I have tried gzip and LZMA, and common techniques such as difference mechanisms, but I could only get down to about 1 KB. Please provide some solution in C#.
-
What kind of number? int from 0 to 100? – Kek Jul 19 '12 at 09:08
-
100 entries in 512 bits => 5 bits/entry. Gives you 0..31 as a range. – H H Jul 19 '12 at 09:10
-
Do you actually mean 512 bits or 64 bytes? Not 512 bytes or kbytes? – Jodrell Jul 19 '12 at 09:15
-
are lots of the numbers the same? – Jodrell Jul 19 '12 at 09:19
-
Please describe the data better and check for bit/byte typos. What devices share that card, common libs, ... – H H Jul 19 '12 at 09:22
-
1.7K is short enough that you can post an example here. Please do so. – Mark Adler Jul 19 '12 at 16:00
-
You must tell us how many different values are possible for each number. If there are more than 32, and the values don't follow any other kind of rule, it's impossible. – reinierpost Jul 19 '12 at 16:50
4 Answers
The reason why GZipStream gave you a larger file than you expected is that GZipStream produces whole archive files, not just the compressed equivalent of the input. Use DeflateStream instead and you will compress to a fraction of the size, using the exact same algorithm.
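For illustration, here is a minimal sketch of compressing a byte buffer with DeflateStream; the helper name and the in-memory buffering are my own choices, not part of the original answer:

using System.IO;
using System.IO.Compression;

static byte[] DeflateCompress(byte[] data)
{
    using (var output = new MemoryStream())
    {
        // DeflateStream writes a raw deflate stream, without the gzip header and trailer.
        using (var deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            deflate.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }
}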
Edit #2: This will however save you no more than some 144 bits, which is not good enough for you. The compressed file is so big for such a small input because the Huffman table has a constant size in Microsoft's flawed implementation. DotNetZip uses the same format but does not have the same problem. Or you can use SharpZipLib, which also supports another interesting algorithm (format), bzip2; use SetLevel(9) to force the maximum compression level that the library can give you.
An excellent explanation of why Microsoft's compression worked so badly for you, and why DotNetZip or SharpZipLib can do much better even with the same format (basic algorithm), is in this answer by Mark Adler.

-
If he is using Microsoft's GZipStream or DeflateStream, they have a serious bug (actually several serious bugs) that causes them to be extremely inefficient on short data streams. (See my answer at http://stackoverflow.com/questions/11435200/why-does-my-c-sharp-gzip-produce-a-larger-file-than-fiddler-or-php/11435898#11435898 .) Those should not be used. DotNetZip's replacements are in fact much better. So DotNetZip would _not_ have as bad of a problem. Also, the "zip" algorithm _is_ the deflate algorithm. You seem a little confused about what those are. – Mark Adler Jul 19 '12 at 16:03
-
@MarkAdler - I'm learning this myself. I did not realize several key facts in my first (next-to-last) edit. Fixed. – Jirka Hanika Jul 19 '12 at 16:49
A solution could be to store the data in binary: 100 entries, 4 bytes/entry => 400 bytes. Then, maybe, you could compress the result (a sketch of that follows the write example below).
List<float> myNumbers = ...
var ms = new MemoryStream();
using (var bw = new BinaryWriter(ms))
{
    foreach (var n in myNumbers)
        bw.Write(n);            // 4 bytes per float
}
// ToArray copies the buffer and works even after the writer has disposed the stream.
byte[] byteArray = ms.ToArray();
File.WriteAllBytes(path, byteArray);
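To try compressing those 400 bytes as suggested above, a minimal DeflateStream sketch could look like the following; whether it actually shrinks such a small binary buffer is not guaranteed, and the variable names simply follow the write example:

// Requires System.IO and System.IO.Compression.
// Hypothetical follow-up: deflate the 400-byte buffer before writing it out.
byte[] compressed;
using (var output = new MemoryStream())
{
    using (var deflate = new DeflateStream(output, CompressionMode.Compress))
    {
        deflate.Write(byteArray, 0, byteArray.Length);
    }
    compressed = output.ToArray();
}
File.WriteAllBytes(path, compressed);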
And to read:
byte[] content = File.ReadAllBytes(path);
var ms = new MemoryStream(content);
List<float> result = new List<float>();
using (var br = new BinaryReader(ms))
{
    while (ms.Position < ms.Length)   // read each 4-byte float back
        result.Add(br.ReadSingle());
}

-
but if a float is 4 bytes or 32 bits, as the question stands, that's only enough space for 16 numbers. – Jodrell Jul 19 '12 at 09:38
-
Yes, you're right... This is the best compression I could think of... hoping for an OP mismatch between bit and byte. Otherwise, you may ask Gandalf for a hand :( – Kek Jul 19 '12 at 09:46
512 bits for 100 entries means about 5 bits per entry. The only way you're going to approach something like that losslessly (which I assume you need) is if the data has some significant predictability from sample to sample, and so the difference between the prediction and actual is small enough to be coded on average in 5 bits. Otherwise there is no hope.
I'm sure that you can compress it much smaller than 1.7KB. If it is really only digits (though I'd wonder what incredible measuring device you have that requires 17 digits per sample), then you should be able to get it down to around 700 bytes.
If you represent your samples with their actual accuracy, then you should be able to get the digits down quite a bit. Perhaps five digits per sample? Then you could get closer to 200 bytes. That's still a long way from 64 bytes (512 bits), though.
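To make the 5-bits-per-entry idea concrete, here is a rough, hypothetical sketch; it assumes the readings have already been scaled to integers and that every sample-to-sample difference fits in a signed 5-bit range (-16..15), which your data may or may not satisfy:

using System;
using System.Collections.Generic;

// Hypothetical sketch: assumes readings are already scaled to integers and that
// each difference from the previous sample fits in a signed 5-bit range (-16..15).
// The first sample is treated as a delta from zero, so it must fit the range too;
// a real scheme would probably store it separately at full precision.
static byte[] PackDeltas5Bit(int[] samples)
{
    var bits = new List<bool>();
    int previous = 0;
    foreach (int sample in samples)
    {
        int delta = sample - previous;
        if (delta < -16 || delta > 15)
            throw new ArgumentException("Delta does not fit in 5 bits.");
        previous = sample;

        // Append the delta as 5-bit two's complement, most significant bit first.
        for (int bit = 4; bit >= 0; bit--)
            bits.Add(((delta >> bit) & 1) == 1);
    }

    // Pack the bits into bytes (100 samples * 5 bits = 500 bits, padded to 63 bytes).
    var packed = new byte[(bits.Count + 7) / 8];
    for (int i = 0; i < bits.Count; i++)
        if (bits[i])
            packed[i / 8] |= (byte)(1 << (7 - i % 8));
    return packed;
}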
