Is it possible to encode a string in a way that minimizes the number of bytes? Basically, I need to get 29 characters down to 11 bytes of data.

    var myString = "usmiaanzaklaacn40879005900133";
    byte[] bytes = Encoding.UTF8.GetBytes(myString);

    Console.WriteLine(bytes.Length); // Output: 29 (1 byte per character)
    Console.ReadKey();

This shows that encoding the 29-character string with UTF-8 results in 29 bytes. I need the 29-character string to fit in 11 bytes or fewer. Is this possible? I was thinking I could possibly use some sort of lookup or binary mapping algorithm, but I am a little unsure how to go about this in C#.

EDIT:

So I have a chip that has a custom data payload of 11 bytes. I want to be able to compress a 29-character string (that is unique) into bytes, assign it to the "custom data", and then receive the custom data bytes and decompress them back to the 29-character string. I don't know if this is possible, but any help would be greatly appreciated. Thanks :)

The string itself breaks down as [usmia]-[anzakl]-[aacn40879005900]-[133] = [origin]-[dest]-[random/unique]-[weight].

OK, the last 14 characters are digits.

I have access to all the origins and destinations. Would it be feasible to create a key-value store with the origin (e.g. usmia) as the key and a particular byte as the value? I guess that would mean I could only have 256 different origins and 256 different destinations, and then I could just store the last 14 characters as an integer?
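The lookup idea above can be sketched roughly as follows. This is only an illustration under several assumptions that are not confirmed in the question: the `Origins` and `Dests` tables are made-up placeholders (both sides would have to share the same ordered lists, each with at most 256 entries), the four characters after the destination are always lowercase a-z, and the last 14 characters are always digits. Under those assumptions the 4 letters and 14 digits have 26^4 × 10^14 ≈ 4.57 × 10^19 combinations, which is more than 2^64 but less than 2^66, so they need 9 bytes; together with one byte each for origin and destination that gives 11 bytes. The class name `PayloadCodec` is hypothetical.

```csharp
using System;
using System.Numerics;

public static class PayloadCodec
{
    // Hypothetical lookup tables (placeholders, not real data).
    // Sender and receiver must use identical lists in identical order.
    static readonly string[] Origins = { "usmia", "usjfk" };
    static readonly string[] Dests = { "anzakl", "anzwlg" };

    public static byte[] Encode(string s)
    {
        if (s == null || s.Length != 29)
            throw new ArgumentException("Expected exactly 29 characters.");

        int origin = Array.IndexOf(Origins, s.Substring(0, 5));
        int dest = Array.IndexOf(Dests, s.Substring(5, 6));
        if (origin < 0 || dest < 0)
            throw new ArgumentException("Unknown origin or destination.");

        // Mixed-radix packing: 4 letters in base 26, then 14 digits in base 10.
        BigInteger n = BigInteger.Zero;
        foreach (char c in s.Substring(11, 4))
        {
            if (c < 'a' || c > 'z') throw new ArgumentException("Expected a-z.");
            n = n * 26 + (c - 'a');
        }
        foreach (char c in s.Substring(15, 14))
        {
            if (c < '0' || c > '9') throw new ArgumentException("Expected 0-9.");
            n = n * 10 + (c - '0');
        }

        var payload = new byte[11];
        payload[0] = (byte)origin;
        payload[1] = (byte)dest;

        // n < 26^4 * 10^14 < 2^66, so at most 9 little-endian bytes are used.
        byte[] le = n.ToByteArray();
        Array.Copy(le, 0, payload, 2, Math.Min(le.Length, 9));
        return payload;
    }

    public static string Decode(byte[] payload)
    {
        if (payload == null || payload.Length != 11)
            throw new ArgumentException("Expected exactly 11 bytes.");

        // Copy the 9 data bytes and leave a trailing zero byte so that
        // BigInteger interprets the value as non-negative.
        var le = new byte[10];
        Array.Copy(payload, 2, le, 0, 9);
        var n = new BigInteger(le);

        // Unpack in reverse order: digits first, then letters.
        var tail = new char[18];
        for (int i = 17; i >= 4; i--) { tail[i] = (char)('0' + (int)(n % 10)); n /= 10; }
        for (int i = 3; i >= 0; i--) { tail[i] = (char)('a' + (int)(n % 26)); n /= 26; }

        return Origins[payload[0]] + Dests[payload[1]] + new string(tail);
    }
}
```

With these placeholder tables, `PayloadCodec.Decode(PayloadCodec.Encode("usmiaanzaklaacn40879005900133"))` should return the original string. If the four letters can be anything other than a-z, or if there are more than 256 origins or destinations, the layout would have to change and 11 bytes may no longer be enough.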

DaveHutchy
  • 1
    Why does it need to be 11 bytes? – Andrew Jenkins Sep 08 '16 at 03:28
  • Because I only have a maximum of 11 bytes to play with. It can be less, but the max is 11. – DaveHutchy Sep 08 '16 at 03:30
  • http://stackoverflow.com/questions/7343465/compression-decompression-string-with-c-sharp – Rob Sep 08 '16 at 03:30
  • @Rob Using GZipStream results in more bytes, 47 to be exact. – DaveHutchy Sep 08 '16 at 03:34
  • 1
    You might be able to exploit some restrictions on the legal characters that you'll be encoding. If you restrict yourself to only 10 digits and 26 alphabetic letters (36 legal characters), you could compress 17 characters into 11 bytes, still shy of the 29 you're after though. – Curtis Lusmore Sep 08 '16 at 03:50
  • 2
    It is still not clear why you think it should be possible to encode this in 11 bytes. This is basic information theory. 11 bytes represents 88 bits of information. If your text has more than 88 bits of information in it, it is _impossible_ to get it down to 11 bytes. Can you provide any rationale that would make it worth it for anyone to pursue this question as a potentially solvable one? – Peter Duniho Sep 08 '16 at 04:13
  • 1
    Storing the actual data somewhere else and using an <11B key is the only way I'd know how to do it. If you're dealing with transmission, you'd have to break it down. Decomp algos don't start to make sense until the data gets much larger, and still can't usually realize that kind of shrink. Repeating characters in the same position could be removed, but otherwise, there is simply no way to stuff an elephant into a shoebox. – Shannon Holsinger Sep 08 '16 at 04:24
  • @Peter Duniho I have a little chip that only has an extra payload of 11 bytes of custom data (well, 12 bytes, but 1 byte is for the length), and I want to be able to put a 29-character string into this custom data. It may not be possible, which is why I was asking if anyone here has any ideas. Thanks :) Basically, I want to take the string, compress it into 11 bytes, put it on the chip, and then receive the bytes and decompress them back to the string. – DaveHutchy Sep 08 '16 at 06:19
  • It's just not possible to encode arbitrary data that much, not with so little repetition. There's too much information in the string itself. As @Shannon points out, if you have a limited possible input and can store the actual values somewhere else, indexing them in your 11 bytes, then that would work. But that would require pre-computing everything. Given the details you've provided so far, you're asking for the impossible. If you want it to be possible, you need to add details that would make it possible (i.e. constrain the problem significantly). – Peter Duniho Sep 08 '16 at 06:24
  • @PeterDuniho I have updated the question with more information about the requirements. Thanks :) Like you say, I guess I could have a key-value store where each of the components that make up the string gets a byte value representing all of its combinations, or something like that? – DaveHutchy Sep 08 '16 at 06:47
  • It's still not clear what the constraints are. Knowing the string ends in 14 digits is useful; a numeric value can be encoded very efficiently. But unless the alphabetic part can also be significantly constrained, you're still not going to fit in 11 bytes. If it can be significantly constrained, you need to explain how. – Peter Duniho Sep 08 '16 at 06:58

1 Answer

15 lg(26) + 14 lg(10) ~= 117 bits ~= 14.6 bytes. (lg = log base 2)

So even if I were optimistic and assumed that your strings were always 15 lowercase letters followed by 14 digits, it would still take a minimum of 15 bytes to represent them.

Unless there are more restrictions, like only the lowercase letters a, c, i, k, l, m, n, s, u, and z being allowed, then no, you can't code that into 11 bytes. Whoops, wait, not even then. Even that would take a little over 12 bytes.
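For reference, the two estimates above can be worked out as follows. This is just the arithmetic spelled out, with the logarithms rounded to three decimals, and the second line assumes that exactly ten distinct letters are allowed in each of the 15 letter positions:

```latex
\begin{align*}
15\log_2 26 + 14\log_2 10 &\approx 15(4.700) + 14(3.322) \approx 70.5 + 46.5 = 117.0 \text{ bits} \approx 14.6 \text{ bytes} \\
15\log_2 10 + 14\log_2 10 &= 29\log_2 10 \approx 29(3.322) \approx 96.3 \text{ bits} \approx 12.04 \text{ bytes}
\end{align*}
```

Both figures are above the 88 bits that 11 bytes can hold.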

Mark Adler
  • OK, thank you. I wasn't quite sure whether it was possible or not. Yes, you are correct, they will always be 15 lowercase letters followed by 14 digits. Basically [usmia]-[anzakl]-[aacn40879005900]-[133] = [origin]-[dest]-[random]-[weight] – DaveHutchy Sep 08 '16 at 06:38
  • 1
    Well, if USMIA and ANZAKL are warehouses or airport codes, shorten them using a cipher on both sides of the transaction. USA-MIAMI becomes a, USA-JFK becomes b, AUSTRALIA-AUCKLAND becomes z. Using hex, you could code 16 airports into 2 bytes. If you ONLY have certain combinations of airports, you can go farther. Knowing what the data is is important. – Shannon Holsinger Sep 08 '16 at 12:04