
Before we start: "hash" is a bit of a misnomer for what I actually want.

Basically, I have a program that returns a 92-character string (it is cryptographically secure) that I want to shorten. "Hash" is the only word I could think of for that, but unlike a real hash, I need to be able to reverse it.

So I'm looking for some way to take the 92-character Base64 string (s), turn it into a much shorter string (n), and then reverse the process.

So the encoding would be like (s) + (hash function) = (n), and then I'd be able to decode it with (n) + (hash function) = (s). I don't need this to be secure, since I handled that when generating the string.

I was using Base65536, but that was mostly a quick joke, since it would be impractical for an actual user.

TL;DR - I need a hash (or encryption) function that will generate short strings out of long ones.

Just to clarify, I do NOT need to compress the file size; I need a shorter string to return to the user.

Matt
  • Turning from base64 to bytes will reduce its size significantly. – Dragonthoughts Mar 08 '18 at 22:36
  • @Dragonthoughts will it reduce the size or the length? I need to shorten the string and not necessarily reduce the size. – Matt Mar 08 '18 at 22:37
  • Think about what you are asking. You want to store x bits of information using y bits where y < x **and** want to be able to reliably reverse it. You can't do this without collisions. – Luke Joshua Park Mar 08 '18 at 22:38
  • It will take up fewer bytes. – Dragonthoughts Mar 08 '18 at 22:39
  • @LukeJoshuaPark That really is the answer to this question. You should post it as one. – BradleyDotNET Mar 08 '18 at 22:40
  • Maybe check [this answer](https://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64/1443240) which concludes that you can't really improve much on Base64 if you're storing the data in a string. – John Wu Mar 08 '18 at 22:43
  • @LukeJoshuaPark Yeah, I have thought about collisions, although I was unaware if there was any work around. – Matt Mar 08 '18 at 22:44
  • @JohnWu Is there a better way then _not_ in a string? – Matt Mar 08 '18 at 22:45
  • Yes, a byte array. Unless you don't intend to use all 8 bits. – John Wu Mar 08 '18 at 22:46
  • Just store the raw bytes, that's always going to be your best bet for small amounts of data. – Luke Joshua Park Mar 08 '18 at 22:46
  • Possible duplicate of [Compression/Decompression string with C#](https://stackoverflow.com/questions/7343465/compression-decompression-string-with-c-sharp) – John Wu Mar 08 '18 at 22:53
  • Why must you shorten your data? Can you not store the full data locally and return a shorter key to users which they can use to retrieve the real data? That is how URL shorteners work. – Dour High Arch Mar 09 '18 at 00:31

1 Answer


The most space-efficient way to store binary data is as raw bytes. The only way to get it shorter still is compression, but for 92 characters that will not amount to much.

As for Base64: there are cases where we are forced to transmit binary data over a medium that does not support arbitrary binary data, mostly text-based media (email, XML files, HTML). So we use Base64 to encode the binary data as text. While it is lossless, it is less storage-efficient: every 3 bytes of input become 4 characters of output, so each input byte costs about 1 1/3 bytes in Base64. Base64 is never the ideal choice, more a necessary evil.
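
To illustrate the overhead, here is a small Python sketch using the standard `base64` module. It assumes the 92-character string has no `=` padding, in which case it encodes exactly 69 bytes; `os.urandom(69)` is only a stand-in for the real value, which the question does not show.

```python
import base64
import os

raw = os.urandom(69)                           # stand-in for the real 69-byte value
text = base64.b64encode(raw).decode("ascii")   # 69 bytes -> 92 Base64 characters

print(len(raw), len(text))

# Decoding restores the original bytes exactly: Base64 is lossless.
assert base64.b64decode(text) == raw
```

The byte form is shorter, but raw bytes are generally not something a user can type, which is why they only help for storage and transmission.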

Christopher
  • The actual file size of the string is not much of a concern. I'm more worried about the length. – Matt Mar 08 '18 at 22:54
  • Also, compression algorithms take advantage of data *not* being random (run-length, patterns, etc.). Hashes are *intended* to be random, so the benefit gained will always be small. – BradleyDotNET Mar 08 '18 at 23:02
  • @Matt: You are not making much sense. How does length matter, but not how much data is stored on the disk? Maybe you are using the wrong terms? For me: length of a string = number of characters. Size of a string on disk = number of bytes you need to store it, depending on encoding. – Christopher Mar 08 '18 at 23:06
  • @Christopher The string I am returning to the user is 92 characters long. They are then expected to use that string elsewhere in the program. I want them to be able to use a much shorter string. The entire string in a .txt document takes up 90 bytes. That is such a small amount of space that I am not concerned about compression. I just want the user to be able to input the string more easily when they need to, and for that I want a string that is shorter than the original one. – Matt Mar 08 '18 at 23:11
  • You cannot make any type "shorter" without losing data. You could make "100.001" shorter by cutting off the decimal places to get "100", but that would lose you some data. If you find the string is too long for the user, either support copy and paste or do not give the string to the user at all. Store it in some backend data source that has an index/primary key, and give the user only that primary key. "Customer number" is really just a primary key someone invented because "writing down everything about the customer every time" was annoying. – Christopher Mar 08 '18 at 23:16
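
The primary-key idea from the comments could look roughly like the Python sketch below. Everything in it is an assumption for illustration: `KeyStore` is a hypothetical in-memory stand-in for a real backend table, and the 8-character key length is arbitrary.

```python
import secrets


class KeyStore:
    """Hypothetical in-memory stand-in for a backend table keyed by a short ID."""

    def __init__(self):
        self._rows = {}

    def put(self, long_value: str) -> str:
        # Generate a short random key; retry on the (unlikely) collision.
        while True:
            key = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe characters
            if key not in self._rows:
                self._rows[key] = long_value
                return key

    def get(self, key: str) -> str:
        # Look up the full value; raises KeyError for an unknown key.
        return self._rows[key]


store = KeyStore()
short_key = store.put("A" * 92)   # placeholder for the real 92-character string
print(short_key, len(short_key))
```

The short key carries no information about the long string; it is only a handle, so this works only where the program can reach the store when the user enters the key.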