1

(Edited the title because I had no idea what I was looking for and it was misleading.)

Edit: What I was looking for was binary to string and back again. I have answered my own question below.)

Original Post: I'm trying to make a retro-style password system for a game made with JavaScript. (like on an old NES game for example that uses alpha-numeric characters to load the level you were on or all the flags pertaining to that level.)

I've gotten so far as generating a string of flags (all numeric) and then loading that string later by sorting through those flags with regex and then putting them back into my gamestate object (with objects in it that hold all my various flags).

Each flag is a number between 0-9 and each object (or group of flags) are 8 characters long. (often with leading zeros, so these groups are always 8 characters long)

A typical string could look like this:

var gameStr = "000102340000001000019531";

(shown in groups to illustrate them individually)
00010234
00000010
00019531

(3 groups of 8 characters for example)(24 characters long) (but will probably end up having upwards of 25-ish groups of 8 when the game is finished)

As you can imagine, this number is going to get pretty long and obviously wouldn't work as a user inputted password.

So I started looking online for ways to compress this number.

I'm hoping to compress it to something a user could easily copy and paste into a tweet or a chat message, something that doesn't look too "ugly" and isn't too long (I don't know, I'm not being picky here, it could be anywhere between 6-24 characters?) and I don't mind if it's easily unencrypted- security is not important for this use case. If necessary, I would be willing to change the rules, for example the way the numbers are stored like groups of 4 flags/digits each. I'm merely looking for a way to make this number smaller either mathematically or through some kind of compression algorithm.

I came across two seemingly promising solutions,

The first was this JavaScript library called lz-string It's like LZW but faster and more specific, it compresses a string into a hex code that looks something like:

Input: 
000102340000001000019531000102340000001000019531000102340000001000019531
(72 characters)


(9 groups of 8 characters separated just to visualise the numbers in their groups)
00010234
00000010
00019531
00010234
00000010
00019531
00010234
00000010
00019531


Output:
0803c08c199858808e5f807a059c936258190d2c2d438ec6b43c4c5d0080
(spaces removed)(60 characters)

But as you can see, the hex is still pretty long.

So the second solution I found was this quiet answer tucked away on SO:

Jamie Morgan:

What about converting a big number to a formula: So instead of 21312312312 I might use 4^34

(they provided a link to some math forum, but the link is dead.)

And this, in my head, seems like it could work, but I don't have the knowledge to know how to even start writing such a function that could do that.. (math REALLY isn't my strong suit..) this idea seems like the math equivalent of "unfrying an egg"..

So my question is, Any ideas on how I can shorten this number or compress it into a compressed number (or string) and then back again?

As an aside, I would like to mention that I've spent almost a week googling and looking at other answers in SO to this kind of question and so far I'm beginning to think it might just be impossible.. if you have reason to believe that it is impossible, please just tell me so I can stop looking for an answer.. I could easily just save this data to the browser's localStorage and be done with it, but I thought a password system would be more authentic and a fun challenge to incorporate and to learn a little about working with compression and numbers in this way.

Thank you in advance for your understanding and any help you may be able to provide.

Partack
  • 886
  • 11
  • 24
  • @CertainPerformance No, the user would recieve the compressed string and be able to put it into the game later to load the game's state. I do not wish to store the data locally or serverside. I wish for the string to be the storage medium it's self. – Partack May 08 '21 at 15:08
  • You could permit characters other than hex to make the string shorter, but then it could well look really weird. I'd stick with lz-string if I were you, I think – CertainPerformance May 08 '21 at 15:10
  • @CertainPerformance I'm not above using special characters and uppercase and lowercase for the compressed string. that would be absolutely fine if it reduced the size of the compressed string, my question pertains to *how* to compress it. – Partack May 08 '21 at 15:11
  • 1
    There are two distinct steps, 1) compression into (hopefully a smaller number of) bytes, and 2) encoding these bytes into a displayable and copyable string. It's possible to combine them in a compression scheme that works directly with the output alphabet but I think that makes it all more complex. You should try a real lzw library that compresses to bytes, and then use the `btoa()` function to convert the byte string to base64. That should provide a relatively easy programming path, and hopefully the compression will be adequate. Otherwise you'll need something more complex. – President James K. Polk May 08 '21 at 15:30

4 Answers4

1

Where does the initial state that you want to compress come from? I guess there are three likely options.

  1. It's random. Most likely that means some code seeded a pseudo random number generator using some value like e.g. the time of the day, then used that to produce the values. In this case, you could get your hands on the seed (which most likely would be a fairly short number) and use that as the identifier from which everything else is computed. Make sure to use a portable random number generator with well-defined deterministic behaviour, e.g. some Mersenne Twister implementation. The JavaScript built in number generator is implementation-defined so it does not fit this bill.

  2. It came from some catalog made by the game developer (i.e. you). Then just obfuscating the index into that catalog might be good enough.

  3. It came from some user hand-tuning the values. In this case you're out of luck, since as I understand the problem chances are that any possible combination could get entered. You can't compress a large set of values to a smaller set of values without losing information.

There might be middle grounds. You could have a randomised setup that subsequently got hand-tuned, and the description as initial seed plus a few modifications would be shorter than the full set of settings. Or the hand-tuning would only be allowed following specific rules set out by the game developer, which again would make for a limited set of possible values and a potentially shorter encoding. Thinking along these categories might help you analyze your own situation and find a suitable solution.

You can also look at this from an information theory point of view. You can't expect to encode a sequence of fully independent and uniformly distributed digits with less information than those digits, perhaps expressed in some other base or whatever. You can compress data if there are patterns to it that make some combinations more likely than others. The more you tell us about these patterns, the better we might be able to advise. In total you can't get below the entropy of the source (i.e. game state distribution), so estimating that might help you find a lower bound for what to expect.

MvG
  • 57,380
  • 22
  • 148
  • 276
1

Step 1 would be to get rid of the leading zeros. For each of these groups of eight digits, there appears to be a range of possible values far less than eight digits. What do you know, or can control, about the range of numbers in each group?

Step 2 would be replacing repetitions with a reference. Why are the same values repeated? Is that expected? Or is it an anomaly of your example?

Step 3 would be encoding. If you have an arbitrary sequence of n decimal digits, you can encode it into ceiling(n log 10 / log k) symbols, where k is the number of allowed symbols. For example, if you allow all digits, lower case, and upper case numbers, k is 62. You could add most of the punctuation characters, and get that up in the 80's or 90's. You can do this simply with base conversion. All you're doing is converting the number from base 10 to base k. This step will give you close to a factor of two compression.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
1

If anyone besides OP is looking for such a Javascript thing, a very efficient way to do it would be something like BWTC32Key, in which Base32768 is used as the binary-to-text encoding for maximum efficiency, and an improved version of the BZip family is used for high efficiency compression. (It also has optional AES256-CTR encryption, which is nice because it doesn't need padding.)

stgiga
  • 21
  • 4
0

(Answering my own question)

I think I've settled on a solution.

At first I thought what I wanted might have had something to do with Base Number conversions (for example binary (base2) to hex (base16) or, base2 to base64, etc.) and the use of the Javascript functions btoa() and atob() (after more googling based off of President James K. Polk's and CertainPerformance's comments and MvG's and Mark Adler's eloquently written answers) (thank you, all of you, I appreciate your help),

I eventually found the website cryptii.com which has a very clean and easy to understand string, number, bit, byte (and anything else you might need) converter.

This website helped me visualise and understand exactly how these conversions work on a hex, binary and base number level.

As it turns out, (and from what I understand), for what I'm trying to do I don't need to convert to different bases or compress using LZW because they just make a larger string of characters than what I can already make with a few bytes in binary.

One byte, (a group of 8 boolean flags / binary bits) is equal to one character in a string.

So if my game state object has 14 groups of 8 boolean flags (14 bytes of binary) I can convert those bytes to a string of 14 characters.

Here's what I mean:

this random set of binary for example: 0110111001100101011100010111010101101011011000110110101101100010011101100110110101100001011011100110011001101101

is the same as: nequkckbvmanfm

I found a small Javascript function called ABC that converts binary to a string and back

  let ABC = {
    toAscii(bin) {
      return bin.replace(/\s*[01]{8}\s*/g, function(bin) {
        return String.fromCharCode(parseInt(bin, 2));
      });
    },
    toBinary(str, spaceSeparatedOctets) {
      return str.replace(/[\s\S]/g, function(str) {
        str = ABC.zeroPad(str.charCodeAt().toString(2));
        return !1 == spaceSeparatedOctets ? str : str + ' '
      });
    },
    zeroPad(num) {
      return '00000000'.slice(String(num).length) + num;
    }
  };

and I can use it like this:

//..convert my object full of boolean flags to a string with a loop.. (not shown here)

//a string named gameState would contain something like:
//0110111001100101011100010111010101101011011000110110101101100010011101100110110101100001011011100110011001101101

//binary to ascii
var testAsciiResult = ABC.toAscii(gameState); // Result --> nequkckbvmanfm

//back into binary
var testBinaryResult = ABC.toBinary(testAsciiResult,0);

// Result --> 0110111001100101011100010111010101101011011000110110101101100010011101100110110101100001011011100110011001101101

//(the ",0" flag outputs binary without the spaces)
//(don't use a flag if you want to output binary with spaces every 8 bits)

As an aside to the usage of this, I can also convert any combination of these bytes (or less than a byte depending on the maximum number) of binary to base10 (normal numbers) instead of individual flags from the binary string so that I can store the amount of gold, or the state of the main quest, or what level something is.

It should be noted that a lot of the outputted symbols from an entire byte might not be a character a user can type (or see, such as a space or a new line) (As documented on this website I found on google for "ascii special characters")

To make sure only user-readable characters appear in the password, it would be possible to store flags in JavaScript in groups of 6 bits instead of 8 so that my password generator only outputs characters between 64 - 127 (going off the chart on that website) and then when creating the password for my game, add a 01 at the beginning of each set of 6 digits to bring the sets of 6 boolean flags back up to a byte so that it can properly assign the correct character.

I Also found it's important to note that character 127 (01111111) is a
"delete" character (shows as a blank space in browser) and I imagine I would have to convert this to a different symbol in my password outside the range of 64-127 (the ? character for example, character 63, 00111111) and then check for it again when the password is loaded to put the data flags back to what they should be.

But this falls outside the scope of this question/answer.

This must sound terribly trivial to anyone who was taught this in college or something, but I'm learning this stuff for the first time and it's a wonder how I've gone so long working with JavaScript without knowing this stuff until now. I'm sorry for how wordy this question and answer has become, but it was difficult to be concise and I hope this information helps someone else stumbling in from google some day trying to do something similar.

I think I had a lot of difficulty explaining my self in this particular matter mainly because I didn't know what anything was called and I was struggling to even wrap my head around how strings, hexadecimal and binary work together.

I think I have a lot of reading ahead of me.

Partack
  • 886
  • 11
  • 24
  • 1
    That does not look at all like an "arbitrary binary" sequence. The probability that a random set of 112 bits would represent 14 lower-case characters is about one in a _hundred trillion_. In general, an arbitrary binary sequence is not even printable and certainly not typeable. And, no, eight bits is not a unicode character. There are several unicode representations, but none of them permit all possible byte values. There are far more than 256 unicode characters (think about Chinese). Each unicode character is represented as either a variable-length or fixed-length sequence of bytes. – Mark Adler May 09 '21 at 19:30
  • If you expect help, you need to state in your question _exactly_ the nature of the information you are trying to encode. Is it actually a sequence of 112 bits, where any bit is equally likely to be a 0 or a 1? – Mark Adler May 09 '21 at 19:34
  • As mentioned previously, my problem stems not only from a lack of knowledge in this area, but also that I am unable to identify the correct terms or words to convey my meaning. Either way, I have found a method that correctly works in my code REGARDLESS of what these actual terms are called and I intend to close this question since a solution has been found that works for me. I have shown in this answer what I was looking for and have no further questions. – Partack May 09 '21 at 19:40
  • 1
    Out of pure curiosity, how come that the original question used different decimal values for different "flags", when your own answer can apparently treat them as single bits? Also, you really do need to make sure you don't encounter bytes that aren't valid and easy to type characters. A set of between 62 and 95 is common, e.g. as in https://en.wikipedia.org/wiki/Ascii85, but you'd need 256 for arbitrary groups of 8 bits. Lastly, readers who don't have a sufficiently small data range for some of their quantities might be interested in https://en.wikipedia.org/wiki/Variable-length_quantity. – MvG May 10 '21 at 05:48
  • The reason there are strange numbers higher than 1 in the original question is because at the beginning I didn't know what I was looking for or how the system would have to store digits to make it all fit together. I wasn't originally using binary, I was thinking with strings in mind. but I did say that I would be willing to "change the rules" to make it work. Which is what I did. Turns out binary is really good at storing the numbers I needed, I just didn't know how they related or how to work with binary. Those VLQ's look like they'd be useful and I will study them. Thank you for your help. – Partack May 10 '21 at 13:58