2

[EDIT] I have completely rephrased the question to try to be more concise and clear

I am looking for a 1-1 function, encode, such that

  • encode( 32_bytes_of_data ) => {w_1, w_2, ..., w_n}, where:
    • w_1 ... w_n are real English words
    • n should be reasonable - I don't want 256 words to encode 256 bits

Ideally, n should not be the same for all input values, but that is not a critical requirement.

The goal is to make public keys more human-readable and recognisable.

willcode.co
  • Why not just post a link to something like [Pastebin](http://pastebin.com/)? It's easier for both you and the key recipient – loopbackbee Jan 28 '13 at 14:01
  • 1
    Nope, I would like a text-only solution. Links can be, and often are, removed from posts by the evil admin. – willcode.co Jan 28 '13 at 14:07
  • Then I'm guessing this question reduces to "what steganographic techniques exist to conceal a 256-bit message in an ASCII/UTF-8 medium with a maximum size of a few kilobytes"? If it does, maybe it's better to just ask that? – loopbackbee Jan 28 '13 at 14:12
  • Agreed that the question was not very well posed, but I'm not really looking for steganography, just an encoding. My goal is not really to hide the existence of the key from humans. I will edit and rephrase the question – willcode.co Jan 28 '13 at 14:19

3 Answers

3

If you're not worried about manual inspection and only seek to protect against obvious regexes, there are a few alternatives, with increasing annoyance factor:

  • ROT13 has been used on usenet countless times for this kind of thing. It would defeat base-64 detection

  • Treat the 256 bits as an integer and write its base-10 representation in ASCII (up to 78 digits). It'll look like this: 115792089237316195423570985008687907853269984665640564039457584007913129639935

  • You can encode the previous number as a look-and-say sequence and spell it out: two one, one five, one seven...

  • Encode the 256 bits in base 26, and use each of the resulting letters as the first character of a word in a phrase. You'd need 55 words. If you're feeling creative, you may use the first two characters of each word and reduce that to 28, but you may have to use very strange words indeed. If you don't care about appearance, just post the 55 characters: ennjuuzflkeenzhszxamvlrnusvcpknavbgzllukzllrkvatszirbkq

  • If you want to use Unicode, there are about 110,000 different characters. Assuming only half of those are printable, that's a bit more than 15 bits of entropy per character, so you'll need 17 characters to encode 256 bits

  • If you and your recipient can pre-share any amount of data (you must at least share knowledge of the "steganographic" method), you could assign a numeric value to each word in a dictionary. There are about 1,000,000 words in the English language, so each one carries about 20 bits of entropy. You'll need 256/20 = 12.8, so 13 words. Bonus points for generating a key which encodes to correct syntax and grammar, and for rewriting Jabberwocky
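
The base-26 and dictionary ideas above are the same operation with different alphabets: write the key as a big integer, then as fixed-length digits in base len(alphabet). Here is a minimal Python sketch of that, not a vetted implementation. The names `encode` and `decode` are illustrative, and the shared alphabet is an assumption: pass the 26 letters for the base-26 variant, or any agreed list of distinct English words for the dictionary variant.

```python
def encode(data: bytes, alphabet: list[str]) -> list[str]:
    """Write `data` as fixed-length digits in base len(alphabet)."""
    base = len(alphabet)
    bits = len(data) * 8
    # Smallest symbol count whose range covers every value of this size.
    length = 1
    while base ** length < 2 ** bits:
        length += 1
    value = int.from_bytes(data, "big")
    symbols = []
    for _ in range(length):
        value, digit = divmod(value, base)
        symbols.append(alphabet[digit])
    return symbols[::-1]  # most significant symbol first


def decode(symbols: list[str], alphabet: list[str], size: int = 32) -> bytes:
    """Invert `encode`; `size` is the expected key length in bytes."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    value = 0
    for symbol in symbols:
        value = value * len(alphabet) + index[symbol]
    return value.to_bytes(size, "big")


if __name__ == "__main__":
    letters = list("abcdefghijklmnopqrstuvwxyz")
    key = bytes(range(32))          # stand-in for a real 32-byte key
    text = encode(key, letters)
    print(len(text), "".join(text))
    assert decode(text, letters) == key
```

Because the output length is fixed for a given alphabet, every 32-byte key maps to exactly 55 letters, and the mapping is 1-1 as long as both sides use the same alphabet in the same order.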

loopbackbee
  • Thanks there are lots of great ideas here. I need to generate real English words so I guess I would prefer the last solution. – willcode.co Jan 28 '13 at 15:04
0

You could encode your key as one bit per word, where the parity of the word length indicates the bit: a word with an even number of letters is a 0-bit and a word with an odd number of letters is a 1-bit. I discuss this at my blog.
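
As a rough illustration of the decoding side, here is a small Python sketch. The function name `decode_lengths` and the choice to count only alphabetic characters (so punctuation does not change a word's length) are my assumptions, not part of the scheme as stated. The modulus is a parameter, so the same code covers word lengths taken modulo a larger power of two.

```python
def decode_lengths(text: str, bits_per_word: int = 1) -> str:
    """Recover a bit string from word lengths taken modulo 2**bits_per_word."""
    modulus = 2 ** bits_per_word
    chunks = []
    for word in text.split():
        # Count letters only, so "eggs," and "eggs" decode the same way.
        letter_count = sum(1 for ch in word if ch.isalpha())
        if letter_count == 0:
            continue  # skip tokens that are pure punctuation or digits
        chunks.append(format(letter_count % modulus, f"0{bits_per_word}b"))
    return "".join(chunks)


if __name__ == "__main__":
    print(decode_lengths("I like green eggs"))      # lengths 1, 4, 5, 4
    print(decode_lengths("I like green eggs", 2))   # same lengths, modulo 4
```

Encoding is the hard part: you have to write prose whose word lengths match the bits, which is why the sketch only covers decoding.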

user448810
  • So I would need quite a lot of words! Well, 256 I guess. Surely more information could be encoded into a word? – willcode.co Jan 28 '13 at 14:30
  • You could encode two bits per word by taking the length modulo four, or three bits per word by taking the length modulo eight. The trick with steganography is not to have a good encoding, but to hide the fact that an encoding is present. With a little bit of work, you could probably find something to put into your signature file that encodes your key. Remember to leave extra room for a parity check. – user448810 Jan 28 '13 at 14:59
0

A dictionary of one million words could feed a 19-bit encoding that uses only 524,288 (2^19) of those words. If your input were 32 bits, 32 / 19 ≈ 1.68, so you'd need at least two words for the encoding.

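Then I noticed you said 32 bytes, so this is 256 / 19 ≈ 13.47, call it 14 words to encode your data. The folks who make deterministic wallets take a different route: BIP-39 draws from a 2,048-word list (11 bits per word), so the 12-word phrases they like to use cover 128 bits plus a 4-bit checksum, and a 256-bit seed takes 24 words.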
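
To make the word-count arithmetic easy to check, here is a short Python sketch. The function names are mine, and the one-million-word dictionary is the same round figure used above, not a real word list. It compares slicing the key into fixed 19-bit chunks with treating the whole key as one number written in base 1,000,000.

```python
def words_needed_fixed_bits(payload_bits: int, dictionary_size: int) -> int:
    """Words needed when each word carries floor(log2(dictionary_size)) bits."""
    bits_per_word = dictionary_size.bit_length() - 1
    return -(-payload_bits // bits_per_word)  # ceiling division on integers


def words_needed_full_radix(payload_bits: int, dictionary_size: int) -> int:
    """Words needed when the payload is written in base dictionary_size."""
    count = 1
    while dictionary_size ** count < 2 ** payload_bits:
        count += 1
    return count


if __name__ == "__main__":
    print(words_needed_fixed_bits(32, 1_000_000))    # 2
    print(words_needed_fixed_bits(256, 1_000_000))   # 14
    print(words_needed_full_radix(256, 1_000_000))   # 13
```

The difference between 14 and 13 comes from the unused half of the dictionary: fixed 19-bit chunks ignore 475,712 of the million words, while the full-radix version uses all of them at the cost of big-integer division.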

The greatest benefit of 12-word seed phrases has to be their built-in error correction: any spelling mistake is picked up by our ability to spell these words properly. That's pretty nifty.

Tomachi