2

I want to generate a random string with any fixed length (N) of my choice. With the same number as a feed to this algorithm it should generate the same string. And with small change to the number like number+1, it should generate a completely different string. (Difficult to relate to the previous seed) It's ok if more than one number might result in the same string. Any approaches for doing this?

By the way, I have a set of characters that I want to appear in the string, like A-Z a-z 0-9.

For example

Algorithm(54893450,4,"ABCDEFG0") -> A0GF
Algorithm(54893451,4,"ABCDEFG0") -> BDCG

I could random each characters one by one, but it would need N different seed for each characters. If I want to do it this way, the question might become "how to generate N numbers from one number" for the seeds.

The end goal is that I want to convert a GUID to something more readable on printed media and shorter. I don't care about conflict. (If the conflict did happen, I can still check the GUID for resolution)

gsamaras
  • 71,951
  • 46
  • 188
  • 305
5argon
  • 3,683
  • 3
  • 31
  • 57
  • 1
    You want to use a modular multiplicative inverse. See my answer here: https://stackoverflow.com/a/34420445/56778. Also, see my blog post about this for more detail. http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/. The basic idea is that you get the next number, obfuscate it (basically, do a transform so that it doesn't appear sequential), and then encode it. – Jim Mischel Jun 26 '17 at 17:52
  • *I could random each characters one by one* or you could realise that `A0GF` is just a 4-digit base-62 number. – High Performance Mark Jun 26 '17 at 18:40
  • Yeah I could convert my random number to whatever-base number with base equal to my number of characters. But nearby value would provide similar looking result so I want to avoid that.. – 5argon Jun 26 '17 at 19:40
  • 1
    What's wrong with using the initial number to seed a PRNG and just iterating that PRNG `n` times to get `n` characters? PRNGs are designed (amongst other things) for producing very different and unpredictable sequences from similar seeds. – biziclop Jun 26 '17 at 20:00
  • @biziclop: Because of the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem). After selecting about 70,000 random 32-bit numbers, your chance of generating a duplicate approaches 50%. You have to create logic that prevents duplicates, and it takes increasingly more time to do so. This is okay if you just want a few million numbers, but beyond that you really need something more reliable than random selection. – Jim Mischel Jun 26 '17 at 20:33
  • But OP said the occasional duplicate isn't a problem. (In fact duplicates are unavoidable if the input we're hashing is bigger than the hash.) Plus the input isn't just a randomly selected 32 bit number, it's a GUID which again was designed to avoid collisions. I have a feeling I may be misunderstanding the problem here. – biziclop Jun 26 '17 at 20:42
  • @biziclop: I'm thinking I might be missing it, too. I didn't fully understand until I re-read that he's not using a unique number. – Jim Mischel Jun 26 '17 at 20:52
  • 1
    @5argon: You should know that generated GUIDs aren't guaranteed to be random. Just unique. And when you chop off parts of them, you're guaranteed to have duplicates. You probably should just generate a standard hash code from the GUID using something like [Jenkins](https://en.wikipedia.org/wiki/Jenkins_hash_function) or similar. – Jim Mischel Jun 26 '17 at 20:55
  • I currently trying to use hash code from GUID -> multiply modulo -> base conversion to generate the string. I have a problem though, since the number I got (from whatever procedure) is usually large, when I convert it to some base-n representation I could see a pattern of the first character. For example this is 4 consecutive random of my base-32 value (0~9 A~...) 5S3CVL9NKK, 3Q43JJQCLR, 82394AGUM3, 3RM4MS253K. As you see the first character never reach the alphabet, understandably. This might leads to some observation that the value is not THAT random. Do you have any solutions for this? – 5argon Jun 26 '17 at 21:35
  • Oh, I guess that the multiplying number should be sufficiently large that it goes over the modulo several times to evenly distribute the result of the modulo. Currently I use something like (number * 55579) % ulong.MaxValue, 55579 might be not enough and also ulong.MaxValue is too large. I might just cut off the first character since that represents the wrap around effect of number system, I think. – 5argon Jun 26 '17 at 21:45

1 Answers1

1

Ok, thanks for the guidance @Jim Mischel. I read all the related pages and come to understand more about this.

http://blog.mischel.com/2017/05/30/how-not-to-generate-unique-codes/

http://blog.mischel.com/2017/06/02/a-broken-unique-key-generator/

http://blog.mischel.com/2017/06/10/how-did-this-happen/

http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/

https://ericlippert.com/2013/11/12/math-from-scratch-part-thirteen-multiplicative-inverses/

https://ericlippert.com/2013/11/14/a-practical-use-of-multiplicative-inverses/

https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm

In short, first I should use a sequential number. That is 1,2,3,4,... Very predictable, but it can turn into something random and hard to guess.

(Note that in my case this is not entirely possible, since each users will be generating his own ID locally so I cannot run a global sequential number, hence I use GUID. But I will make my own workaround to fit GUID to this solution, probably with a simple modulo on the GUID to fit it to my desired range.)

With sequential integer n I can get another seemingly unrelated integer with a multiplication then a modulo. This might looks like (n * x)% m with x and m of my choice. Of course m would have to be larger than the largest number that I want to use since it wraps around the modulo while multiplying.

This alone is a good start as close number n does not provide similar output. But we cannot be so sure about that. For example, if my x is 4 and m is 16 then the input can only produce 0,4,8,12. To avoid this we choose x and m which is a coprime of each other. (Having greatest common divisor of 1) There are many obvious candidate to this such as 100000 as m (defines the limit of my output as 99999) and 2429 as x. If we choose 2 coprime like this, not only the result conflict as less as possible, it also guarantee that each input produces unique output in that range.

We can learn from this example :

(n * 5) % 16

As 5 and 16 is a coprime, we can get a maximum length of sequence of unique numbers before it wraps around. (length = 16) if we input numbers sequentially from 0 to 16 :

Input : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
Output : 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11, 0

We can see that the output is in a not so predictable sequence and also none of the output other than the last one is the same. It travels to all available number possible.

Now my very predictable sequential running number would produce sufficiently different number and also guarantee not to conflict to any other input as long as it is in the range of m. What's left is to convert this number to a string of my choice via base conversion. If I have 5 characters "ABCDE" then I will use base-5.

Only this is enough for my use case. But with the concept of multiplicative inverse I can also find one more integer y which can reverse that multiply modulo transformation to the original number. Currently I still haven't understand that part fully, but it uses Extended Euclidean Algorithm to find y.

Since my application does not need reverting yet I am fine with not understanding it for now. I will definitely try to understand that part.

5argon
  • 3,683
  • 3
  • 31
  • 57
  • The example in my [blog post](http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/) shows that multiplying the obfuscated number by the multiplicative inverse, then taking the modulo `m`, gives you the original number. Eric Lippert's post about the multiplicative inverse explains the math better than I can. – Jim Mischel Jun 26 '17 at 20:29