0

I was wondering if, given a known PRNG and a set of values, one could calculate that PRNG's base seed. For example, say I have an array of data

var a = [255, 192, 0, 3, 72]

I know for a fact that each of those values was generated from calls PRNG.nextInt(). So the first call to PRNG.nextInt() returned 255, the next call returned 192, and so on. If you were only given that data set, is there a PRNG algorithm to give me the original seed? So that you could do something like this:

var seed = PRNG.getSeedForData([255, 192, 0, 3, 72])
var prng = new PRNG(seed)

And then with that prng instance, this should be true:

prng.nextInt() == 255
prng.nextInt() == 192
prng.nextInt() == 0
prng.nextInt() == 3
prnt.nextInt() == 72

I looked at this post (Is it possible to reverse a pseudo random number generator?) as well as this one (Given a RNG algorithm and a series of numbers is it possible to determine what seed would produce the series?), but I'm not sure what to do with that information. It doesn't look like either post found a solution to the problem, however, and I don't want to necropost.

Based on what little I understood from those posts, though, it looks like Linear congruential generators might be the way to go?, but I'm not entirely sure how to even approach them.

Just to clarify, I trying to write an PRNG algorithm that can also give me a seed when supplied with a series of numbers. The nextInt() function in the algorithm should ideally always return a value between 0 and 255, but I'm not sure if that is relevant to the question.


UPDATE

So using the links and advice in the comments, it looks like a reverse linear congruential generator is the way to go. I wrote a function in javascript to calculate the modulus 'm' of a LCG given a bunch of data, but I've run into a problem:

function gcd(a, b)
{
    while(b != 0)
    {
       t = b; 
       b = a % b; 
       a = t;
    }

    return a;
}

var numbers = [this is any random array of 0-255 integers],
    m = 0;

for(var i = 0; i < numbers.length - 3; i++)
{
    var t0 = numbers[i + 1] - numbers[i],
        t1 = numbers[i + 2] - numbers[i + 1],
        t2 = numbers[i + 3] - numbers[i + 2],
        u0 = Math.abs((t2 * t0) - (t1 * t1));

    m = i == 0 ? u0 : gcd(m, u0);
}

console.log(m);

The issue is that this code always returns m = 1, which obviously isn't correct. If m is one, the linear congruential generator would just have mod 1, which will always return 0 for integers. Am I doing something wrong, or perhaps I misunderstood the other linked questions and answers. Is it possible to get a seed value from a PRNG given a completely random set of integers?

In terms of the code, the GCD function is based on Wikipedia's entry for a Euclid's Algorithm. The rest of it was based on the math in the question and answer @Dukeling linked. It looks right to me, but entirely possible that it isn't.

thearchitector
  • 187
  • 3
  • 12
  • Does your "set of data" start with the first value output from the PRNG ? If yes, that narrows the search space considerably. To at least 1/256th the size as a guess based on being able to throw away all seeds that don't match. But in general those questions do seem to tell you how to invert a Linear Congruential Generator. 1) write one 2) invert using those steps. – Dave S May 31 '17 at 04:16
  • What are the quality requirements of the desired PRNG? – plasmacel May 31 '17 at 04:59
  • @plasmacel If you mean in terms of cryptography, none. It doesn't need to be secure at all, it just needs to be reversible given a set of data – thearchitector May 31 '17 at 05:01
  • @TheMasterGabriel I mean in terms of statistical randomness. LCGs are generally pretty bad. – plasmacel May 31 '17 at 05:02
  • @DaveS Yes, it does. The first item in the data would be the first number returned by the prng. Also, I'm not sure how I would go about simply reversing a lgc, if I only know the data. The set of data is arbitrarily long, so the period would also be unknown, so how would I know the coefficient to Xn, the constant, and the number being moduloed by (a,b,c respectively given the equation Xn+1=(a*Xn+b) mod c. Though perhaps I'm misunderstanding? – thearchitector May 31 '17 at 05:06
  • @plasmacel Well, I'm not entirely sure. If the input data set was completely random, the prng should still be able to return a base seed, hopefully. When making the function, I have no knowledge of the input data. The sequence can be any length long. The only thing I know for sure is that each number in said arbitrarily-length set is between 0-255 – thearchitector May 31 '17 at 05:11
  • 1
    [Cracking a linear congruential generator](https://security.stackexchange.com/questions/4268/cracking-a-linear-congruential-generator) probably answers your question. The question of cracking **any** PRNG is probably too broad. – Bernhard Barker May 31 '17 at 05:16
  • A linear feedback shift register can give decent albeit nonsecure and not very high quality random numbers in a way which is trivially reversible since the outputs simply are consecutive chunks of the internal state. If you gather the output into 8-bit chunks and you have a 32-bit state, then 4 consecutive 8-bit numbers will directly tell you the original state. See this: https://en.wikipedia.org/wiki/Linear-feedback_shift_register – John Coleman May 31 '17 at 12:14
  • @JohnColeman Alright I'll take a look at that too – thearchitector May 31 '17 at 19:30
  • @Dukeling Is that guaranteed to perfectly fit all the input data? Looking at the math on that answer, in order to calculate the u at a certain index, you need the number 3 indices ahead in the data series, which won't exist for the last 3 data points. If I omit those points in calculating the modulus, can I guarantee that prng will generate those last values at n-2, n-1 and n? – thearchitector May 31 '17 at 19:35
  • @TheMasterGabriel I don't know to much about the specifics but logically it can only be guaranteed (or particularly likely) to perfectly match unknown past or future values if (1) that sequence was generated using the same technique, (2) you're given enough generated elements to remove any alternative ways of generating it (which that accepted answer points to) or (3) you get lucky. As I read that accepted answer, you need 4 elements to calculate any given u_i, but you can pick how many u_i's you use to calculate m (10 was picked in that answer). – Bernhard Barker Jun 01 '17 at 04:21
  • @Dukeling Well (1) is out of the question because I am not person generating the random sequences. The sequences are, however, thousands of numbers long, so I think that suffices (2). Any insight on the modulus 1 problem though? No matter what I try, I always calculate 'm' to be 1. – thearchitector Jun 01 '17 at 05:37
  • It sounds like you don't have the output of the PRNG itself so much as the output mod 256 or the output transformed another way to get a number in the range 0-255. You can't directly use methods for cracking a linear congruence generator given the output. The task would be even more difficult if they are generated by something like the Mersenne Twister (as would likely be the case if they were e.g. generated by a Python program). Do you know anything about the source of these numbers? Without some knowledge of the source you have a very difficult problem, one which might even be impossible. – John Coleman Jun 01 '17 at 11:22
  • @JohnColeman Yes, that was my original question. The data is completely random in the sense that I don't have control over where it comes from. All I know I that its an array of integers. I hoped it wouldn't be an impossible task, which is why I asked – thearchitector Jun 01 '17 at 11:33
  • @JohnColeman I suppose a better phrasing of the question would have been "Given any array of integers that's 'n' long, is it possible to generate a PRNG that would generate that array again after 'n' calls to 'PRNG.nextInt()' – thearchitector Jun 01 '17 at 11:52
  • The Berlekamp-Massey algorithm can synthesize a linear feedback shift register that produces any specific sequence with an internal state which is as small as possible (but if the numbers are not produced by a LFSR then the result of the algorithm might have an internal state that is nearly as large as the original list of numbers). Trivially, you can write a PRNG that has the given numbers on a stack and pops them off one by one, switching to another PRNG once that initial stack is exhausted. In the worst case, this is roughly what the Berlekamp-Massey algorithm would yield. – John Coleman Jun 01 '17 at 12:01
  • @JohnColeman Oh, very interesting. Is there any advantage of using this algorithm over polynomial regression with 'n' degrees? – thearchitector Jun 01 '17 at 20:56

0 Answers0