0

I want to create a shuffling function that generates a reversible output from a seed and an input string, but that will also shuffle the string such that the seed can be calculated from the input string and the output string. This will only be useful if the seed is less than the length of the input string.

This question is similar to Reversible shuffle algorithm using a key and Unshuffle an array of int/char, but is different in that we do not know the "seed" of the output string. We are not trying to reverse a random operation, but represent a string in a different way.

A = '1111100000' and B = '0101010101'

Basically shuffle(A,calc_seed(A,B)) = B

shuffle(A,seed) should return a string based off A and the seed. calc_seed(A,B) should return the seed needed to be inputed into shuffle() to return B.

Is this possible without using brute force to find the seed, and so that there exists a seed for every A and B that will allow A to be converted to B?

433MEA
  • 100
  • 8
  • I'm no expert on cryptography but I would suggest that there's no general solution for this (even brute force) when there are duplicates such as in your example. There may be several different seeds that would produce the same result – DarkKnight Mar 21 '23 at 12:41
  • Duplicates are fine, any seed that works is what I'm looking for. – 433MEA Mar 21 '23 at 12:45
  • You mean that `shuffle()` will output all N possible shuffles of the input string using seeds exactly from 0 to N-1? – Yanb Mar 21 '23 at 12:46
  • What exactly does `calc_seed` produce? You seem to be using `shuffle` in a card-shuffling sense (order-preserving merge of two sequences), not the `random.shuffle` sense (random permutation of the elements of a single sequence). – chepner Mar 21 '23 at 12:47
  • No, I want shuffle() to output B based on A and the seed. – 433MEA Mar 21 '23 at 12:47

1 Answers1

1

This is impossible in the general case, if you want to discover the seed.

In general, you have no control over the number of repeats in A and B.

There is nothing stopping the input being "aaaaaaaaaaaaaaaa", in which case the output will be the same, regardless of the seed. It will therefore be impossible to discover the seed.

However, if you only need to discover a seed, it can be done

If you are starting with string

 abcdefghij

Then when you shuffle the characters, there are 10 positions you can put the a. Once that is in place, you have 9 positions you can put the b, etc, until there is no choice for where you put the j.

So the seed could be composed of just the following sequence:

  • number from 0 to 9, saying where to put the a
  • number from 0 to 8, saying where to put the b
  • etc
  • number from 0 to 1, saying where to put the i
  • no information needed for j

Once you can see the string (either A or B), you know the length, N.

So the numbers you need to encode are:

  • 0 to N-1
  • 0 to N-2 ...
  • 0 to 1

Why not compute a value like this:

     (0 to N-1) x N
     (0 to N-2) x (N-1)
     (0 to N-3) x (N-2)
     ...
     (0 to 1) x 2
  +  ===================
        calculate total
     ===================

Broadly speaking, if the number of characters in the "alphabet" is much more than the length of A, then this seed will indeed be shorter than A. I am assuming there is no way to compress A, because the characters are random. My reasoning is that each character of A, which has (say) 6 bits, will need fewer bits: on average just enough bits to store N/2.

ProfDFrancis
  • 8,816
  • 1
  • 17
  • 26