1

In this question a Shuffle Bits algorithm was proposed according to the works of D. Knuth, in order to obfuscate id's. The code is:

// *** Shuffle bits (method used here is described in D.Knuth's vol.4a chapter 7.1.3)
const unsigned mask1 = 0x00550055; const unsigned d1 = 7;
const unsigned mask2 = 0x0000cccc; const unsigned d2 = 14;

// Obfuscate
unsigned t = (x ^ (x >> d1)) & mask1;
unsigned u = x ^ t ^ (t << d1);
t = (u ^ (u  >> d2)) & mask2;
y = u ^ t ^ (t << d2);

// Restore
t = (y ^ (y >> d2)) & mask2;
u = y ^ t ^ (t << d2);
t = (u ^ (u >> d1)) & mask1;
z = u ^ t ^ (t << d1);

The algorithm has been tested and it works perfectly if run as is, without changes.

I want to modify the way it shuffles bits, and I have been making changes to mask1, mask2, d1 and d2, but if I alter them the algorithm stops working and the inverse (restore) fails. I have been trying with XORed masks, prime numbers, inverse and correlate masks, but it always fails.

I have been trying to determine the insights of the algorithm by reading the chapter in the Art of Computer Programming vol.4a, also the Vol. 2, Chapter 4.3.2, but they are pretty dense and talk in general, they don't focus on variants of this example nor explain other options in detail for this algorithm in particular.

The question is:

Why were these values chosen and not others? (mask1, mask2, d1, d2).

What is the rule to choose values of mask1, mask2, d1, d2 to alter the shuffle while keeping the algorithm working? (By working means that x=z at the end of the obfuscation/restoration process always, for any natural values).

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62

1 Answers1

1

The obfuscate code is actually obfuscate1; obfuscate2;, and the restore code is actually restore2; restore1;. It's possible to nest these steps arbitrarily.

Let's look at just the first step.

// Obfuscate
unsigned t = (x ^ (x >> d1)) & mask1;
unsigned u = x ^ t ^ (t << d1);
// Restore
t = (u ^ (u >> d1)) & mask1;
z = u ^ t ^ (t << d1);

It's actually the same step, first applied to x, then applied to u. The key is that t is the same both times, since the identity a ^ b ^ b == a together with the associativity and commutativity of ^ implies the following. (I'm going to assume sane operator precedence of ==; if you translate these equations into C, you're going to need more parentheses.)

z == u ^ t ^ (t << d1)
  == x ^ t ^ (t << d1) ^ t ^ (t << d1)
  == x ^ t ^ t ^ (t << d1) ^ (t << d1)
  == x ^ (t << d1) ^ (t << d1)
  == x

I claim that it's sufficient if (mask1 << d1) >> d1 == mask1 (i.e., the high d1 bits of mask1 are zero) and (mask1 & (mask1 << d1)) == 0. Here's some algebra where I push the ^ operations to the root of the expression tree for u at the cost of blowing up the expressions.

t == (x ^ (x >> d1)) & mask1
  == (x & mask1) ^ ((x >> d1) & mask1)
t << d1 == ((x & mask1) ^ ((x >> d1) & mask1)) << d1
        == ((x & mask1) << d1) ^ (((x >> d1) & mask1) << d1)
u == x ^
     (x & mask1) ^
     ((x >> d1) & mask1) ^
     ((x & mask1) << d1) ^
     (((x >> d1) & mask1) << d1)

Let's do the same (but sparser intermediate steps) for the second expression for t, which I'll call t' because it hasn't been proved equal to t yet.

t' == (u ^ (u >> d1)) & mask1
   == (x & mask1) ^                                  // 0.
      ((x & mask1) & mask1) ^                        // 1.
      (((x >> d1) & mask1) & mask1) ^                // 2.
      (((x & mask1) << d1) & mask1) ^                // 3.
      ((((x >> d1) & mask1) << d1) & mask1) ^        // 4.
      (((x >> d1) & mask1) & mask1) ^                // 5.
      (((x & mask1) >> d1) & mask1) ^                // 6.
      ((((x >> d1) & mask1) >> d1) & mask1) ^        // 7.
      ((((x & mask1) << d1) >> d1) & mask1) ^        // 8.
      (((((x >> d1) & mask1) << d1) >> d1) & mask1)  // 9.

Masking a second time has no effect.

t' == (x & mask1) ^                                  // 0.
      (x & mask1) ^                                  // 1a.
      ((x >> d1) & mask1) ^                          // 2a.
      (((x & mask1) << d1) & mask1) ^                // 3.
      ((((x >> d1) & mask1) << d1) & mask1) ^        // 4.
      ((x >> d1) & mask1) ^                          // 5a.
      (((x & mask1) >> d1) & mask1) ^                // 6.
      ((((x >> d1) & mask1) >> d1) & mask1) ^        // 7.
      ((((x & mask1) << d1) >> d1) & mask1) ^        // 8.
      (((((x >> d1) & mask1) << d1) >> d1) & mask1)  // 9.

Eliminate 0 and 1a, and 2a and 5a, which cancel each other.

t' == (((x & mask1) << d1) & mask1) ^                // 3.
      ((((x >> d1) & mask1) << d1) & mask1) ^        // 4.
      (((x & mask1) >> d1) & mask1) ^                // 6.
      ((((x >> d1) & mask1) >> d1) & mask1) ^        // 7.
      ((((x & mask1) << d1) >> d1) & mask1) ^        // 8.
      (((((x >> d1) & mask1) << d1) >> d1) & mask1)  // 9.

Since (mask1 << d1) >> d1 == mask1, shifting left and right after masking is redundant.

t' == (((x & mask1) << d1) & mask1) ^          // 3.
      ((((x >> d1) & mask1) << d1) & mask1) ^  // 4.
      (((x & mask1) >> d1) & mask1) ^          // 6.
      ((((x >> d1) & mask1) >> d1) & mask1) ^  // 7.
      (x & mask1) ^                            // 8b.
      ((x >> d1) & mask1)                      // 9b.

The identity (mask1 & (mask1 << d1)) == 0 can be shifted right to show that (mask1 & (mask1 >> d1)) == 0. This implies (tediously) that 6, 7, 8b, and 9b all are zero, leaving the following.

t' == (x & mask1) ^                            // 8b.
      ((x >> d1) & mask1)                      // 9b.
   == t

tl;dr

(mask1 << d1) >> d1 == mask1 (i.e., the high d1 bits of mask1 are zero) and (mask1 & (mask1 << d1)) == 0. But seriously, just AES encrypt/decrypt instead.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • 1
    AES result contains ASCII characters and potentially longer results. We are focussed on plain digit shifting, with mere digit results, it does not need to be a strong encryption method, but at least simple to implement and keeping digits, not strings. Your answer is pretty valid, amazing and pretty accurate, will be accepted as an answer. I think the matter of Id obfuscation using only numbers has been discussed before. But if you believe you have other similar solutions aside of AES and involving numbers, happy to post another question ;) – null_pointer Aug 27 '15 at 16:13