31

In the google protocol buffers encoding overview, they introduce something called "Zig Zag Encoding", this takes signed numbers, which have a small magnitude, and creates a series of unsigned numbers which have a small magnitude.

For example

Encoded => Plain
0 => 0
1 => -1
2 => 1
3 => -2
4 => 2
5 => -3
6 => 3

And so on. The encoding function they give for this is rather clever, it's:

(n << 1) ^ (n >> 31) //for a 32 bit integer

I understand how this works, however, I cannot for the life of me figure out how to reverse this and decode it back into signed 32 bit integers

Charles
  • 50,943
  • 13
  • 104
  • 142
Martin
  • 12,469
  • 13
  • 64
  • 128

6 Answers6

34

Try this one:

(n >> 1) ^ (-(n & 1))

Edit:

I'm posting some sample code for verification:

#include <stdio.h>

int main()
{
  unsigned int n;
  int r;

  for(n = 0; n < 10; n++) {
    r = (n >> 1) ^ (-(n & 1));
    printf("%u => %d\n", n, r);
  }

  return 0;
}

I get following results:

0 => 0
1 => -1
2 => 1
3 => -2
4 => 2
5 => -3
6 => 3
7 => -4
8 => 4
9 => -5
Zombo
  • 1
  • 62
  • 391
  • 407
3lectrologos
  • 9,469
  • 4
  • 39
  • 46
  • 1
    I knew there had to be a way around the multiply. Kudos! – ergosys Feb 05 '10 at 23:18
  • Well, it works for me and for ergosys, so it should work for you too... Can you tell me what results you get? – 3lectrologos Feb 05 '10 at 23:23
  • It may well be that I'm using it incorrectly. I have a UInt32 for n, and I'm casting the returned result into Int32 afterwards. Which sounds like a logical way to do things... – Martin Feb 05 '10 at 23:24
  • The problem is that you use an unsigned int. Try using an Int32. – 3lectrologos Feb 05 '10 at 23:26
  • The point of the function is that it converts an unsigned int into a signed int. If I simply cast my input into a signed int, I will lose a whole section of the upper range of my inputs – Martin Feb 05 '10 at 23:28
  • Actually, I don't think the unsigned int is the problem. I'm posting some code that works for me. – 3lectrologos Feb 05 '10 at 23:31
  • @3lectrologos, finally got to vote you up, I couldn't until you edited because I undid it once. – ergosys Feb 05 '10 at 23:40
  • 1
    I think this may be a language problem, translating that pretty much directly into C# results in an error, negating a UInt32 results in a long, and xor of long and UInt32 is undefined. I'm going to have a go at fixing it for C# – Martin Feb 05 '10 at 23:42
  • 1
    return ((int)(u >> 1)) ^ ((int)(-(u & 1))); The power of casting has fixed it. So, the question remains which should I use, from what ergosys said above I would assume this is faster due to a lack of multiplication? – Martin Feb 05 '10 at 23:44
  • @Martin: Can't you just cast it to a Int32 before negating it? – 3lectrologos Feb 05 '10 at 23:45
  • It's much faster actually! :) – 3lectrologos Feb 05 '10 at 23:46
  • My casts are killing your method. I just moved the cast to inside the negation as you suggested a couple of comments up, This is now the fastest method by a whisker – Martin Feb 05 '10 at 23:51
  • Yes, that should be as fast as it gets. – 3lectrologos Feb 05 '10 at 23:55
  • This is how it's done in the protobuf implementation; look at the bottom of protobuf-2.3.0\src\google\protobuf\wire_format_lite.h – Trevor Robinson Nov 04 '10 at 18:10
5

Here's yet another way of doing the same, just for explanation purposes (you should obviously use 3lectrologos' one-liner).

You just have to notice that you xor with a number that is either all 1's (equivalent to bitwise not) or all 0's (equivalent to doing nothing). That's what (-(n & 1)) yields, or what is explained by google's "arithmetic shift" remark.

int zigzag_to_signed(unsigned int zigzag)
{
    int abs = (int) (zigzag >> 1);

    if (zigzag % 2)
        return ~abs;
    else
        return abs;
}

unsigned int signed_to_zigzag(int signed)
{
    unsigned int abs = (unsigned int) signed << 1;

    if (signed < 0)
        return ~abs;
    else
        return abs;
}

So in order to have lots of 0's on the most significant positions, zigzag encoding uses the LSB as sign bit, and the other bits as the absolute value (only for positive integers actually, and absolute value -1 for negative numbers due to 2's complement representation).

Cimbali
  • 11,012
  • 1
  • 39
  • 68
  • `zigZag_to_signed` is not returning the original value. – Salar Dec 10 '15 at 22:13
  • @SalarKhalilzadeh Thanks, fixed. Operator precedence was wrong, casting then shifting lost the first bit of "zigzag". – Cimbali Dec 14 '15 at 18:55
  • The difference with the answer of 3lectrologos is that you use a test. A test ~disable pipelining of operation and is slower than a computation without a test and branch. – chmike Jul 02 '18 at 09:21
  • @chmike Yes this answer is to illustrate what happens, not for performance, as acknowledged in the first line. – Cimbali Jul 02 '18 at 10:14
  • 1
    Thanks for this. This version makes it easy to see intuitively why the encoding scheme works. – 1110101001 Jan 19 '21 at 02:42
  • This helped me since I was looking for decoding of ulong -> long. I know the original question are related to Int32, however this answer seems robust for higher numbers. I ended up with `long abs = (long)(n >> 1); if (n % 2 != 0) return ~abs; else return abs;` – schwartz Apr 18 '22 at 12:47
4

How about

(n>>1) - (n&1)*n
ergosys
  • 47,835
  • 5
  • 49
  • 70
2

After fiddling with the accepted answer proposed by 3lectrologos, I couldn't get it to work when starting with unsigned longs (in C# -- compiler error). I came up with something similar instead:

( value >> 1 ) ^ ( ~( value & 1 ) + 1 )

This works great for any language that represents negative numbers in 2's compliment (e.g. .NET).

Imisnew2
  • 105
  • 1
  • 6
1

I have found a solution, unfortunately it's not the one line beauty I was hoping for:

uint signMask = u << 31;
int iSign = *((Int32*)&signMask);
iSign >>= 31;
signMask = *((UInt32*)&iSign);

UInt32 a = (u >> 1) ^ signMask;
return *((Int32*)&a);
Martin
  • 12,469
  • 13
  • 64
  • 128
-1

I'm sure there's some super-efficient bitwise operations that do this faster, but the function is straightforward. Here's a python implementation:

def decode(n):
  if (n < 0):
    return (2 * abs(n)) - 1
  else:
    return 2 * n

>>> [decode(n) for n in [0,-1,1,-2,2,-3,3,-4,4]]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
aem
  • 3,896
  • 24
  • 20
  • Thanks, but unfortunately this is in a network encoding system for a game, and this particular decode function is used many times per packet, many times per second - it's gotta be _fast_ – Martin Feb 05 '10 at 23:20
  • You can use a simple bit op to speed this up a bit. Shift by 1 to multiply by 2. – William Morrison Oct 17 '13 at 14:13
  • -1: This function *encodes* signed numbers into coded unsigned numbers. The original question already has a function that does that. The original question asked for a function that *decodes* those unsigned numbers back to the original signed numbers: we want decode(3) to return -2, but this function makes decode(3) return 6. – David Cary Jun 18 '15 at 15:23