
I am facing a rather peculiar problem. I am working on a compiler for an architecture that doesn't support bitwise operations. However, it handles signed 16-bit integer arithmetic, and I was wondering if it would be possible to implement bitwise operations using only:

  • Addition (c = a + b)
  • Subtraction (c = a - b)
  • Division (c = a / b)
  • Multiplication (c = a * b)
  • Modulus (c = a % b)
  • Minimum (c = min(a, b))
  • Maximum (c = max(a, b))
  • Comparisons (c = (a < b), c = (a == b), c = (a <= b), etc.)
  • Jumps (goto, for, etc.)

The bitwise operations I want to be able to support are:

  • Or (c = a | b)
  • And (c = a & b)
  • Xor (c = a ^ b)
  • Left Shift (c = a << b)
  • Right Shift (c = a >> b) (all integers are signed, so this is a problem)
  • Signed Shift (c = a >>> b)
  • One's Complement (a = ~b) (already found a solution; see below)

Normally the problem is the other way around: how to achieve arithmetic optimizations using bitwise hacks. Not in this case, however.

Writable memory is very scarce on this architecture, hence the need for bitwise operations. The bitwise functions themselves should not use many temporary variables. However, constant read-only data and instruction memory are abundant. A side note: jumps and branches are not expensive, and all data is readily cached. Jumps cost half the cycles that arithmetic (including load/store) instructions do. In other words, each of the supported operations above costs twice the cycles of a single jump.


Some thoughts that might help:

I figured out that you can do one's complement (negate bits) with the following code:

// Bitwise one's complement
b = ~a;
// Arithmetic one's complement
b = -1 - a;

I also remember the old shift hack for dividing by a power of two, so the bitwise shifts can be expressed as:

// Bitwise left shift
b = a << 4;
// Arithmetic left shift
b = a * 16; // 2^4 = 16

// Signed right shift
b = a >>> 4;
// Arithmetic right shift
b = a / 16;

For the rest of the bitwise operations I am slightly clueless. I wish the architects of this architecture had supplied bit operations.

I would also like to know if there is a fast/easy way of computing a power of two (for the shift operations) without using a memory data table. A naive solution would be to jump into a field of multiplications:

b = 1;
switch (a)
{
  case 15: b = b * 2;
  case 14: b = b * 2;
  // ... exploiting fallthrough (instruction memory is orders of magnitude larger)
  case 2: b = b * 2;
  case 1: b = b * 2;
}

Or a Set & Jump approach:

switch (a)
{
  case 15: b = 32768; break;
  case 14: b = 16384; break;
  // ... exploiting the fact that a jump is faster than one additional mul
  //     at the cost of doubling the instruction memory footprint.
  case 2: b = 4; break;
  case 1: b = 2; break;
}
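If a loop is acceptable, repeated doubling also works with a single counter and no table: one multiplication per step of shift distance and only linear instruction memory (a sketch of mine, not from the question; the name pow2 is made up):

```c
/* Compute 2^n for n in 0..14 by repeated doubling; n = 15 would overflow
   a signed 16-bit register (on the target it would wrap to -32768). */
int pow2(int n) {
    int b = 1;
    while (n > 0) {
        b = b * 2;   /* one doubling per remaining unit of shift distance */
        n = n - 1;
    }
    return b;
}
```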
Statement
  • Just out of curiosity, how on earth can a CPU get built these days without boolean operators? Is this a decimal machine? – Mike Dunlavey Jun 06 '10 at 02:31
  • This is by far the most genuinely interesting question I've seen on Stack Overflow recently. – bcat Jun 06 '10 at 02:41
  • I too would like to know what bizarre architecture he has to use that doesn't have instructions for the most basic bit operators. – erjiang Jun 06 '10 at 03:09
  • If the relations on operation costs are accurate, this must be a very odd machine indeed: an integer divide the same speed as multiplication? My guess would be something built from discrete logic, maybe like NASA's custom-built computers used in early space probes? – Durandal Jun 06 '10 at 03:41
  • I expected that alien tech was far more advanced than ours. Anyway, kudos for the question. – jweyrich Jun 06 '10 at 04:02
  • To still your curiosity, and perhaps also to let your expectations down, this isn't NASA space-probe stuff. (I'd have to kill you if I said it was.) Actually, this architecture is from a game called RoboCom. The game has a fun, simple idea: you write assembly for a robot which then tries to overtake other robots. Memory is *very* scarce (about 40 bytes) per robot, and I wanted to write a high-level compiler that could also supply a slightly expensive bitpacker to squeeze more info in there. Constant memory & tables can be simulated through data banks containing SET operands. A game for coders! – Statement Jun 06 '10 at 08:55
  • I haven't forgotten about this question, but I am a bit busy at the moment. I will come back and review all answers in depth later next week. Thanks a lot for all your assistance! It really made me spawn some ideas on how to implement this at an abstract level. – Statement Jun 06 '10 at 16:50
  • If it's any comfort, the IBM 1620 not only had no built-in binary operators, it couldn't even ADD. Addition had to be done by table lookup. On the other hand, since it was a decimal machine, it could handle arbitrary-precision decimal numbers (useful in business). – Mike Dunlavey Jun 07 '10 at 13:06
  • Languages such as BASIC and FORTRAN may not have bitwise operators (depending on the variant). It's common in those languages to do things like encoding 'options' to a function in decimal digits of an integer, which may seem bizarre to C programmers. And things like `a*sel + b*(1-sel)` instead of `sel ? a : b`. – greggo Feb 04 '15 at 22:13
  • Well, I arrived here due to a requirement to do these operations in SAP's ABAP language, one of the most dismally designed languages out there. Coming from a C background I took quite a few things for granted, so this question's answers are something I'm quite interested in. – Marius Apr 27 '17 at 12:04
  • A few toy ISAs (for people learning assembly language) suck this much (no bitwise booleans or right shift): for example [tag:MARIE] ([ISA](https://github.com/MARIE-js/MARIE.js/wiki/MARIE-Instruction-Set-(with-Opcodes))) and [Little Man Computer](https://en.wikipedia.org/wiki/Little_man_computer). Programming for them is barely like assembly language at all if you need to do anything that would be easy with binary numbers. Other toy ISAs can at least do binary stuff: LC-3 has ADD, AND, NOT, and indirect memory addressing. y86 is stripped-down x86, IIRC removing right shift but not booleans. – Peter Cordes Dec 04 '19 at 09:23
  • See also: https://en.wikipedia.org/wiki/Bitwise_operation#Mathematical_equivalents – 0 _ Apr 07 '21 at 20:13

7 Answers

31

First, solutions for shifting (shift is the shift distance and must not be negative; a is the operand to be shifted and also holds the result when done). The power table is used by all three shift operations.

// table used for shift operations
powtab = { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768 };

// logical shift left
if (shift > 15) {
     a = 0; // if shifting more than 15 bits to the left, value is always zero
} else {
     a *= powtab[shift];
}

// logical shift right (unsigned)
if (shift > 15) {
    a = 0; // more than 15, becomes zero
} else if (shift > 0) {
    if (a < 0) {
        // deal with the sign bit (15)
        a += -32768;
        a /= powtab[shift];
        a += powtab[15 - shift];
    } else {
        a /= powtab[shift];
    }
}

// arithmetic shift right (signed)
if (shift >= 15) {
    if (a < 0) {
        a = -1;
    } else {
        a = 0;
    }
} else if (shift > 0) {
    if (a < 0) {
        // deal with the sign bit
        a += -32768;
        a /= powtab[shift];
        a -= powtab[15 - shift];
    } else {
        // same as unsigned shift
        a /= powtab[shift];
    }
}
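Translated into standard C for checking, the logical (unsigned) right shift looks like this; a sketch, where the 16-bit value is widened to 32 bits so the sign-bit fixup cannot overflow, and the names lsr16 and powtab32 are made up:

```c
#include <stdint.h>

/* 2^0 .. 2^15, widened so the last entry needs no wrap-around trick. */
static const int32_t powtab32[16] = {
    1, 2, 4, 8, 16, 32, 64, 128,
    256, 512, 1024, 2048, 4096, 8192, 16384, 32768
};

/* Logical (unsigned) right shift of a signed 16-bit value using only
   addition, division and comparisons, following the scheme above. */
int32_t lsr16(int16_t a, int shift) {
    int32_t v = a;
    if (shift > 15) return 0;            /* shifted out entirely */
    if (shift == 0) return v;
    if (v < 0) {
        v += 32768;                      /* clear the sign bit (bit 15) */
        v /= powtab32[shift];
        v += powtab32[15 - shift];       /* re-insert it, shifted into place */
    } else {
        v /= powtab32[shift];
    }
    return v;
}
```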

For AND, OR and XOR I could not come up with a simple solution, so I'll do it by looping over each single bit. There might be a better trick for this. The pseudocode assumes a and b are the input operands, c is the result value, and x is the loop counter (each loop must run exactly 16 times):

// XOR (^)
c = 0;
for (x = 0; x <= 15; ++x) {
    c += c;
    if (a < 0) {
        if (b >= 0) {
            c += 1;
        }
    } else if (b < 0) {
        c += 1;
    }
    a += a;
    b += b;
}

// AND (&)
c = 0;
for (x = 0; x <= 15; ++x) {
    c += c;
    if (a < 0) {
        if (b < 0) {
            c += 1;
        }
    }
    a += a;
    b += b;
}

// OR (|)
c = 0;
for (x = 0; x <= 15; ++x) {
    c += c;
    if (a < 0) {
        c += 1;
    } else if (b < 0) {
        c += 1;
    }
    a += a;
    b += b;
}

That's assuming that all variables are 16 bits wide and all operations behave as signed (so a < 0 is actually true when bit 15 is set).

EDIT: I actually tested all possible operand values (-32768 to 32767) for shifts ranging from 0 to 31 for correctness, and it works correctly (assuming integer divides). For the AND/OR/XOR code an exhaustive test takes too long on my machine, but since the code for these is pretty simple, there should be no edge cases anyway.
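For reference, the XOR loop transcribed into standard C; a sketch where the operands are handled as their 16-bit patterns and the a += a doubling is wrapped modulo 65536 the way a 16-bit register would wrap (the name xor16 is made up):

```c
#include <stdint.h>

/* Bit 15 of a 16-bit value corresponds to "a < 0" in the pseudocode;
   the pattern is kept in 0..65535 here, so the test becomes >= 32768. */
int32_t xor16(int16_t a, int16_t b) {
    int32_t aa = (uint16_t)a;
    int32_t bb = (uint16_t)b;
    int32_t c = 0;
    for (int x = 0; x <= 15; ++x) {
        c += c;                      /* shift result left by one */
        int abit = aa >= 32768;
        int bbit = bb >= 32768;
        if (abit != bbit)            /* exactly one top bit set: XOR is 1 */
            c += 1;
        aa = (aa + aa) % 65536;      /* a += a with 16-bit wrap-around */
        bb = (bb + bb) % 65536;
    }
    return c;
}
```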

Durandal
6

An incomplete answer to an old question, concentrating on AND, OR and XOR. Once a solution is found for one of these bitwise operations, the other two can be derived. There are several ways; one is shown in the following test program (compiled with gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)).

In December 2018 I discovered an error in the solution. The XOR commented out below only works because the intermediate results in a + b - 2*AND(a,b) are promoted to int, which is wider than 16 bits on all modern compilers.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

//#define XOR(a,b) (a + b - 2*AND(a,b)) // Error. Intermediate overflow
#define XOR(a,b) (a - AND(a,b) +  b - AND(a,b) )
#define IOR(a,b) XOR(XOR(a,b),AND(a,b)) // Credit to Jan Gray, Gray Research LLC, for IOR
static const uint16_t andlookup[256] = {
#define C4(a,b) ((a)&(b)), ((a)&(b+1)), ((a)&(b+2)), ((a)&(b+3))
#define L(a) C4(a,0), C4(a,4), C4(a,8), C4(a,12)
#define L4(a) L(a), L(a+1), L(a+2), L(a+3)
    L4(0), L4(4), L4(8), L4(12)
#undef C4
#undef L
#undef L4
};

uint16_t AND(uint16_t a, uint16_t b) {
    uint16_t r=0, i;

    for ( i = 0; i < 16; i += 4 ) {
            r = r/16 + andlookup[(a%16)*16+(b%16)]*4096;
            a /= 16;
            b /= 16;
    }
    return r;
}

int main( void ) {
    uint16_t a = 0, b = 0;

    do {
            do {
                    if ( AND(a,b) != (a&b) ) return printf( "AND error\n" );
                    if ( IOR(a,b) != (a|b) ) return printf( "IOR error\n" );
                    if ( XOR(a,b) != (a^b) ) return printf( "XOR error\n" );
            } while ( ++b != 0 );
            if ( (a & 0xff) == 0 )
                    fprintf( stderr, "." );
    } while ( ++a != 0 );
    return 0;
}
Baard
  • Do you know how this lookup table is calculated? – Alexandre Apr 23 '15 at 17:31
  • @Alexandre, there are several possibilities. I originally used a small auxiliary program, but I have now changed the answer to use macros. – Baard Apr 25 '15 at 11:31
  • In the response by our fellow Durandal, he implemented the shift operators; do you know another way to implement those? – Alexandre Apr 29 '15 at 12:35
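Regarding how the table is calculated: each entry (a*16 + b) simply holds a & b for 4-bit a and b, so it can equally be generated by a plain loop, here computed without using & at all by peeling one bit at a time (a sketch; build_andlookup is a made-up name):

```c
#include <stdint.h>

/* Fill a 256-entry table with a & b for all 4-bit operand pairs,
   using only comparisons, addition and subtraction. */
void build_andlookup(uint16_t table[256]) {
    for (int a = 0; a < 16; ++a) {
        for (int b = 0; b < 16; ++b) {
            int x = a, y = b, bit = 8, r = 0;
            while (bit > 0) {
                if (x >= bit && y >= bit) r += bit;  /* both carry this bit */
                if (x >= bit) x -= bit;              /* peel it off */
                if (y >= bit) y -= bit;
                bit /= 2;
            }
            table[a * 16 + b] = (uint16_t)r;
        }
    }
}
```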
6

In this environment it might be best if you could set things up to actually use arithmetic operators to peel out components of integers.

E.G.

if (a & 16)  becomes  if ((a % 32) > 15)
a |= 16  becomes  if ((a % 32) < 16) a += 16

The transforms for these operators are obvious enough if you restrict the RHS to a constant power of 2.

Peeling off two or four bits is also easy to do.
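A minimal sketch of those transforms for non-negative values (in C, % on a negative left operand yields a negative remainder, so these helpers assume a >= 0; m must be a constant power of two; the function names are made up):

```c
/* a & m (as a boolean), a | m, and a & ~m, with m a power of two. */
int bit_test(int a, int m)  { return (a % (2 * m)) >= m; }
int bit_set(int a, int m)   { return bit_test(a, m) ? a : a + m; }
int bit_clear(int a, int m) { return bit_test(a, m) ? a - m : a; }
```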

Joshua
3

You can operate bit-by-bit (as Mark Byers suggested) by extracting every bit, which will be slow.

Or you can accelerate the process by using 2D lookup tables that store the results for, say, two 4-bit operands, and operating on those. You'll need fewer extractions than when operating on single bits.

You can also do everything using only addition, subtraction and the >= operation. Every bitwise operation can be unrolled into something like this using macros:

/* I didn't actually compile/test it; it is just an illustration of the idea */
#include <stdint.h>

uint16_t and(uint16_t a, uint16_t b){
    uint16_t result = 0;
    #define AND_MACRO(c) \
        if (a >= c){ \
            if (b >= c){ \
                result += c; \
                b -= c; \
            } \
            a -= c; \
        } \
        else if (b >= c) \
            b -= c;

    AND_MACRO(0x8000)
    AND_MACRO(0x4000)
    AND_MACRO(0x2000)
    AND_MACRO(0x1000)
    AND_MACRO(0x0800)
    AND_MACRO(0x0400)
    AND_MACRO(0x0200)
    AND_MACRO(0x0100)
    AND_MACRO(0x0080)
    AND_MACRO(0x0040)
    AND_MACRO(0x0020)
    AND_MACRO(0x0010)
    AND_MACRO(0x0008)
    AND_MACRO(0x0004)
    AND_MACRO(0x0002)
    AND_MACRO(0x0001)
    #undef AND_MACRO
    return result;
}

You'll need 3 variables to implement this.

Every bitwise operation revolves around a macro similar to AND_MACRO: you compare the remaining values of a and b against the "mask" (the c parameter), add the mask to the result in the branch appropriate for your operation, and subtract the mask from each operand whose bit was set.

Depending on your platform, it may be faster than extracting every bit using % and / , and then putting it back using multiplication.

See for yourself whichever is better for you.

SigTerm
2

As long as you're willing for it to be very expensive, yes.

Basically, you'll explicitly put a number into a base-2 representation. You do this just as you would put a number into base-10 (e.g., to print it out), that is, by repeated division.

This turns your number into an array of bools (or ints in the range 0,1), then we add functions to operate on those arrays.

Again, note that this is tremendously more expensive than native bitwise operations, and that almost any real architecture will supply bitwise operators.

In C (of course, in C you have bitwise operators, but...) an implementation might be:

#include <limits.h>
#define BITWIDTH CHAR_BIT
typedef int bitpattern[BITWIDTH];

// fill bitpattern with base-2 representation of n
// we used an lsb-first (little-endian) representation
void base2(char n, bitpattern array) {
  for( int i = 0 ; i < BITWIDTH ; ++i ) {
    array[i] = n % 2 ;
    n /= 2 ;
  }
}

void bitand( bitpattern op1, bitpattern op2, bitpattern result ) {
  for( int i = 0 ; i < BITWIDTH ; ++i ) {
    result[i] = op1[i] * op2[i];
  }
}


void bitor( bitpattern op1, bitpattern op2, bitpattern result ) {
  for( int i = 0 ; i < BITWIDTH ; ++i ) {
    result[i] = (op1[i] + op2[i] != 0 );
  }
}

// assumes compiler-supplied bool to int conversion 
void bitxor( bitpattern op1, bitpattern op2, bitpattern result ) {
  for( int i = 0 ; i < BITWIDTH ; ++i ) {
    result[i] = op1[i] != op2[i]  ;
  }
}
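Going the other way (not in the original answer) is the reverse of base2: rebuild the value most-significant-digit-first by double-and-add; for non-negative values this round-trips exactly (the name frombase2 is made up):

```c
#include <limits.h>

#define BITWIDTH CHAR_BIT
typedef int bitpattern[BITWIDTH];

/* Rebuild a value from its lsb-first base-2 digits: walk from the most
   significant digit down, doing n = 2*n + digit at each step. */
int frombase2(const bitpattern array) {
    int n = 0;
    for (int i = BITWIDTH - 1; i >= 0; --i)
        n = n * 2 + array[i];
    return n;
}
```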
tpdi
2

Just some other approaches.

For example, a 16-bit AND:

int and(int a, int b) {
    int d=0x8000;
    int result=0;
    while (d>0)  {
        if (a>=d && b>=d) result+=d;
        if (a>=d) a-=d;
        if (b>=d) b-=d;
        d/=2;
    }
    return result;
}
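The loop routine above is easy to verify against the native operator. A self-contained copy for testing (renamed and16, since `and` collides with a C++ keyword and the <iso646.h> macro), operating on 16-bit patterns in 0..65535:

```c
/* Peel bits from the top: d walks 0x8000, 0x4000, ..., 1; a bit belongs
   in the result only when both operands still carry it. */
int and16(int a, int b) {
    int d = 0x8000;
    int result = 0;
    while (d > 0) {
        if (a >= d && b >= d) result += d;
        if (a >= d) a -= d;
        if (b >= d) b -= d;
        d /= 2;
    }
    return result;
}
```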

A 2-bit AND using doubles, without loops or table lookups:

int and(int a, int b) {
    double x=a*b/12;
    return (int) (4*(sign(ceil(tan(50*x)))/6+x));
}

A 2-bit AND using 32-bit integer arithmetic:

int and(int a, int b) {
    return ((684720128*a*a -b) * a) % (b+1);
}

A 2-bit AND using 16-bit integer arithmetic:

int and(int a, int b) {
    return ((121 * a) % 16) % (b+1);
}

A 3-bit AND using 16-bit integer arithmetic:

int and(int a, int b) {
    return sign(a) * ((((-23-a) * (40+b)) % 2)+40+b) % ((10624 * ((((-23-a) * (40+b))%2)+40+b)) % (a%2 - 2 -a) - a%2 + 2 +a);
}
Bob Genom
0

Here's a method I came up with to process a bitwise XOR of 16 bits in parallel, using double-precision (64-bit) integer adds:

[gmn]awk '{ CONVFMT = OFMT = "%.20g" 

     c = (a=3e15+("1011000111110101"))+
         (b=3e15+("1101010010101110"))             
           
         sub(/[7]/,   "1",c)
        gsub(/[268]/ ,"0",c)
         sub(/^[^01]+/,"",c); print c }'

The bit strings look like this (I've taken out the 3e15 guard digit here for clarity):

 a =    1011 0001 1111 0101
 b =    1101 0100 1010 1110
 c =    8112 0101 2121 1211 (intermediate)
-------------------------------------------
 c =    0110 0101 0101 1011 (output)

One 52-bit unsigned integer add and barely a handful of string-substitution calls, and the output is already in a state that can be passed downstream.

The absolute highest value this add will climb to is 8,222,222,222,222,222, just shy of the 53-bit hard limit.

For a bitwise AND, convert all the 1s, and a leading 6 or 7, down to 0s: only the 2s (and a leading 8) are true bits, which should then be converted to 1s.

For bitwise OR, it's the reverse: anything that is not a 0 or a 6 is a "1" in the output string.

For bitwise complement, it's even easier: start with 1,111,111,111,111,111 and subtract the concatenated bit string of the 2 bytes to obtain it.

RARE Kpop Manifesto