3

In order to save space for a very particular use-case in Oracle SQL, I'm experimenting with using a bitmap methodology for representing Boolean state over time (an event did or did not happen on that day), with each bit in a binary number representing yes/no for that day. That way, for example, a 32 bit number would allow me to represent 32 consecutive days of whether something happened each day. I would simply need to count the number of bits that are set (=1) to get the count of days in that 32-day period that the event occurred, without having store a separate dated row for each day.

Here's an example of me testing updating the value each day (rolling off the oldest bit and setting the newest one):

SELECT testbitmap original_bitmap,
       BITAND((testbitmap * POWER(2,/*rolloff the nbr of days since last event*/1)) + /*the event happened today*/1,POWER(2,32)-1) new_bitmap
  FROM (SELECT BIN_TO_NUM(1,1,0,0,1,1,0,0,0,1,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0) testbitmap
          FROM dual)

So far so good. But now I need to query the result, and that means counting the set bits of my resulting bitmap. There isn't an inverse of BIN_TO_NUM that I know of in Oracle. Without using a function that loops through each bit and tests it individually, is there a way to count set bits in Oracle (e.g. the number 9 should result in 2 (1001), whereas the number 7 would result in 3 (0111)? Maybe a mathematical formula the returns the number of 1s required to represent a number in binary?

Paul W
  • 5,507
  • 2
  • 2
  • 13
  • Your request is quite uncommon. I think you have to loop over the bits (and you would also need to do it in almost any other programming language). Be aware, many function are limited to 2**32. If you have more days, then you may get wrong results. – Wernfried Domscheit Sep 01 '23 at 21:12
  • How much space do you plan to save? Unless your table has billions of rows, the difference should be negligible. – Wernfried Domscheit Sep 01 '23 at 21:13
  • @Wernfried Domscheit, I'm thinking over a year, so maybe 48 bytes (max 384 days). Versus an IOT child table with a 7-byte date + avg 8 bytes for a 11-digit numeric surrogate key + 1 byte row overhead * 365 = 5,840 bytes. A regular table would require much more with the necessary index. Whatever my data volume is, the bitmap method uses less than 1/100th of the amount of space (48 bytes vs 5840 bytes). And yes, we are talking billions of rows. I have 16 billion entities that will get tracked this way. – Paul W Sep 01 '23 at 21:30
  • There are some decent algorithms that are better than just looping over the bits, but nothing out-of-the-box in Oracle. You'd have to implement one of them yourself. I suggest you give back a little of that space savings are precompute and save the number of set bits whenever you insert or update each bitmap. – Matthew McPeak Sep 01 '23 at 22:21
  • 1
    Some of those algorithms are explored [here](https://stackoverflow.com/q/109023/266304). You could also consider a Java stored procedure that uses `bitCount`. – Alex Poole Sep 01 '23 at 22:33

2 Answers2

4

You can use the Hamming Weight algorithm, in C++ it would be:

int numberOfSetBits(uint32_t i)
{
     // Java: use int, and use >>> instead of >>. Or use Integer.bitCount()
     // C or C++: use uint32_t
     i = i - ((i >> 1) & 0x55555555);        // add pairs of bits
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  // quads
     i = (i + (i >> 4)) & 0x0F0F0F0F;        // groups of 8
     return (i * 0x01010101) >> 24;          // horizontal sum of bytes
}

In Oracle, you can use the function:

CREATE FUNCTION number_of_set_bits(
  i NUMBER
) RETURN NUMBER DETERMINISTIC
IS
  v NUMBER := i;
  c_max CONSTANT NUMBER := POWER(2, 32);
BEGIN
  v := v - BITAND(TRUNC(v/2), TO_NUMBER('55555555', 'XXXXXXXX'));
  v := MOD(
         BITAND(v, TO_NUMBER('33333333', 'XXXXXXXX'))
         + BITAND(TRUNC(v/4), TO_NUMBER('33333333', 'XXXXXXXX')),
         c_max
       );
  v := BITAND(v + FLOOR(v/16), TO_NUMBER('0F0F0F0F', 'XXXXXXXX'));
  RETURN TRUNC(MOD(v * TO_NUMBER('01010101', 'XXXXXXXX'), c_max) / POWER(2, 24));
END;
/

Then to count your value would be:

SELECT testbitmap AS original_bitmap,
       NUMBER_OF_SET_BITS(testbitmap)
FROM   (
  SELECT BIN_TO_NUM(1,1,0,0,1,1,0,0,0,1,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0)
           AS testbitmap
  FROM   DUAL
)

Which outputs:

ORIGINAL_BITMAP NUMBER_OF_SET_BITS(TESTBITMAP)
3429605376 11

You should also be able to create an Oracle function using Java:

CREATE OR REPLACE FUNCTION number_of_set_bits(i NUMBER) RETURN NUMBER DETERMINISTIC
  AS LANGUAGE JAVA NAME 'java.lang.Long.bitCount( long ) return long';
/

Which, for the same SELECT query, gives the same output.

fiddle

MT0
  • 143,790
  • 11
  • 59
  • 117
0

I am not as smart as @MT0, I would use this approach. First convert the number into binary:

CREATE FUNCTION Dec2Base(DecN IN NUMBER, Base IN INTEGER DEFAULT 16) RETURN VARCHAR2 DETERMINISTIC IS
    
    n NUMBER;
    BaseString VARCHAR2(129) := NULL;
    DecNumber NUMBER := DecN;
    HexString CONSTANT CHAR(16) := '0123456789ABCDEF';
    
BEGIN
    IF DecN IS NULL THEN
        RETURN NULL;
    ELSIF Base > 16 THEN 
        RAISE VALUE_ERROR;
    ELSIF base = 16 THEN
        RETURN TO_CHAR(DecN, 'fmXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
    ELSE
        IF DecN > 2**127 THEN
            -- "MOD(DecNumber, Base)" returns wrong result for larger numbers
            RAISE NUMERIC_OVERFLOW;
        END IF;         
        LOOP
            BaseString := SUBSTR(HexString, MOD(DecNumber, Base) + 1, 1 ) || BaseString;
            DecNumber := TRUNC(DecNumber / Base);
            EXIT WHEN DecNumber = 0;
        END LOOP;
        RETURN BaseString;
    END IF;
    
END Dec2Base;

Then you can count the 1 with REGEXP_COUNT

REGEXP_COUNT(Dec2Base(BIN_TO_NUM(1,1,0,0,1,1,0,0,0,1,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0), 2), '1')

I assume the Hamming Weight algorithm will be more efficient. You may consider to store the value as RAW rather than as an integer value. The UTL_RAW package provides various functions which should support your needs.

Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110