2

I'm looking for a fast way to calculate the hamming weight/population count/"the number of 1 bits" of a BINARY(1024) field. MySQL has a BIT_COUNT function that does something like that. I couldn't find a similar function in T-SQL?

Or would you suggest storing the binary data in a field of another type?

If you don't know what I'm talking about, here's a Wikipedia article about the hamming weight.

Joel
  • 4,732
  • 9
  • 39
  • 54
Simon
  • 1,814
  • 20
  • 37
  • http://weblogs.sqlteam.com/jeffs/archive/2007/05/09/60197.aspx – Lamak May 06 '11 at 20:06
  • This might be a job for a CLR function. Also, you've probably considered this, but if your count of unique binary values is thousands not millions, you could create a table to store the pop for each value after it's been calculated the first time. Or store it in your main table, since all you need is a `SMALLINT`. –  May 07 '11 at 14:18

5 Answers5

5

You could use a helper table with precalculated Hamming weights for small numbers, like bytes, then split the value accordingly, join to the helper table and get the sum of partial Hamming weights as the value's Hamming weight:

-- define Hamming weight helper table
DECLARE @hwtally TABLE (byte tinyint, hw int);
INSERT INTO @hwtally (byte, hw) VALUES (0, 0);
INSERT INTO @hwtally (byte, hw) SELECT   1 - byte, 1 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT   3 - byte, 2 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT   7 - byte, 3 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT  15 - byte, 4 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT  31 - byte, 5 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT  63 - byte, 6 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT 127 - byte, 7 - hw FROM @hwtally;
INSERT INTO @hwtally (byte, hw) SELECT 255 - byte, 8 - hw FROM @hwtally;

-- calculate
WITH split AS (
  SELECT SUBSTRING(@value, number, 1) AS byte
  FROM master.dbo.spt_values
  WHERE type = 'P' AND number BETWEEN 1 AND LEN(@value)
)
SELECT
  Value = @value,
  HammingWeight = SUM(t.hw)
FROM split s
  INNER JOIN @hwtally t ON s.byte = t.byte
Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • Perfect! Thanks. Didn't know about spt_values before. – Simon May 09 '11 at 11:42
  • @Simon: Here's some useful info for a start: http://stackoverflow.com/questions/4273723/what-is-the-purpose-of-system-table-table-master-spt-values-and-what-are-the-mea – Andriy M May 09 '11 at 11:54
2

When you are playing with smaller value (something like 16 bit max), The most efficient way to do it with SQL Server is using an Table with all result calculated and using a join.

I have speed up a query from 30 sec to 0 sec by doing this kind of thing on a query which should calculate Hamming Weight of a 4 bit value on 17'000 rows .

WITH HammingWeightHelper AS (
        SELECT  x, Fx 
        FROM (VALUES(0,0),(1,1),(2,1),(3,2),
                    (4,1),(5,2),(6,2),(7,3),
                    (8,1),(9,2),(10,2),(11,3),
                    (12,2),(13,3),(14,3),(15,4)) AS HammingWeight(x, Fx)
    )
SELECT HammingWeight.Fx As HammingWeight, SomeTable.Value As bitField
FROM   SomeTable INNER JOIN
       HammingWeightHelper ON HammingWeightHelper.x = SomeTable.Value 

Of course it is an ugly solution and it probably won't suit well for long bit field.

Marco Guignard
  • 613
  • 3
  • 9
1

SQL Server, as of SQL Server 2022 CTP 2.1, supports BIT_COUNT(). The documentation is here.

Conor Cunningham MSFT
  • 4,151
  • 1
  • 15
  • 21
0

Didn't find anything specifically about hamming weight, but here's one for hamming distance:

create function HamDist(@value1 char(8000), @value2 char(8000))
returns int
as
begin
    declare @distance int
    declare @i int
    declare @len int

    select @distance = 0,
           @i =1,
           @len = case when len(@value1) > len(@value2)
                       then len(@value1)
                       else len(@value2) end

    if (@value1 is null) or (@value2 is null)
        return null

    while (@i <= @len)
        select @distance = @distance +
                           case when substring(@value1,@i,1) != substring(@value2,@i,1)
                                then 1
                                else 0 end,
               @i = @i +1

    return @distance
end

This computes the hamming distance between two values. The hamming weight of a single value would be the hamming distance between that value and an array of zero-values.

KeithS
  • 70,210
  • 21
  • 112
  • 164
  • Thanks for the reply. It's the same algorithm that @Lamak already posted. However the implementation is not exactly fast. I'd have to convert the field to a CHAR first (similar to http://support.microsoft.com/kb/104829) and then call this routine. Isn't there something that computes the hamming distance at least byte-wise? – Simon May 06 '11 at 20:15
0

I couldn't find a good way to do it. In the end I calculated the hamming weight in Java and periodically update the bit counts in the database.

Simon
  • 1,814
  • 20
  • 37