1

This function is a workaround ... nothing with better performance?

CREATE or REPLACE FUNCTION varbit_to_int(v varbit) RETURNS int AS $f$
  SELECT CASE bit_length(v)
    WHEN 1 THEN v::bit(1)::int
    WHEN 2 THEN v::bit(2)::int
    WHEN 3 THEN v::bit(3)::int
    ...
    WHEN 30 THEN v::bit(30)::int
    WHEN 31 THEN v::bit(31)::int
    WHEN 32 THEN v::bit(32)::int
    ELSE NULL::int
  END
$f$ LANGUAGE SQL IMMUTABLE;

Same problem for bigint:

CREATE or replace FUNCTION varbit_to_bigint(p varbit) RETURNS bigint AS $f$
  SELECT CASE bit_length($1)
    WHEN 1 THEN $1::bit(1)::bigint
    WHEN 2 THEN $1::bit(2)::bigint
    WHEN 3 THEN $1::bit(3)::bigint
    ...
    WHEN 62 THEN $1::bit(62)::bigint
    WHEN 63 THEN $1::bit(63)::bigint
    WHEN 64 THEN $1::bit(64)::bigint
    ELSE NULL::bigint
  END
$f$ LANGUAGE SQL IMMUTABLE STRICT;

Using many times in loops seems CPU-wasteful, only to avoid "cannot cast type bit varying to integer" error. Maybe an external C-language library do this and other useful castings.


NOTICE select b'101'::bit(64)::bigint != b'101'::bigint;

Peter Krauss
  • 13,174
  • 24
  • 167
  • 304

1 Answers1

2

I tested a couple of variants (for bigint only) with built-in functionality and this variant with OVERLAY() turned out fastest in my local tests on Postgres 11:

CREATE OR REPLACE FUNCTION varbit2bigint2(b varbit)
  RETURNS bigint AS
$func$
SELECT OVERLAY(bit(64) '0' PLACING b FROM 65 - bit_length(b))::bigint
$func$  LANGUAGE SQL IMMUTABLE;

Other candidates:

Note the different conversion of empty bitstrings ('') to 0 vs. NULL. Adapt to your needs!

Your function:

CREATE OR REPLACE FUNCTION varbit2bigint1(b varbit)
  RETURNS bigint AS
$func$
  SELECT CASE bit_length($1)
   WHEN  1 THEN $1::bit(1)::bigint
   WHEN  2 THEN $1::bit(2)::bigint
   WHEN  3 THEN $1::bit(3)::bigint
   WHEN  4 THEN $1::bit(4)::bigint
   WHEN  5 THEN $1::bit(5)::bigint
   WHEN  6 THEN $1::bit(6)::bigint
   WHEN  7 THEN $1::bit(7)::bigint
   WHEN  8 THEN $1::bit(8)::bigint
   WHEN  9 THEN $1::bit(9)::bigint
   WHEN 10 THEN $1::bit(10)::bigint
   WHEN 11 THEN $1::bit(11)::bigint
   WHEN 12 THEN $1::bit(12)::bigint
   WHEN 13 THEN $1::bit(13)::bigint
   WHEN 14 THEN $1::bit(14)::bigint
   WHEN 15 THEN $1::bit(15)::bigint
   WHEN 16 THEN $1::bit(16)::bigint
   WHEN 17 THEN $1::bit(17)::bigint
   WHEN 18 THEN $1::bit(18)::bigint
   WHEN 19 THEN $1::bit(19)::bigint
   WHEN 20 THEN $1::bit(20)::bigint
   WHEN 21 THEN $1::bit(21)::bigint
   WHEN 22 THEN $1::bit(22)::bigint
   WHEN 23 THEN $1::bit(23)::bigint
   WHEN 24 THEN $1::bit(24)::bigint
   WHEN 25 THEN $1::bit(25)::bigint
   WHEN 26 THEN $1::bit(26)::bigint
   WHEN 27 THEN $1::bit(27)::bigint
   WHEN 28 THEN $1::bit(28)::bigint
   WHEN 29 THEN $1::bit(29)::bigint
   WHEN 30 THEN $1::bit(30)::bigint
   WHEN 31 THEN $1::bit(31)::bigint
   WHEN 32 THEN $1::bit(32)::bigint
   WHEN 33 THEN $1::bit(33)::bigint
   WHEN 34 THEN $1::bit(34)::bigint
   WHEN 35 THEN $1::bit(35)::bigint
   WHEN 36 THEN $1::bit(36)::bigint
   WHEN 37 THEN $1::bit(37)::bigint
   WHEN 38 THEN $1::bit(38)::bigint
   WHEN 39 THEN $1::bit(39)::bigint
   WHEN 40 THEN $1::bit(40)::bigint
   WHEN 41 THEN $1::bit(41)::bigint
   WHEN 42 THEN $1::bit(42)::bigint
   WHEN 43 THEN $1::bit(43)::bigint
   WHEN 44 THEN $1::bit(44)::bigint
   WHEN 45 THEN $1::bit(45)::bigint
   WHEN 46 THEN $1::bit(46)::bigint
   WHEN 47 THEN $1::bit(47)::bigint
   WHEN 48 THEN $1::bit(48)::bigint
   WHEN 49 THEN $1::bit(49)::bigint
   WHEN 50 THEN $1::bit(50)::bigint
   WHEN 51 THEN $1::bit(51)::bigint
   WHEN 52 THEN $1::bit(52)::bigint
   WHEN 53 THEN $1::bit(53)::bigint
   WHEN 54 THEN $1::bit(54)::bigint
   WHEN 55 THEN $1::bit(55)::bigint
   WHEN 56 THEN $1::bit(56)::bigint
   WHEN 57 THEN $1::bit(57)::bigint
   WHEN 58 THEN $1::bit(58)::bigint
   WHEN 59 THEN $1::bit(59)::bigint
   WHEN 60 THEN $1::bit(60)::bigint
   WHEN 61 THEN $1::bit(61)::bigint
   WHEN 62 THEN $1::bit(62)::bigint
   WHEN 63 THEN $1::bit(63)::bigint
   WHEN 64 THEN $1::bit(64)::bigint
   ELSE NULL::bigint
  END
$func$  LANGUAGE SQL IMMUTABLE;  -- no STRICT modifier

Left-padding the text representation with '0':

CREATE OR REPLACE FUNCTION pg_temp.varbit2bigint3(b varbit)
  RETURNS bigint AS
$func$
SELECT lpad(b::text, 64, '0')::bit(64)::bigint
$func$  LANGUAGE SQL IMMUTABLE;

Bit-shifting before the cast:

CREATE OR REPLACE FUNCTION varbit2bigint4(b varbit)
  RETURNS bigint AS
$func$
SELECT (bit(64) '0' || b << bit_length(b))::bit(64)::bigint
$func$  LANGUAGE SQL IMMUTABLE;

db<>fiddle here

Related:

Your feedback

It is not worst, it is faster!

EXPLAIN ANALYZE select 
  varbit_to_bigint(osm_id::bit(64)::varbit) 
from planet_osm_point limit 10000 ;
-- Planning time: 0.697 ms
-- Execution time: 1133.571 ms

EXPLAIN ANALYZE select 
  lpad(osm_id::bit(64)::varbit::text, 32, '0')::bit(64)::bigint 
from planet_osm_point limit 10000;
-- Planning time: 0.105 ms
-- Execution time: 26.429 ms

You show a STRICT modifier with the bigint variant of the function in the question (not sure why it differs from the integer variant). If that represents the function you actually tested, I expect most of the observed performance difference is due to that added STRICT modifier preventing function inlining. Quoting the Postgres Wiki:

if the function is declared STRICT, then the planner must be able to prove that the body expression necessarily returns NULL if any parameter is null. At present, this condition is only satisfied if: every parameter is referenced at least once, and all functions, operators and other constructs used in the body are themselves STRICT.

That seems to hurt your function badly - while my winner seems unaffected, and the other two variants are even ~ 10 % faster. Same fiddle with STRICT functions:

db<>fiddle here

Related:

I suggest you re-test with and without STRICT modifier to see for yourself.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Sorry, I commented "seems CPU-wasteful (...) external C-language" but only requested "better" in the introduction, it is vague. I am editing to "better performance". PS: your solution is the worst in terms of performance, but is elegant...A C-language implementation of `lpad(varbit)` perhaps will be suficient for better performance. – Peter Krauss May 13 '19 at 21:34
  • Yes, SCRICT is horrible (!), I used only to avoid NULLs... I remove `strict` and now Execution times are near the same: 3480.135 ms (lpad) vs 3547.097 ms. The big difference is about planning: 1.420 ms vs 0.141 ms. – Peter Krauss May 14 '19 at 00:45
  • @Peter: Have you considered my new variant with `OVERLAY()`? Should be considerably faster than either. (Also, 1 ms of planning pales in comparison with 3500 ms of execution, doesn't it? If planning time matters, consider a plpgsql variant ...) – Erwin Brandstetter May 14 '19 at 00:48
  • Perfect (!), your bit-shifting `varbit2bigint4()` function is the faster, and OVERLAY have near same performance. – Peter Krauss May 14 '19 at 01:00