Preambule
I have MSSQL 2016 instance with CDC enabled. And I stream changes by Debezium kafka connector. Connector in logs and topics refer Log Sequence Number (LSN)
in form of 000000aa:af9c3f2d:0f5e1
. In MSSQL I have tables like cdc.dbo_Entity_cdc_CT
where present columns __$start_lsn
, __$start_lsn
, __$end_lsn
but it has type varbinary(10)
.
I would match values.
Based mostly on information in Day 11 of 31 Days of Disaster: Converting LSN Formats article which give hint it should be converted in chain convert to binary style 1 -> bigint, I wrote helper functions for that: https://github.com/Hubbitus/sql/commit/4ccf96492f2aba9796ccaecb1b74e9ba53a45895#diff-e495aa591f98a29c28153735379ad6fd
It mostly works. Contains descriptions and even testing conversion doing right.
But there is one problem I can't solve now.
8 byte part exceeded bigint
When I convert input varbinary(16)
value, I split it into two parts: 2 and 8 bytes length, each convert into bigint
, and then into varchar
, f.e.:
DECLARE @lsn VARBINARY(10) ...
-- Fill by \0 at left
SET @lsn = CONVERT(varbinary, REPLICATE(char(0), 10 - LEN(@lsn))) + @lsn
DECLARE @lsn_s varchar(max) = CONVERT(varchar, CONVERT(bigint, SUBSTRING(@lsn, 1, 2), 1)) + CONVERT(varchar, CONVERT(bigint, SUBSTRING(@lsn, 3, 8), 1))
It works in most cases (f.e. for test table I see only 4 records with error), f.e. for value like:
DECLARE @lsn varbinary(10) = CONVERT(varbinary(10), '0x025D020F35B80001', 1)
You may check it like:
SELECT dbo.__tmp_td_cdc_decode_check(CONVERT(varbinary(10), '0x025D020F35B80001', 1))
But fails, said on:
DECLARE @lsn varbinary(10) = CONVERT(varbinary(10), '0x0043275D010200440020', 1)
SELECT dbo.__tmp_td_cdc_decode_check(@sql)
Indeed, MSSQL have not unsigned datatypes.
The main question
I do not understand how to work around this problem in line:
DECLARE @lsn_s varchar(max) = CONVERT(varchar, CONVERT(bigint, SUBSTRING(@lsn, 1, 2), 1)) + CONVERT(varchar, CONVERT(bigint, SUBSTRING(@lsn, 3, 8), 1))
How I should correctly transform varbinary(10)
into varchar
in form of decimal digits for any cases?
What I also do not understand, sometimes I got error "data truncation" on conversions, but sometimes it happened silently. Please look:
DECLARE @lsn varbinary(10) = CONVERT(varbinary(10), '0x0043275D010200440020', 1) -- Bigint overflow
SELECT
dbo.__tmp_td_cdc_decode_check(@lsn) as ToCheck
,CONVERT(varchar, @lsn, 1) as orig_lsn_hex
,CONVERT(bigint, @lsn, 1) as orig_lsn_bigint -- In many cases it will lead overflow!
,dbo.__tmp_td_cdc_binary_decode_to_string(@lsn) as string
,dbo.__tmp_td_cdc_string_decode_to_binary_DECSTRING(dbo.__tmp_td_cdc_binary_decode_to_string(@lsn)) as decimal_string_reverse
Output:
┌─────────┬────────────────────────┬─────────────────────┬─────────────────────────┬────────────────────────┐
│ ToCheck │ orig_lsn_hex │ orig_lsn_bigint │ string │ decimal_string_reverse │
├─────────┼────────────────────────┼─────────────────────┼─────────────────────────┼────────────────────────┤
│ Error │ 0x0043275D010200440020 │ 2836424448419299360 │ 000012e6:2f707ba4:02580 │ 4838079590083609600 │
└─────────┴────────────────────────┴─────────────────────┴─────────────────────────┴────────────────────────┘
Hexadecimal value 0x0043275D010200440020
should be 1_238_768_277_386_959_257_632 and not 2_836_424_448_419_299_360 what I get from CONVERT(bigint, @lsn, 1)
P.S. Also interesting and helpful materials: