SQL - Get value for a specific element in array

Question

I asked this question few days ago, but it involves a bit deeper answer so it was suggested I create a brand new one, so here it goes...

Disclaimer: I cannot create any custom DB objects (functions, SP's, views etc.), so everything needs to be in-line inside a SQL query.

I'm querying Audit table which for the simplicity of this question has following fields:

AttributeMask
ChangedData
CreatedOn
ObjectId

Each record in a DB may have multiple Audit records associated with it. Every time a change is made to a DB record, it will create a record in the Audit table with specific ObjectID that will point to the source record, CreatedOn that will have a DateTime of the change, AttributeMask with list of AttributeId's that have been changed when SAVE was executed (note, there may be multiple fields changed at once) and ChangedData will actually have the data that's been changed (pre-changed values). One field can of course be changed multiple times and if it's the case, multiple Audit records for this field will exist (different CreatedOn values). I need to find what some (not all) fields from the source record looked like at a specific date.

I can run query below:

select a1.ChangeData as ChangedData1, a1.AttributeMask as AttributeMask2, a2.ChangeData as ChangedData2, a2.AttributeMask as AttributeMask2
from Table1 t
join audit a1 on a1.AuditId = 
(select top 1 a.auditid from audit a where a.objecttypecode = 3
and a.objectid = T.ObjectId
and a.AttributeMask like '%,10192,%'
and a.CreatedOn <= '8-16-2018'
order by a1.CreatedOn desc)
join audit a2 on a2.AuditId = 
(select top 1 a.auditid from audit a where a.objecttypecode = 3
and a.objectid = T.ObjectId
and a.AttributeMask like '%,10501,%'
and a.CreatedOn <= '8-16-2018'
order by a1.CreatedOn desc)
where t.ObjectID = SomeGuidValue

This query is looking for the latest change to 2 fields (10192 and 10501) which happened before 8-16-2018. It returns the following data (I added 3rd record to illustrate all possible cases):

ChangeData1 AttributeMask1  ChangeData2 AttributeMask2
NULL    NULL    True~~True~1904~~~15.8700000000~4760~30000~590~12000~0~390~1904~False~200~  ,10499,10604,10501,10436,10491,10490,10459,10099,10319,10253,10433,10031,10091,10020,10265,10008,10509,
~True~5.56~~House~~200000~  ,10030,10432,10435,197,10099,10192,198, False~1170~600~0~Complete~True~1770~    ,10501,10091,10008,10020,10570,10499,10253,10715,
~~~~200001~ ,10432,10435,197,10099,10192,198,   True~2~True~~0~~~100.0000000000~1~business,96838c4f-e63c-e011-9a14-78e7d1644f78~~0~~~~0~False~~1~   ,10499,10509,10501,10203,10436,10491,10490,10459,10099,10157,10253,10433,10715,10031,10091,10020,10265,10008,10319,10699,

This means that 1st record has change to field 10501 only, 2nd record has change to 10192 only and 3rd record has changes to both 10192 and 10501 fields.

AttributeMask field has comma delimited list of all FieldID's that have been changed (note that it starts and ends with comma). ChangedData field has tilde (~) delimited list of data that's been changed. Each entry in AttributeMask corresponds to entry in ChangedData. For example, if I wanted to see what data was in 10501 field in 1st record, I would need to determine what entry # 10501 is in AttributeMask field (it's #3 in the list) and then I would need to find out what data is in entry #3 in ChangedData field (it's TRUE) and if I wanted to see what was in 2nd record for Field 10192 I'd see what index it has in AttributeMask (it's #6) and its corresponding value in ChangedData field is 2000000. I need to somehow extract this data in the same query. I was helped with some samples on how this could be done, but I failed to ask the right question in the beginning (thought it would be simpler than explaining all this).

What I need this query to return is something like this:

ChangeData1 AttributeMask1  ChangeData2 AttributeMask2
NULL    NULL    TRUE    10501
200000  10192   FALSE   10501
200001  10192   TRUE    10501

I hope this is clear now.

Looks like you were really bored writing in last 2 paragraphs. Why would #3 need to be decoded getting #6 in ChangedData? Even in that case why are you taking #5 as #6 for example? Anyway, this looks like an awful design, I would try dealing with it at client side rather than trying to work with T-SQL (maybe a CLR function but you are saying you can't do anything there - although it is also done via T-SQL) — Cetin Basoz, Aug 16 '18 at 17:50
Hi @Cetin Basoz, you'e right (not about me being bored). I made a mistake typing this up. I corrected it. The design isn't all that awful, this just wasn't designed to be parsed by T-SQL. It's very easy to parse in in any computer language (C# for example). I can only create a query. I cannot create anything else (functions, SP's, views) in this DB. — Tony, Aug 16 '18 at 18:02
What should happen, if `AttributeMask1` included both values (same for `2`)? How many joins are the maximum? *Name numbering* is always a hint for a bad design. Why dont you return `Data` and `Mask` together with a row counter in a list? If this list might get broader and broader this is getting really weird... — Shnugo, Aug 16 '18 at 20:12
Hi @Shnugo if AttributeMask1 includes both values, it's fine because I only care about 1 join at a time, so I'll just pull value for that specific join and the other value will be part of another join. So it's possible to pull the same Audit record multiple times by different joins. I'm not sure about maximum # of joins, but I'd say about 15. I wasn't going to name-number in a real query. I was going to make the names more explanatory. I'm not sure how returning row counter is going to solve my issue? — Tony, Aug 16 '18 at 23:55

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

As told in my comments you are better off to deal with a set, then working with a broader and broader list with name-numbered columns.

Try to provide your initial input set in the format like the following mockup table:

There is a running ID, your ObjectID, the code you are looking for and the both strings. I inserted the data as provided by you, but not side-by-side:

DECLARE @tbl TABLE(ID INT IDENTITY, CodeId INT,ObjectId INT,  ChangeData VARCHAR(1000), AttributeMask VARCHAR(1000));
INSERT INTO @tbl VALUES
 (10192,1,NULL,NULL)
    ,(10501,1,'True~~True~1904~~~15.8700000000~4760~30000~590~12000~0~390~1904~False~200~',',10499,10604,10501,10436,10491,10490,10459,10099,10319,10253,10433,10031,10091,10020,10265,10008,10509,')
,(10192,2,'~True~5.56~~House~~200000~',',10030,10432,10435,197,10099,10192,198,')
   ,(10501,2,'False~1170~600~0~Complete~True~1770~',',10501,10091,10008,10020,10570,10499,10253,10715,')
,(10192,3, '~~~~200001~',',10432,10435,197,10099,10192,198,')
   ,(10501,3,'True~2~True~~0~~~100.0000000000~1~business,96838c4f-e63c-e011-9a14-78e7d1644f78~~0~~~~0~False~~1~',',10499,10509,10501,10203,10436,10491,10490,10459,10099,10157,10253,10433,10715,10031,10091,10020,10265,10008,10319,10699,');

--The query will cast the strings to XML in order to grab into it by their position index
--Then all codes are taken and numbered as derived list.
--According to the found position the corresponding value is taken

SELECT t.ID 
      ,t.ObjectId
      ,t.CodeId
      ,t.ChangeData
      ,t.AttributeMask
      ,Casted.ValueXml.value('/x[sql:column("PartIndex")][1]','nvarchar(max)') ValueAtCode  
FROM @tbl t
CROSS APPLY
(
    SELECT CAST('<x>' + REPLACE(t.AttributeMask,',','</x><x>') + '</x>' AS XML).query('/x[text()]') AS CodeXml
          ,CAST('<x>' + REPLACE(t.ChangeData,'~','</x><x>') + '</x>' AS XML) AS ValueXml
) Casted
CROSS APPLY(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartIndex
          ,x.value('text()[1]','nvarchar(max)') AS CodePart
    FROM Casted.CodeXml.nodes('/x') A(x)
) CodeDerived
WHERE CodeDerived.CodePart=t.CodeId;

the result

ID  ObjectId    CodeId  ValueAtCode
2   1           10501   True
3   2           10192   
4   2           10501   False
5   3           10192   200001
6   3           10501   True

But this will be slooooow...

UPDATE

Your whole approach is not set-based. The following is completely untested, I don't have your database, but will point to a set-based solution.

DECLARE @Codes TABLE(CodeID INT);
INSERT INTO @Codes VALUES(10192),(10501);

select t.SomeIdOfYourMainTable
      ,c.CodeID
      ,a1.ChangeData
      ,a1.AttributeMask
from Table1 t
CROSS JOIN @Codes c --will repeat the result for each value in @Codes
CROSS APPLY
(
    select top 1 a.ChangeData
                ,a.AttributeMask             
    from [audit] a 
    where a.objecttypecode = 3
    and a.objectid = t.ObjectId
    and a.AttributeMask like CONCAT('%,',c.CodeID,',%')
    and a.CreatedOn <= '20180816' --use culture independant format!!!
    order by a.CreatedOn desc
) a1;

This allows you to insert as many codes as you want (no need to repeat any join) and it will return a set similar to my example above.

If you need further help with this: Please close this question and start a new question with a fully working, stand-alone MCVE to reproduce your case.

Hi @Shnugo, this seems to work but I cannot figure out how :-) Where exactly did you indicate what Field ID's to look for? It's coming back only with 10501 and 10192, but where exactly do you have that in your query? I'm asking because I'll need to look for about 10 more Field ID's. You're right, it might be slow but I need to test its performance on a larger dataset. I mean getting something slow beats getting nothing fast :-) Really appreciate your help man! — Tony, Aug 17 '18 at 18:24
@Tony, Just execute a simple `SELECT * FROM @tbl`. The code you see in the column `CodeId` is the one taken. There will be a numbered list of all codes of each single row (`APPLY` works *row-wise*), while the final `WHERE` reduces this list to the one line we are looking for. This line carries a position, which is used to pick the correct element from the other string. — Shnugo, Aug 17 '18 at 21:19
Hi @Shnugo, this is very nice but way too complicated for me. I see that you added CodeId column to the dataset and I understand this how the query is looking for the right values, but my dataset doesn't have this column. I would need to hardcode the CodeId's (actually FieldId's) values right in the query. Is this possible? I realize that I may need to do multiple joins (for each FieldId). — Tony, Aug 20 '18 at 17:34
@Tony If I get this correctly, your approach is much more complicated actually. See my update... — Shnugo, Aug 21 '18 at 12:28
Hi @Shnugo, your 2nd sample works, but it doesn't return a single value from ChangedData column which is what I'm after. If we could somehow combine your 1st sample with the 2nd, it would be perfect. I suppose I could dump the output from your 2nd sample into a temp table and then apply your 1st approach to it. The temp table you defined in your previous approach is pretty much what I have (sans CodeID column plus CreatedOn column). — Tony, Aug 21 '18 at 16:43
@Tony, it should be enough to add the two `CROSS APPLYs` to the second block... They are responsible for the matching. The final `WHERE` must check against `c.CodeId` — Shnugo, Aug 21 '18 at 16:47
@Snhugo, BINGO! I of course will need to test this against all possible scenarios but it works on a simple dataset without too many variations. Really appreciate your help. You're the best! — Tony, Aug 21 '18 at 17:00

SQL - Get value for a specific element in array

1 Answers1

UPDATE

Linked