3

I have two tables

Table A (about 2 million rows)

ID    Str
123   1=A&3=B&8=B&11=A&12=R&17=BX&19=Z&20=B&21=AK&22=AH&24=A&27=A&28=A&29=C&31=A&32=S&33=N&34=H&35=Z&37=A&38=Z&39=A&41=D&42=G&49=A&52=Z&53=C&54=D&55=Z&56=Z&57=Z&58=A&61=B&63=B&65=A&66=A&67=A&68=A&69=A&71=G&73=Q&74=AB&76=B&77=Z&79=F&80=B&82=D&83=B&89=BA&91=Z&92=Z&93=Z&95=R&97=B&98=Z&99=G&100=G&101=B&106=C&109=Z&110=A&111=Z&112=C&116=B&118=Z&120=B&121=A&122=A&124=E&125=Z&128=A&131=Z&137=F&141=A&142=B&143=Z&144=B&146=C&148=Z&153=A&157=A&164=A&167=Z&168=Z&170=C&188=E&200=A&203=B&227=Z&235=C&237=B&238=G&258=Z&259=Z&268=Z&272=B&274=D&284=B&285=C&288=A&289=Z&290=A&299=M&300=D&313=Z&315=Z&322=A&324=Z&327=Z&358=E&360=B&365=A&366=A&375=B&376=B&379=Z&381=Z&383=M&387=Z&390=Z&391=Z&392=B&393=Z&398=MA&400=Z&403=B&404=A&407=B&408=BT&410=B&412=A&413=F&414=B&415=F&416=B&419=Z&421=Z&430=C&433=Z&435=C&440=Z&441=PG&448=A&453=A&458=B&459=Z&460=Z&463=A&466=Z&468=Z&472=Z&474=A&475=Z&492=Z&496=Z&498=A&504=A&506=B&507=Z&514=D&518=B&519=Z&523=C&536=A&554=Z&557=Z&560=A&582=Z&584=A&586=Z&587=Z&588=B&591=C&599=A&603=Z&610=Z&614=B&625=A&664=A&666=C&668=C&670=Z&672=L&674=L&698=B&699=A&720=BB&745=Z&754=A&756=Z&759=Z&762=Z&766=AA&767=AA&769=A&772=HB&785=B&787=Z&788=Z&789=Z&790=Z&791=Z&792=Z&794=Z&795=A&796=Z&803=A&808=Z&809=DC&812=G&813=G&815=G&821=K&822=M&824=ES&828=Z&829=GA&831=SN&837=BC&841=K&845=K&851=S&852=Z&853=B&854=WU&855=B&856=WU&857=A&859=A&861=B&863=C&865=A&867=B&869=Z&873=ZA&882=CM&889=CD&921=Z&922=Z&924=Z&925=Z&932=B&933=Z&941=Z&952=RJ&954=Z&967=P&984=A&985=H&1002=D&1007=F&1041=F&1052=BR&1053=BR&1056=PD&1057=NA&1095=H&1104=B&1121=A&1122=M&1123=D&1162=G&1163=A&1164=B&1165=A&1166=E&1174=A&1182=A&1183=A&1185=C&1188=Z&1202=KG&1211=A&1223=B&1229=Z&1231=A&1233=Z&1237=L&1256=TW&1258=C&1259=BH&1260=D&1261=A&1272=C&1296=Z&1298=Z&1299=A&1302=A&1303=A&1308=C&1309=J&1313=A&1316=B&1317=Z&1318=A&1319=Z&1323=D&1327=A&1328=A&1329=Z&1330=Z&1331=A&1332=Z&1333=Z&1335=DS&1339=Z&1345=AE&1346=AG&1352=Z&1356=Z&1366=Z&1367=B&1368=B&1369=A&1370=W&1371=A&1373=Z&1379=GQ&1381=A&1387=CX&1388=Z&1389=EU&1391=E&1392=J&1397=Z&1399=Z&1401=Z&1405=A&1406=CN&1412=Z&1416=Z&1417=B&1419=Z&1421=Z&1424=Z&1428=Z&1431=B&1432=B&1436=A&1437=Z&1441=CB&1442=A&1443=Z&1449=A&1452=Z&1453=Z&1470=A&1473=Z&1474=A&1475=Z&1477=D&1478=B&1479=B&1481=A&1488=A&1490=D&1493=E&1494=Z&1502=Y&1503=BH&1504=C&1505=B&1506=Z&1511=Z&1512=A&1513=KU&1518=A&1524=Z&1541=A&1549=A&1554=A&1623=A&1629=Z&1637=B&1640=Z&1642=Z&1643=Z&1644=B&1661=A&1673=K&1681=A&1771=A&1772=Z&1787=Z&1788=Z&1790=Z&1793=Z&1837=A&1838=Z&1839=Z&1854=Z&1855=Z&1856=Z&1858=Z&1867=B&1872=B&1873=B&1879=Z&1881=B&1902=Z&1910=Z&1916=B&1917=A&1918=B&1920=A&1927=A&1928=Z&1963=Z&1964=Z&1968=A&1974=A&1975=Z&1976=Z&1993=Z&2007=A&2010=Z&2011=A&2012=Z&2015=Z&2022=Z&2023=Z&2024=Z&2032=Z&2037=Z&MPC-9002=AA&MPC-9006=AG&
165   1=A&2=R&4=B&33=G&34=Q&46=Y&49=A&75=AH&589=C&590=A&803=IG&812=RB&813=RB&814=RB&815=RB&841=V&843=V&844=V&845=V&851=S&853=B&854=KJ&855=B&856=KJ&857=B&858=KJ&859=B&860=KJ&861=B&862=RB&863=B&864=RB&865=B&866=RB&867=B&881=AC&883=GE&889=M&897=IS&922=C&965=CP&1323=C&

Table B (about 50000 rows)

ID    Str
1     1=A&1323=D!1=B&1323=D!1=A&1323=E!1=F&1323=D!1=B&1323=E!1=F&1323=E
2     1=B&142=D&1323=D&1470=B
3     1=A&1323=E&1470=C

The query I am using is this

SELECT A.ID, B.ID
FROM TableA AS A
CROSS APPLY TableB AS B
WHERE function(A.Str,B.Str) = 1

The code for the function is written in C#:

public static SqlByte function(SqlString equip, SqlString comp)
{
    SqlByte result = 1;

    if (comp.IsNull)
    {
        result = 1;
    }
    else
    {
        var eq = new List<string>(equip.ToString().Split('&'));
        var compString = new List<string>(comp.ToString().Split('!'));

        foreach (string com in compString)
        {
            foreach (string c in com.Split('&'))
            {
                if (c.StartsWith("~"))
                {
                    if (eq.Contains(c.Substring(1)))
                    {
                        result = 0;
                        break;
                    }
                }
                else if (!eq.Contains(c))
                {
                    result = 0;
                    break;
                }
                else
                {
                    result = 1;
                    continue;
                }
            }
            if (result == 1)
            {
                break;
            }
        }

    }
    return result;
}

With the given amount of data it takes me about 6 days to run the query. Is there some trick I can do to fasten things up?

Many thanks!

jigga
  • 558
  • 4
  • 16
  • You can use `Tasks` or `Parallel.ForEach`, [example](https://stackoverflow.com/questions/12251874/how-can-i-convert-this-foreach-code-to-parallel-foreach) – Cizzl Feb 08 '19 at 12:44
  • 2
    I'm not sure what it's purpose is, but you are creating and checking 100,000,000,000 (one hundred billion) strings - could you break the strings up into rows in a relational table, then use some other sort of SQL to find the matches that you seek? – Cato Feb 08 '19 at 12:53
  • Did you consider to use [CTE](https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017) on server side instead of C# function? – Maciej Los Feb 08 '19 at 13:35
  • @MaciejLos tried that but it took even longer to accomplish – jigga Feb 08 '19 at 14:06
  • @Cato also tried that but then you have even more combinations of rows, so it took longer – jigga Feb 08 '19 at 14:09
  • 2
    The **best** way to improve performance here is to stop storing delimited values. It violates 1NF and causes all sorts of problems. – Sean Lange Feb 08 '19 at 14:18
  • @SeanLange it's a table which is not made by me... but I have to use it :| – jigga Feb 08 '19 at 14:39
  • 1
    I would find a way to parse that mess you have been handed into a persistent table so you can query it. I feel your pain that you can't fix that table but no matter how you slice it this is going to suck for performance in its current state. It has 2 levels of delimiters which is basically shoving an entire table into a single tuple. The idea of a persistent table is to allow the splitting to happen when the data hits your table so you don't have to do it all the time. It is not the best but would be better for performance. – Sean Lange Feb 08 '19 at 14:44

3 Answers3

3

As you are on SQL-Server 2017 I'd recommend an approach via JSON. One big advantage is, that OPENJSON will return the fragment's position, which allows to re-concatenate this - if needed:

DECLARE @tblA TABLE(ID INT,YourString VARCHAR(MAX));
INSERT INTO @tblA VALUES
 (123,'1=A&3=B&8=B&11=A&12=R&17=BX&19=Z&20=B&21=AK&22=AH&24=A&27=A&28=A&29=C&31=A&32=S&33=N&34=H&35=Z&37=A&38=Z&39=A&41=D&42=G&49=A&52=Z&53=C&54=D&55=Z&56=Z&57=Z&58=A&61=B&63=B&65=A&66=A&67=A&68=A&69=A&71=G&73=Q&74=AB&76=B&77=Z&79=F&80=B&82=D&83=B&89=BA&91=Z&92=Z&93=Z&95=R&97=B&98=Z&99=G&100=G&101=B&106=C&109=Z&110=A&111=Z&112=C&116=B&118=Z&120=B&121=A&122=A&124=E&125=Z&128=A&131=Z&137=F&141=A&142=B&143=Z&144=B&146=C&148=Z&153=A&157=A&164=A&167=Z&168=Z&170=C&188=E&200=A&203=B&227=Z&235=C&237=B&238=G&258=Z&259=Z&268=Z&272=B&274=D&284=B&285=C&288=A&289=Z&290=A&299=M&300=D&313=Z&315=Z&322=A&324=Z&327=Z&358=E&360=B&365=A&366=A&375=B&376=B&379=Z&381=Z&383=M&387=Z&390=Z&391=Z&392=B&393=Z&398=MA&400=Z&403=B&404=A&407=B&408=BT&410=B&412=A&413=F&414=B&415=F&416=B&419=Z&421=Z&430=C&433=Z&435=C&440=Z&441=PG&448=A&453=A&458=B&459=Z&460=Z&463=A&466=Z&468=Z&472=Z&474=A&475=Z&492=Z&496=Z&498=A&504=A&506=B&507=Z&514=D&518=B&519=Z&523=C&536=A&554=Z&557=Z&560=A&582=Z&584=A&586=Z&587=Z&588=B&591=C&599=A&603=Z&610=Z&614=B&625=A&664=A&666=C&668=C&670=Z&672=L&674=L&698=B&699=A&720=BB&745=Z&754=A&756=Z&759=Z&762=Z&766=AA&767=AA&769=A&772=HB&785=B&787=Z&788=Z&789=Z&790=Z&791=Z&792=Z&794=Z&795=A&796=Z&803=A&808=Z&809=DC&812=G&813=G&815=G&821=K&822=M&824=ES&828=Z&829=GA&831=SN&837=BC&841=K&845=K&851=S&852=Z&853=B&854=WU&855=B&856=WU&857=A&859=A&861=B&863=C&865=A&867=B&869=Z&873=ZA&882=CM&889=CD&921=Z&922=Z&924=Z&925=Z&932=B&933=Z&941=Z&952=RJ&954=Z&967=P&984=A&985=H&1002=D&1007=F&1041=F&1052=BR&1053=BR&1056=PD&1057=NA&1095=H&1104=B&1121=A&1122=M&1123=D&1162=G&1163=A&1164=B&1165=A&1166=E&1174=A&1182=A&1183=A&1185=C&1188=Z&1202=KG&1211=A&1223=B&1229=Z&1231=A&1233=Z&1237=L&1256=TW&1258=C&1259=BH&1260=D&1261=A&1272=C&1296=Z&1298=Z&1299=A&1302=A&1303=A&1308=C&1309=J&1313=A&1316=B&1317=Z&1318=A&1319=Z&1323=D&1327=A&1328=A&1329=Z&1330=Z&1331=A&1332=Z&1333=Z&1335=DS&1339=Z&1345=AE&1346=AG&1352=Z&1356=Z&1366=Z&1367=B&1368=B&1369=A&1370=W&1371=A&1373=Z&1379=GQ&1381=A&1387=CX&1388=Z&1389=EU&1391=E&1392=J&1397=Z&1399=Z&1401=Z&1405=A&1406=CN&1412=Z&1416=Z&1417=B&1419=Z&1421=Z&1424=Z&1428=Z&1431=B&1432=B&1436=A&1437=Z&1441=CB&1442=A&1443=Z&1449=A&1452=Z&1453=Z&1470=A&1473=Z&1474=A&1475=Z&1477=D&1478=B&1479=B&1481=A&1488=A&1490=D&1493=E&1494=Z&1502=Y&1503=BH&1504=C&1505=B&1506=Z&1511=Z&1512=A&1513=KU&1518=A&1524=Z&1541=A&1549=A&1554=A&1623=A&1629=Z&1637=B&1640=Z&1642=Z&1643=Z&1644=B&1661=A&1673=K&1681=A&1771=A&1772=Z&1787=Z&1788=Z&1790=Z&1793=Z&1837=A&1838=Z&1839=Z&1854=Z&1855=Z&1856=Z&1858=Z&1867=B&1872=B&1873=B&1879=Z&1881=B&1902=Z&1910=Z&1916=B&1917=A&1918=B&1920=A&1927=A&1928=Z&1963=Z&1964=Z&1968=A&1974=A&1975=Z&1976=Z&1993=Z&2007=A&2010=Z&2011=A&2012=Z&2015=Z&2022=Z&2023=Z&2024=Z&2032=Z&2037=Z&MPC-9002=AA&MPC-9006=AG&')
,(165,'1=A&2=R&4=B&33=G&34=Q&46=Y&49=A&75=AH&589=C&590=A&803=IG&812=RB&813=RB&814=RB&815=RB&841=V&843=V&844=V&845=V&851=S&853=B&854=KJ&855=B&856=KJ&857=B&858=KJ&859=B&860=KJ&861=B&862=RB&863=B&864=RB&865=B&866=RB&867=B&881=AC&883=GE&889=M&897=IS&922=C&965=CP&1323=C&');

--You can split this data easily by transforming it into a JSON-array
-- A value like 1=A&3=B&8=B&11=A&12=R will be ["1=A","3=B","8=B","11=A","12=R"]

SELECT ID
      ,A.[key] AS OrdPosition
      ,A.[value] AS Fragment
FROM @tblA 
CROSS APPLY OPENJSON('["' + REPLACE(YourString,'&','","') + '"]') A

--Your second table is two-folded. We can split this in one single query too

DECLARE @tblB TABLE(ID INT,YourString VARCHAR(MAX));
INSERT INTO @tblB VALUES
 (1,'1=A&1323=D!1=B&1323=D!1=A&1323=E!1=F&1323=D!1=B&1323=E!1=F&1323=E')
,(2,'1=B&142=D&1323=D&1470=B')
,(3,'1=A&1323=E&1470=C');

SELECT ID
      ,A.[key] AS OrdPositionA
      ,B.[key] AS OrdPositionB
      ,B.[value] AS Fragment
FROM @tblB
CROSS APPLY OPENJSON('["' + REPLACE(YourString,'!','","') + '"]') A
CROSS APPLY OPENJSON('["' + REPLACE(A.[value],'&','","') + '"]') B

It should easy to combine this approaches with a simple join.

Performance

OPENJSON performs pretty well. In this answer I compared STRING_SPLIT, OPENJSON and the well known delimited8k() function. Worth to check it out...

But this is what you really should do:

Use a query like the one above to store your data in fragments. Whenever you need this, you can use STRING_AGG() to get the long strings back, but the heavy work is the splitting. It is not very smart to do this again and again and again...

UPDATE: Even better:

Using this for @tlbA

SELECT ID
      ,A.[key] AS OrdPosition
      ,A.[value] AS Fragment
      ,x.value('/x[1]','varchar(10)') AS FragmentNumber
      ,x.value('/x[2]','varchar(10)') AS FragmentCharacter
FROM @tblA 
CROSS APPLY OPENJSON('["' + REPLACE(YourString,'&','","') + '"]') A
CROSS APPLY(SELECT CAST('<x>' + REPLACE(A.[value],'=','</x><x>') + '</x>' AS XML)) B(x)
ORDER BY ID,OrdPosition

And this for @tblB

SELECT ID
      ,A.[key] AS OrdPositionA
      ,B.[key] AS OrdPositionB
      ,B.[value] AS Fragment
      ,x.value('/x[1]','varchar(10)') AS FragmentNumber
      ,x.value('/x[2]','varchar(10)') AS FragmentCharacter
FROM @tblB
CROSS APPLY OPENJSON('["' + REPLACE(YourString,'!','","') + '"]') A
CROSS APPLY OPENJSON('["' + REPLACE(A.[value],'&','","') + '"]') B
CROSS APPLY(SELECT CAST('<x>' + REPLACE(B.[value],'=','</x><x>') + '</x>' AS XML)) C(x)
ORDER BY ID,OrdPositionA,OrdPositionB

This will return each value neatly separated. Fill this into physical tables, place indexes and enjoy the speed you get back.

Store the values in such tables and get your original huge strings back with a simple STRING_AGG() query...

UPDATE 2

Answering your comment, here's a simple approach to join these results:

DECLARE @tblA TABLE(ID INT,YourString VARCHAR(MAX));
INSERT INTO @tblA VALUES
 (123,'1=A&3=B&8=B&11=A&12=R&17=BX&19=Z&20=B&21=AK&22=AH&24=A&27=A&28=A&29=C&31=A&32=S&33=N&34=H&35=Z&37=A&38=Z&39=A&41=D&42=G&49=A&52=Z&53=C&54=D&55=Z&56=Z&57=Z&58=A&61=B&63=B&65=A&66=A&67=A&68=A&69=A&71=G&73=Q&74=AB&76=B&77=Z&79=F&80=B&82=D&83=B&89=BA&91=Z&92=Z&93=Z&95=R&97=B&98=Z&99=G&100=G&101=B&106=C&109=Z&110=A&111=Z&112=C&116=B&118=Z&120=B&121=A&122=A&124=E&125=Z&128=A&131=Z&137=F&141=A&142=B&143=Z&144=B&146=C&148=Z&153=A&157=A&164=A&167=Z&168=Z&170=C&188=E&200=A&203=B&227=Z&235=C&237=B&238=G&258=Z&259=Z&268=Z&272=B&274=D&284=B&285=C&288=A&289=Z&290=A&299=M&300=D&313=Z&315=Z&322=A&324=Z&327=Z&358=E&360=B&365=A&366=A&375=B&376=B&379=Z&381=Z&383=M&387=Z&390=Z&391=Z&392=B&393=Z&398=MA&400=Z&403=B&404=A&407=B&408=BT&410=B&412=A&413=F&414=B&415=F&416=B&419=Z&421=Z&430=C&433=Z&435=C&440=Z&441=PG&448=A&453=A&458=B&459=Z&460=Z&463=A&466=Z&468=Z&472=Z&474=A&475=Z&492=Z&496=Z&498=A&504=A&506=B&507=Z&514=D&518=B&519=Z&523=C&536=A&554=Z&557=Z&560=A&582=Z&584=A&586=Z&587=Z&588=B&591=C&599=A&603=Z&610=Z&614=B&625=A&664=A&666=C&668=C&670=Z&672=L&674=L&698=B&699=A&720=BB&745=Z&754=A&756=Z&759=Z&762=Z&766=AA&767=AA&769=A&772=HB&785=B&787=Z&788=Z&789=Z&790=Z&791=Z&792=Z&794=Z&795=A&796=Z&803=A&808=Z&809=DC&812=G&813=G&815=G&821=K&822=M&824=ES&828=Z&829=GA&831=SN&837=BC&841=K&845=K&851=S&852=Z&853=B&854=WU&855=B&856=WU&857=A&859=A&861=B&863=C&865=A&867=B&869=Z&873=ZA&882=CM&889=CD&921=Z&922=Z&924=Z&925=Z&932=B&933=Z&941=Z&952=RJ&954=Z&967=P&984=A&985=H&1002=D&1007=F&1041=F&1052=BR&1053=BR&1056=PD&1057=NA&1095=H&1104=B&1121=A&1122=M&1123=D&1162=G&1163=A&1164=B&1165=A&1166=E&1174=A&1182=A&1183=A&1185=C&1188=Z&1202=KG&1211=A&1223=B&1229=Z&1231=A&1233=Z&1237=L&1256=TW&1258=C&1259=BH&1260=D&1261=A&1272=C&1296=Z&1298=Z&1299=A&1302=A&1303=A&1308=C&1309=J&1313=A&1316=B&1317=Z&1318=A&1319=Z&1323=D&1327=A&1328=A&1329=Z&1330=Z&1331=A&1332=Z&1333=Z&1335=DS&1339=Z&1345=AE&1346=AG&1352=Z&1356=Z&1366=Z&1367=B&1368=B&1369=A&1370=W&1371=A&1373=Z&1379=GQ&1381=A&1387=CX&1388=Z&1389=EU&1391=E&1392=J&1397=Z&1399=Z&1401=Z&1405=A&1406=CN&1412=Z&1416=Z&1417=B&1419=Z&1421=Z&1424=Z&1428=Z&1431=B&1432=B&1436=A&1437=Z&1441=CB&1442=A&1443=Z&1449=A&1452=Z&1453=Z&1470=A&1473=Z&1474=A&1475=Z&1477=D&1478=B&1479=B&1481=A&1488=A&1490=D&1493=E&1494=Z&1502=Y&1503=BH&1504=C&1505=B&1506=Z&1511=Z&1512=A&1513=KU&1518=A&1524=Z&1541=A&1549=A&1554=A&1623=A&1629=Z&1637=B&1640=Z&1642=Z&1643=Z&1644=B&1661=A&1673=K&1681=A&1771=A&1772=Z&1787=Z&1788=Z&1790=Z&1793=Z&1837=A&1838=Z&1839=Z&1854=Z&1855=Z&1856=Z&1858=Z&1867=B&1872=B&1873=B&1879=Z&1881=B&1902=Z&1910=Z&1916=B&1917=A&1918=B&1920=A&1927=A&1928=Z&1963=Z&1964=Z&1968=A&1974=A&1975=Z&1976=Z&1993=Z&2007=A&2010=Z&2011=A&2012=Z&2015=Z&2022=Z&2023=Z&2024=Z&2032=Z&2037=Z&MPC-9002=AA&MPC-9006=AG&')
,(165,'1=A&2=R&4=B&33=G&34=Q&46=Y&49=A&75=AH&589=C&590=A&803=IG&812=RB&813=RB&814=RB&815=RB&841=V&843=V&844=V&845=V&851=S&853=B&854=KJ&855=B&856=KJ&857=B&858=KJ&859=B&860=KJ&861=B&862=RB&863=B&864=RB&865=B&866=RB&867=B&881=AC&883=GE&889=M&897=IS&922=C&965=CP&1323=C&');

DECLARE @tblB TABLE(ID INT,YourString VARCHAR(MAX));
INSERT INTO @tblB VALUES
 (1,'1=A&1323=D!1=B&1323=D!1=A&1323=E!1=F&1323=D!1=B&1323=E!1=F&1323=E')
,(2,'1=B&142=D&1323=D&1470=B')
,(3,'1=A&1323=E&1470=C');

WITH tblA AS
(
    SELECT ID
          ,A.[key] AS OrdPosition
          ,A.[value] AS Fragment
          ,x.value('/x[1]','varchar(10)') AS FragmentLeft
          ,x.value('/x[2]','varchar(10)') AS FragmentRight
    FROM @tblA 
    CROSS APPLY OPENJSON('["' + REPLACE(YourString,'&','","') + '"]') A
    CROSS APPLY(SELECT CAST('<x>' + REPLACE(A.[value],'=','</x><x>') + '</x>' AS XML)) B(x)
)
,tblB AS
(
    SELECT ID
          ,A.[key] AS OrdPositionA
          ,B.[key] AS OrdPositionB
          ,B.[value] AS Fragment
          ,x.value('/x[1]','varchar(10)') AS FragmentLeft
          ,x.value('/x[2]','varchar(10)') AS FragmentRight
    FROM @tblB
    CROSS APPLY OPENJSON('["' + REPLACE(YourString,'!','","') + '"]') A
    CROSS APPLY OPENJSON('["' + REPLACE(A.[value],'&','","') + '"]') B
    CROSS APPLY(SELECT CAST('<x>' + REPLACE(B.[value],'=','</x><x>') + '</x>' AS XML)) C(x)
)
SELECT a.Fragment
      ,a.ID AS ID_a, a.OrdPosition AS OrdPosition_a
      ,b.ID AS ID_b, b.OrdPositionA AS OrdPosition_b1,b.OrdPositionB AS OrdPosition_b2
FROM tblA a
INNER JOIN tblB b ON a.FragmentLeft=b.FragmentLeft AND a.FragmentRight=b.FragmentRight;

The result

+----------+------+---------------+------+----------------+----------------+
| Fragment | ID_a | OrdPosition_a | ID_b | OrdPosition_b1 | OrdPosition_b2 |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 123  | 0             | 1    | 0              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 165  | 0             | 1    | 0              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 123  | 0             | 1    | 2              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 165  | 0             | 1    | 2              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 123  | 0             | 3    | 0              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1=A      | 165  | 0             | 3    | 0              | 0              |
+----------+------+---------------+------+----------------+----------------+
| 1323=D   | 123  | 296           | 1    | 3              | 1              |
+----------+------+---------------+------+----------------+----------------+
| 1323=D   | 123  | 296           | 1    | 1              | 1              |
+----------+------+---------------+------+----------------+----------------+
| 1323=D   | 123  | 296           | 1    | 0              | 1              |
+----------+------+---------------+------+----------------+----------------+
| 1323=D   | 123  | 296           | 2    | 0              | 2              |
+----------+------+---------------+------+----------------+----------------+

Read it like this:

  • The Fragment 1=A is common in the a-lines 123 and 165 a with the b-lines 1 or 3 in the first or the third block (zero-based index) in the first position

  • The Fragment 1323=D is common in the a-line 123 with b-line 1 and 2. It is the 297th fragment in the a-line and the ... (and so on...)

If this does not help you: Please provide the expected output together with the rules for your join.

Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
2

In MS Sql Server 2016+ you could use the STRING_SPLIT function to split the strings from tableB.

Even double split em.

Then match those string parts against the strings from TableA.

Example snippet using table variables for demonstration:

declare @TableA table (ID int primary key identity(1,1), [Str] varchar(3000));
declare @TableB table (ID int primary key identity(1,1), [Str] varchar(100));

insert into @TableA ([Str]) values
 ('1=A&3=B&8=B&11=A&12=R&17=BX&19=Z&20=B&21=AK&22=AH&24=A&27=A&28=A&29=C&31=A&32=S&33=N&34=H&35=Z&37=A&38=Z&39=A&41=D&42=G&49=A&52=Z&53=C&54=D&55=Z&56=Z&57=Z&58=A&61=B&63=B&65=A&66=A&67=A&68=A&69=A&71=G&73=Q&74=AB&76=B&77=Z&79=F&80=B&82=D&83=B&89=BA&91=Z&92=Z&93=Z&95=R&97=B&98=Z&99=G&100=G&101=B&106=C&109=Z&110=A&111=Z&112=C&116=B&118=Z&120=B&121=A&122=A&124=E&125=Z&128=A&131=Z&137=F&141=A&142=B&143=Z&144=B&146=C&148=Z&153=A&157=A&164=A&167=Z&168=Z&170=C&188=E&200=A&203=B&227=Z&235=C&237=B&238=G&258=Z&259=Z&268=Z&272=B&274=D&284=B&285=C&288=A&289=Z&290=A&299=M&300=D&313=Z&315=Z&322=A&324=Z&327=Z&358=E&360=B&365=A&366=A&375=B&376=B&379=Z&381=Z&383=M&387=Z&390=Z&391=Z&392=B&393=Z&398=MA&400=Z&403=B&404=A&407=B&408=BT&410=B&412=A&413=F&414=B&415=F&416=B&419=Z&421=Z&430=C&433=Z&435=C&440=Z&441=PG&448=A&453=A&458=B&459=Z&460=Z&463=A&466=Z&468=Z&472=Z&474=A&475=Z&492=Z&496=Z&498=A&504=A&506=B&507=Z&514=D&518=B&519=Z&523=C&536=A&554=Z&557=Z&560=A&582=Z&584=A&586=Z&587=Z&588=B&591=C&599=A&603=Z&610=Z&614=B&625=A&664=A&666=C&668=C&670=Z&672=L&674=L&698=B&699=A&720=BB&745=Z&754=A&756=Z&759=Z&762=Z&766=AA&767=AA&769=A&772=HB&785=B&787=Z&788=Z&789=Z&790=Z&791=Z&792=Z&794=Z&795=A&796=Z&803=A&808=Z&809=DC&812=G&813=G&815=G&821=K&822=M&824=ES&828=Z&829=GA&831=SN&837=BC&841=K&845=K&851=S&852=Z&853=B&854=WU&855=B&856=WU&857=A&859=A&861=B&863=C&865=A&867=B&869=Z&873=ZA&882=CM&889=CD&921=Z&922=Z&924=Z&925=Z&932=B&933=Z&941=Z&952=RJ&954=Z&967=P&984=A&985=H&1002=D&1007=F&1041=F&1052=BR&1053=BR&1056=PD&1057=NA&1095=H&1104=B&1121=A&1122=M&1123=D&1162=G&1163=A&1164=B&1165=A&1166=E&1174=A&1182=A&1183=A&1185=C&1188=Z&1202=KG&1211=A&1223=B&1229=Z&1231=A&1233=Z&1237=L&1256=TW&1258=C&1259=BH&1260=D&1261=A&1272=C&1296=Z&1298=Z&1299=A&1302=A&1303=A&1308=C&1309=J&1313=A&1316=B&1317=Z&1318=A&1319=Z&1323=D&1327=A&1328=A&1329=Z&1330=Z&1331=A&1332=Z&1333=Z&1335=DS&1339=Z&1345=AE&1346=AG&1352=Z&1356=Z&1366=Z&1367=B&1368=B&1369=A&1370=W&1371=A&1373=Z&1379=GQ&1381=A&1387=CX&1388=Z&1389=EU&1391=E&1392=J&1397=Z&1399=Z&1401=Z&1405=A&1406=CN&1412=Z&1416=Z&1417=B&1419=Z&1421=Z&1424=Z&1428=Z&1431=B&1432=B&1436=A&1437=Z&1441=CB&1442=A&1443=Z&1449=A&1452=Z&1453=Z&1470=A&1473=Z&1474=A&1475=Z&1477=D&1478=B&1479=B&1481=A&1488=A&1490=D&1493=E&1494=Z&1502=Y&1503=BH&1504=C&1505=B&1506=Z&1511=Z&1512=A&1513=KU&1518=A&1524=Z&1541=A&1549=A&1554=A&1623=A&1629=Z&1637=B&1640=Z&1642=Z&1643=Z&1644=B&1661=A&1673=K&1681=A&1771=A&1772=Z&1787=Z&1788=Z&1790=Z&1793=Z&1837=A&1838=Z&1839=Z&1854=Z&1855=Z&1856=Z&1858=Z&1867=B&1872=B&1873=B&1879=Z&1881=B&1902=Z&1910=Z&1916=B&1917=A&1918=B&1920=A&1927=A&1928=Z&1963=Z&1964=Z&1968=A&1974=A&1975=Z&1976=Z&1993=Z&2007=A&2010=Z&2011=A&2012=Z&2015=Z&2022=Z&2023=Z&2024=Z&2032=Z&2037=Z&MPC-9002=AA&MPC-9006=AG&')
,('1=A&2=R&4=B&33=G&34=Q&46=Y&49=A&75=AH&589=C&590=A&803=IG&812=RB&813=RB&814=RB&815=RB&841=V&843=V&844=V&845=V&851=S&853=B&854=KJ&855=B&856=KJ&857=B&858=KJ&859=B&860=KJ&861=B&862=RB&863=B&864=RB&865=B&866=RB&867=B&881=AC&883=GE&889=M&897=IS&922=C&965=CP&1323=C&')
;

insert into @TableB ([Str]) values
 ('1=A&1323=D!1=B&1323=D!1=A&1323=E!1=F&1323=D!1=B&1323=E!1=F&1323=E')
,('1=B&142=D&1323=D&1470=B')
,('1=A&1323=E&1470=C')
;

select A.ID as ID1, ssB.ID as ID2
-- , COUNT(DISTINCT ssB.StrPart) as TotalStringPartsMatching
-- , STRING_AGG(ssB.StrPart, '&') WITHIN GROUP (ORDER BY ssB.StrPart) as CommonStrParts
from @TableA A
left join 
(
   select distinct B.ID, ss2.value as StrPart
   from @TableB B
   cross apply string_split([Str],'!') ss1
   cross apply string_split(ss1.value,'&') ss2
) ssB on concat('&',A.[Str],'&') like concat('%&',ssb.StrPart,'&%')
group by A.ID, ssB.ID;

Result:

ID1 ID2
1   1
1   2
1   3
2   1
2   3
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • Great! What you think about CTE? Is it worth to post solution with recursive queries? – Maciej Los Feb 08 '19 at 14:04
  • @MaciejLos I think it's possible to write a solution that use a recursive CTE. But I have some doubts that it would be more performant than by using JOIN's. Since it's a lot of data to match. Actually, to speed this up I assume it could be usefull to use a temporary table for what the ssB subquery calculates. – LukStorms Feb 08 '19 at 14:08
  • `STRING_SPLIT()` regrettfully does not return the fragment's position. In your call to `STRING_AGG()` you are sorting by `ssB.StrPart`, but this might not return the original order (Although the given data looks like this). you might have a look into my answer, how `OPENJSON` can overcome this flaw... – Shnugo Feb 08 '19 at 14:41
  • @Shnugo I agree about STRING_SPLIT. Only returning values without their position seems like a missed opportunity to provide an even more usefull function. Btw, using OPENJSON as alternative is kinda cool. – LukStorms Feb 08 '19 at 15:11
1

On account of very interesting solution provided by LukStorms and our discussion in comments to His solution, i'm not sure it's the best alternative, but... CTE should do the job.

Note: i do not have MS SQL Server 2017, so i've used CHARINDEX function to split string into parts and to compare them.

Here it's:

DECLARE @Equip TABLE(ID INT, StrData VARCHAR(8000))
INSERT INTO @Equip (ID, StrData )
VALUES(123, '1=A&3=B&8=B&11=A&12=R&17=BX&19=Z&20=B&21=AK&22=AH&24=A&27=A&28=A&29=C&31=A&32=S&33=N&34=H&35=Z&37=A&38=Z&39=A&41=D&42=G&49=A&52=Z&53=C&54=D&55=Z&56=Z&57=Z&58=A&61=B&63=B&65=A&66=A&67=A&68=A&69=A&71=G&73=Q&74=AB&76=B&77=Z&79=F&80=B&82=D&83=B&89=BA&91=Z&92=Z&93=Z&95=R&97=B&98=Z&99=G&100=G&101=B&106=C&109=Z&110=A&111=Z&112=C&116=B&118=Z&120=B&121=A&122=A&124=E&125=Z&128=A&131=Z&137=F&141=A&142=B&143=Z&144=B&146=C&148=Z&153=A&157=A&164=A&167=Z&168=Z&170=C&188=E&200=A&203=B&227=Z&235=C&237=B&238=G&258=Z&259=Z&268=Z&272=B&274=D&284=B&285=C&288=A&289=Z&290=A&299=M&300=D&313=Z&315=Z&322=A&324=Z&327=Z&358=E&360=B&365=A&366=A&375=B&376=B&379=Z&381=Z&383=M&387=Z&390=Z&391=Z&392=B&393=Z&398=MA&400=Z&403=B&404=A&407=B&408=BT&410=B&412=A&413=F&414=B&415=F&416=B&419=Z&421=Z&430=C&433=Z&435=C&440=Z&441=PG&448=A&453=A&458=B&459=Z&460=Z&463=A&466=Z&468=Z&472=Z&474=A&475=Z&492=Z&496=Z&498=A&504=A&506=B&507=Z&514=D&518=B&519=Z&523=C&536=A&554=Z&557=Z&560=A&582=Z&584=A&586=Z&587=Z&588=B&591=C&599=A&603=Z&610=Z&614=B&625=A&664=A&666=C&668=C&670=Z&672=L&674=L&698=B&699=A&720=BB&745=Z&754=A&756=Z&759=Z&762=Z&766=AA&767=AA&769=A&772=HB&785=B&787=Z&788=Z&789=Z&790=Z&791=Z&792=Z&794=Z&795=A&796=Z&803=A&808=Z&809=DC&812=G&813=G&815=G&821=K&822=M&824=ES&828=Z&829=GA&831=SN&837=BC&841=K&845=K&851=S&852=Z&853=B&854=WU&855=B&856=WU&857=A&859=A&861=B&863=C&865=A&867=B&869=Z&873=ZA&882=CM&889=CD&921=Z&922=Z&924=Z&925=Z&932=B&933=Z&941=Z&952=RJ&954=Z&967=P&984=A&985=H&1002=D&1007=F&1041=F&1052=BR&1053=BR&1056=PD&1057=NA&1095=H&1104=B&1121=A&1122=M&1123=D&1162=G&1163=A&1164=B&1165=A&1166=E&1174=A&1182=A&1183=A&1185=C&1188=Z&1202=KG&1211=A&1223=B&1229=Z&1231=A&1233=Z&1237=L&1256=TW&1258=C&1259=BH&1260=D&1261=A&1272=C&1296=Z&1298=Z&1299=A&1302=A&1303=A&1308=C&1309=J&1313=A&1316=B&1317=Z&1318=A&1319=Z&1323=D&1327=A&1328=A&1329=Z&1330=Z&1331=A&1332=Z&1333=Z&1335=DS&1339=Z&1345=AE&1346=AG&1352=Z&1356=Z&1366=Z&1367=B&1368=B&1369=A&1370=W&1371=A&1373=Z&1379=GQ&1381=A&1387=CX&1388=Z&1389=EU&1391=E&1392=J&1397=Z&1399=Z&1401=Z&1405=A&1406=CN&1412=Z&1416=Z&1417=B&1419=Z&1421=Z&1424=Z&1428=Z&1431=B&1432=B&1436=A&1437=Z&1441=CB&1442=A&1443=Z&1449=A&1452=Z&1453=Z&1470=A&1473=Z&1474=A&1475=Z&1477=D&1478=B&1479=B&1481=A&1488=A&1490=D&1493=E&1494=Z&1502=Y&1503=BH&1504=C&1505=B&1506=Z&1511=Z&1512=A&1513=KU&1518=A&1524=Z&1541=A&1549=A&1554=A&1623=A&1629=Z&1637=B&1640=Z&1642=Z&1643=Z&1644=B&1661=A&1673=K&1681=A&1771=A&1772=Z&1787=Z&1788=Z&1790=Z&1793=Z&1837=A&1838=Z&1839=Z&1854=Z&1855=Z&1856=Z&1858=Z&1867=B&1872=B&1873=B&1879=Z&1881=B&1902=Z&1910=Z&1916=B&1917=A&1918=B&1920=A&1927=A&1928=Z&1963=Z&1964=Z&1968=A&1974=A&1975=Z&1976=Z&1993=Z&2007=A&2010=Z&2011=A&2012=Z&2015=Z&2022=Z&2023=Z&2024=Z&2032=Z&2037=Z&MPC-9002=AA&MPC-9006=AG&'),
(165, '1=A&2=R&4=B&33=G&34=Q&46=Y&49=A&75=AH&589=C&590=A&803=IG&812=RB&813=RB&814=RB&815=RB&841=V&843=V&844=V&845=V&851=S&853=B&854=KJ&855=B&856=KJ&857=B&858=KJ&859=B&860=KJ&861=B&862=RB&863=B&864=RB&865=B&866=RB&867=B&881=AC&883=GE&889=M&897=IS&922=C&965=CP&1323=C&')

DECLARE @Another TABLE(ID INT, StrData VARCHAR(8000))
INSERT INTO @Another (ID, StrData )
VALUES(1, '1=A&1323=D!1=B&1323=D!1=A&1323=E!1=F&1323=D!1=B&1323=E!1=F&1323=E'),
(2, '1=B&142=D&1323=D&1470=B'),
(3, '1=A&1323=E&1470=C')


;WITH E AS
(
    --inital part
    SELECT 1 AS Step, ID, LEFT(StrData, CHARINDEX('&', StrData) -1) AS Part, RIGHT(StrData, LEN(StrData) - CHARINDEX('&', StrData)) AS Remainder
    FROM @Equip 
    WHERE CHARINDEX('&', StrData)>0
    -- recursive part
    UNION ALL
    SELECT Step+1 AS Step, ID, LEFT(Remainder, CHARINDEX('&', Remainder) -1) AS Part, RIGHT(Remainder, LEN(Remainder) - CHARINDEX('&', Remainder)) AS Remainder
    FROM E 
    WHERE CHARINDEX('&', Remainder)>0
    UNION ALL
    SELECT Step+1 AS Step, ID, Remainder AS Part, NULL AS Remainder
    FROM E 
    WHERE CHARINDEX('&', Remainder)=0
), 
    A AS
    (
        --inital part
        SELECT 1 AS Step, ID, LEFT(StrData, CHARINDEX('!', StrData) -1) AS Part, RIGHT(StrData, LEN(StrData) - CHARINDEX('!', StrData)) AS Remainder
        FROM @Another 
        WHERE CHARINDEX('!', StrData)>0
        -- recursive part
        UNION ALL
        SELECT Step+1 AS Step, ID, LEFT(Remainder, CHARINDEX('!', Remainder) -1) AS Part, RIGHT(Remainder, LEN(Remainder) - CHARINDEX('!', Remainder)) AS Remainder
        FROM A 
        WHERE CHARINDEX('!', Remainder)>0
        UNION ALL
        SELECT Step+1 AS Step, ID, Remainder AS Part, NULL AS Remainder
        FROM A 
        WHERE CHARINDEX('!', Remainder)=0
    )
SELECT E.ID AS EID, E.Step AS EStep, E.Part AS EPart, A.ID AS AID, A.Part AS APart, CHARINDEX(E.Part, A.Part) AS Result
FROM E CROSS JOIN A 
WHERE CHARINDEX(E.Part, A.Part)>0
OPTION (MAXRECURSION 0)

Result:

EID EStep   EPart   AID APart       Result
123 1       1=A     1   1=A&1323=D  1
123 1       1=A     1   1=A&1323=E  1
165 1       1=A     1   1=A&1323=D  1
165 1       1=A     1   1=A&1323=E  1
123 297     1323=D  1   1=A&1323=D  5
123 297     1323=D  1   1=B&1323=D  5
123 297     1323=D  1   1=F&1323=D  5

Good luck!

Maciej Los
  • 8,468
  • 1
  • 20
  • 35
  • 1
    I do wonder how it will perform against millions of rows, but I like your effort in trying it the recursive way. – LukStorms Feb 08 '19 at 15:23
  • @LukStorms, Thank you. As i mentioned in the answer, i'm not sure if it will improve performance, but it's an alternative. Cheers! Maciej – Maciej Los Feb 08 '19 at 16:04