-3

I have text column with numeric values separated by semicolons. I'm trying to figure out how to get the most frequent pair of values that appeared together in the same row. I've found a solution for a very similar problem in Python Finding the most frequent occurrences of pairs in a list of lists, but I don't know how to rewrite it in using SQL In example below it returns 2 and 3 because this pair appeared 3 times in the input set:

Input rows      Output
----------      -------
';1;2;3;5'    |  '2;3'     
';2;3'        |  '1;2'
';3;4;5;1;2'  |  '1;3' 
';1;5;2'      |  '1;5'

Orginal data:

query result

Adam Stawarek
  • 321
  • 4
  • 9
  • 4
    Please post your data in text format, not image. Also, you need to explain how 940 and 10662 is the expected outcome. – Sateesh Pagolu Feb 20 '19 at 23:38
  • Read [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad), where you will see a lot of reasons why the answer to this question is **Absolutely yes!** You have a **broken** data model, this is why it's so hard to work with. If you would normalize your database, this task would be trivial. – Zohar Peled Feb 21 '19 at 07:15
  • What is your SQL Server version? – Zhorov Feb 21 '19 at 15:02
  • 2017 for Developers – Adam Stawarek Feb 21 '19 at 15:07
  • @AdamStawarek Why `1;2`, `1;5` and `2;5` are not in the output? And are `2;3` and `3;2` the same? – Zhorov Feb 21 '19 at 15:46
  • sorry i forgot about them, i will fix it now – Adam Stawarek Feb 21 '19 at 15:50
  • Yes order doesn't matter so 2;3 and 3;2 count as the same – Adam Stawarek Feb 21 '19 at 15:53

2 Answers2

1

You may try with the following approach. First, using OPENJSON(), get all possible combinations. When OPENJSON parses a JSON array the indexes of the elements in the JSON text are returned as keys (0-based). Then, count the most frequent pair with DENSE_RANK().

Input:

CREATE TABLE #Items (
   Id int,
   ItemValues varchar(max)
)
INSERT INTO #Items
   (Id, ItemValues)
VALUES   
   (1, '1;2;3;5'),
   (2, '2;3'),
   (3, '3;4;5;1;2'),
   (4, '1;5;2')

Statement:

;WITH combinationsCTE AS (
   SELECT 
      CASE 
         WHEN s1.[value] <= s2.[value] THEN CONCAT(s1.[value], ';', s2.[value])
         ELSE CONCAT(s2.[value], ';', s1.[value]) 
      END AS PairValue
   FROM #Items i
   CROSS APPLY (SELECT [key], [value] FROM OPENJSON('["' +  REPLACE(i.ItemValues,';','","') + '"]')) s1
   CROSS APPLY (SELECT [key], [value] FROM OPENJSON('["' +  REPLACE(i.ItemValues,';','","') + '"]')) s2
   WHERE (s1.[key] < s2.[key])
), rankingCTE AS (
   SELECT 
      PairValue, 
      DENSE_RANK() OVER (ORDER BY COUNT(PairValue) DESC) AS PairRank
   FROM combinationsCTE
   GROUP BY PairValue
)
SELECT PairValue
FROM rankingCTE
WHERE PairRank = 1

Output:

PairValue
1;2
1;5
2;3
2;5
Zhorov
  • 28,486
  • 6
  • 27
  • 52
0

First have a split function

CREATE FUNCTION Splitfn(@String varchar(8000), @Delimiter char(1))       
returns @temptable TABLE (items varchar(8000))       
as       
begin       
    declare @idx int       
    declare @slice varchar(8000)       

    select @idx = 1       
        if len(@String)<1 or @String is null  return       

    while @idx!= 0       
    begin       
        set @idx = charindex(@Delimiter,@String)       
        if @idx!=0       
            set @slice = left(@String,@idx - 1)       
        else       
            set @slice = @String       

        if(len(@slice)>0)  
            insert into @temptable(Items) values(@slice)       

        set @String = right(@String,len(@String) - @idx)       
        if len(@String) = 0 break       
    end   
return      

end

Second Step To get all rows in a single string

Declare @val Varchar(MAX); 
Select @val = COALESCE(@val + '; ' + YourColumn, YourColumn) 
        From YourTable

Third step,

SELECT TOP 1 items, count(*)
FROM dbo.Splitfn(@Val, ';')
WHERE LTRIM(RTRIM(items)) <> ''
GROUP BY items
ORDER BY Count(*) DESC
asmgx
  • 7,328
  • 15
  • 82
  • 143
  • 1
    This is probably the worst possible way to split a string in SQL. The best, if working with 2016 version or higher, is using the built in string_split. For lower versions, read Aaron Bertrand's [Split strings the right way – or the next best way](https://sqlperformance.com/2012/07/t-sql-queries/split-strings) – Zohar Peled Feb 21 '19 at 07:18
  • This will return the most frequent value but my intention is to get the most frequent pair of values, so for each set of values in each row i need to find all combinations of values and return the one that was most common in the whole set – Adam Stawarek Feb 21 '19 at 15:06