What is the best way to rank a sql varchar column by the number (count)/match of words in a parameter, with four distinct unique criteria. This is likely not a trivial question but I am challenged to order the rows based on "best match" using my criteria.
column: description varchar(100) Parameter: @MyParameter varchar(100)
Output with this order preference:
- Exact match (entire string matches) - always first
- Begins with (decending based on the parameter length of match)
- Count of words rank with contiguous words ranking higher for the same count of words matched
- Word(s) match anywhere (not contiguous)
Words might NOT match exactly, as partial matches of a word are allowed and likely, lessor value should be applied to partial words for ranking but not critical (pot would match each in: pot, potter, potholder, depot, depotting for instance). Begins with with other word matches should rank higher than those with no subsequent matches but that is not a deal killer/super important.
I would like to have a method to rank where the column "begins with" the value in the parameter. Say I have the following string:
'This is my value string as a test template to rank on.'
I would like to have, in the first case a rank of the column/row where the greatest count of words exist.
And the second to rank based on occurance (best match) at the begining as:
'This is my string as a test template to rank on.' - first
'This is my string as a test template to rank on even though not exact.'-second
'This is my string as a test template to rank' - third
'This is my string as a test template to' - next
'This is my string as a test template' - next etc.
Secondly:(possibly second set/group of data after the first (begins with) - this is desired
I want to rank (sort) the rows by the count of words in the @MyParameter that occur in @MyParameter with a rank where contiguous words rank higher than the same count separate.
Thus for the example string above, 'is my string as shown'
would rank higher than 'is not my other string as'
due the the "better match" of the contiguous string (words together) with the same count of words. Rows with a higher match (count of words that occur) would rank descending best match first.
If possible, I would like to do this in a single query.
No row should occur twice in the result.
For performance considerations, there will occur no more than 10,000 rows in the table.
The values in the table are fairly static with little change but not totally so.
I cannot change the structure at this time but would consider that later (like a word/phrase table)
To make this slightly more complicated, the word list is in two tables - but I could create a view for that, but one table results (smaller list) should occur prior to a second, larger data set results given the same match - there will be duplicates from these tables as well as within a table and I only want distinct values. Select DISTINCT is not easy as I want to return one column (sourceTable) which could quite possibly make the rows distinct and in that case only select from the first (smaller) table, but all the other columns DISTINCT is desired (do not consider that column in the "distinct" evaluation.
Psuedo Columns in table:
procedureCode VARCHAR(50),
description VARCHAR(100), -- this is the sort/evaluation column
category VARCHAR(50),
relvu VARCHAR(50),
charge VARCHAR(15),
active bit
sourceTable VARCHAR(50) - just shows which table it comes from of the two
NO unique index exists like an ID column
Matches NOT in a third table to be excluded SELECT * FROM (select * from tableone where procedureCode not in (select procedureCode from tablethree))
UNION ALL
(select * from tabletwo where procedureCode not in (select procedureCode from tablethree))
EDIT: in an attempt to address this I have created a table value paramter like so:
0 Gastric Intubation & Aspiration/Lavage, Treatmen
1 Gastric%Intubation%Aspiration%Lavage%Treatmen
2 Gastric%Intubation%Aspiration%Lavage
3 Gastric%Intubation%Aspiration
4 Gastric%Intubation
5 Gastric
6 Intubation%Aspiration%Lavage%Treatmen
7 Intubation%Aspiration%Lavage
8 Intubation%Aspiration
9 Intubation
10 Aspiration%Lavage%Treatmen
11 Aspiration%Lavage
12 Aspiration
13 Lavage%Treatmen
14 Lavage
15 Treatmen
where the actual phrase is in row 0
Here is my current attempt at this:
CREATE PROCEDURE [GetProcedureByDescription]
(
@IncludeMaster BIT,
@ProcedureSearchPhrases CPTFavorite READONLY
)
AS
DECLARE @myIncludeMaster BIT;
SET @myIncludeMaster = @IncludeMaster;
CREATE TABLE #DistinctMatchingCpts
(
procedureCode VARCHAR(50),
description VARCHAR(100),
category VARCHAR(50),
rvu VARCHAR(50),
charge VARCHAR(15),
active VARCHAR(15),
sourceTable VARCHAR(50),
sequenceSet VARCHAR(2)
)
IF @myIncludeMaster = 0
BEGIN -- Excluding master from search
INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet
)
SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet
FROM (
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM([CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
) AS CPTS
ORDER BY
procedureCode, sourceTable, [description]
END -- Excluded master from search
ELSE
BEGIN -- Including master in search, but present favorites before master for each code
-- Get matching procedures, ordered by code, source (favorites first), and description.
-- There probably will be procedures with duplicated code+description, so we will filter
-- duplicates shortly.
INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet)
SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet
FROM (
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''00'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''00'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''04'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''04'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
) AS CPTS
ORDER BY
sequenceSet, sourceTable, [description]
END
/* Final select - uses artificial ordering from the insertion ORDER BY */
SELECT procedureCode, description, category, rvu, charge, active FROM
(
SELECT TOP 500 *-- procedureCode, description, category, rvu, charge, active
FROM #DistinctMatchingCpts
ORDER BY sequenceSet, sourceTable, description
) AS CPTROWS
DROP TABLE #DistinctMatchingCpts
However, this does NOT meet the criteria of best match on the count of words (as in the row 1 value in the sample) which should match the best (most) words found count from that row.
I have complete control over the form/format of the table value parameter if that makes a difference.
I am returning this result to a c# program if that is useful.