1

I have a question related to matching strings in a MSSQL database. Basically, I have a table that contains ICD9 and CPT codes. The issue is that the format that these codes come in is usually incorrect (i.e. too many characters, missing decimal, etc...). I need to be able to lookup the description for each of these codes from a lookup table containing the correct code.

Because of the way these codes are structured I can do some type of "progressive" match to at least find the category of the code.

Lets say the correct code is something like: 306.98

And for this example lets pretend there are no other values between 306 and 307.

I would like to strip the decimal and look for a match, one character at a time, until one is not found. Then select the last matching string.

So 306,3069,3098, 306981, 3069812, etc... would match the string 306.98.

I hope that makes sense to everyone. I am not sure how I would even begin to do this, so any suggestion would be a great help.

Matt
  • 183
  • 12

4 Answers4

1

One possible solution is to strip down the code to its basic element (306) and then do a like operator:

WHERE Code LIKE '306%'
Aliostad
  • 80,612
  • 21
  • 160
  • 208
  • 1
    I considered that, but in there could be a 306.1, 306.2, 306.3... I would want a string of 30611 to match 306.1 and 30621 to match 306.2. Does that make sense? – Matt Apr 20 '11 at 12:54
  • Well, strip it to `306` and then use `LIKE`, as I said. – Aliostad Apr 20 '11 at 12:57
  • Ok, would it be possible to do it in reverse? Say start with 3069812 or 30690 and have it match on something like 306.69. I assume it would have to search 1 character at a time until a match is no longer found, then select the last matching value. – Matt Apr 20 '11 at 13:07
  • If a 304 comes in then it could be 304 or 30.4, you could do a `SUBSTRING` compare but that's still not going to give you accuracy – DKnight Apr 20 '11 at 19:39
1

Use FLOOR function to strip the decimal part and then use a LIKE operator in the WHERE clause.

Something like:

SELECT <COLUMN-LIST>
  FROM <TABLE-NAME>
 WHERE <THE-COLUMN> LIKE CAST(FLOOR(306.09) AS VARCHAR) + '%'
Chandu
  • 81,493
  • 19
  • 133
  • 134
  • If he has additional characters and missing decimal places I don't think `LIKE` will be accurate enough – DKnight Apr 20 '11 at 19:35
  • @DKnight: Yes, you are right, but neither of us are so sure about what kind of data is OP dealing with. Hence any possible scenario would be an assumption. – Chandu Apr 20 '11 at 19:37
  • Unfortunetly, I'm familiar with ICD9 codes and his description states that missing characters and decimal places are his main problem. I think the only real solution is to push for better data if possible. Mismatching on medical and billing codes can be very dangerous, better to not make any guesses at all. – DKnight Apr 20 '11 at 19:46
  • @DKnight: Let me look at how ICD9 codes appear and then rephrase the answer as needed. Thx – Chandu Apr 20 '11 at 19:50
0

Here you have your example.You just need to convert value to nvarchar @string.

DECLARE @string AS NVARCHAR (MAX) = '306.98';
DECLARE @Table TABLE (
    TextVal NVARCHAR (MAX));

INSERT INTO @Table ([TextVal])
SELECT '4444656'
UNION ALL
SELECT '30'
UNION ALL
SELECT '3069'
UNION ALL
SELECT '306989878787'
;

WITH   numbers
AS     (SELECT ROW_NUMBER() OVER ( ORDER BY (SELECT 1)) AS Number
        FROM   [sys].[objects] AS o1 CROSS JOIN [sys].[objects] AS o2),
       Chars
AS     (SELECT SUBSTRING(@string, [Number], 1) AS Let,
               [Number]
        FROM   [numbers]
        WHERE  [Number] <= LEN(@string)),
       Joined
AS     (SELECT [Let],
               CAST (1 AS BIGINT) AS Number
        FROM   chars
        WHERE  [Number] = 1
        UNION ALL
        SELECT [J].[Let] + CASE 
                           WHEN [Chars].[Let] = '.' THEN '' ELSE [Chars].[Let] 
                           END AS LEt,
               Chars.[Number]
        FROM   [Joined] AS J
               INNER JOIN
               [Chars]
               ON [Chars].[Number] = [J].[Number] + 1)
SELECT *
FROM   @Table AS T
WHERE  [T].[TextVal] IN (SELECT [Let]
                         FROM   [Joined])
          OR [T].[TextVal] LIKE '%'+(SELECT TOP 1 [Let] FROM
          [Joined] ORDER BY [Number] DESC )  +'%'            
                         ;

Result will be:

 TextVal
30
3069
306989878787
Dalex
  • 3,585
  • 19
  • 25
0

I was able to figure it out. Basically, I just needed to step through each character of the string and look for a match until once was no longer found. Thanks for the help!

/* ICD9 Lookup */

USE TSiData_Suite_LWHS_V11

DECLARE @String NVARCHAR (10)
DECLARE @Match NVARCHAR(10)
DECLARE @Substring NVARCHAR (10)
DECLARE @Description NVARCHAR(MAX) 
DECLARE @Length INT
DECLARE @Count INT

SET @String = '309.99999999'

/* Remove decimal place from string */
SET @String = REPLACE(@String,'.','')

/* Get lenth of string */
SET @Length = LEN(@String)

/* Initialize count */
SET @Count = 1

/* Get Substring */
SET @Substring = SUBSTRING(@String,1,@Count)

/* Start processing */
IF (@Length < 1 OR @String IS NULL)
    /* Validate @String */
    BEGIN

        SET @Description = 'No match found for string. String is not proper length.'

    END
ELSE IF ((SELECT COUNT(*) FROM LookupDiseases WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') < 1)
    /* Check for at least one match */
    BEGIN

        SET @Description = 'No match found for string.'

    END
ELSE
    /* Look for matching code */
    BEGIN

        WHILE ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') <> 1 AND (@Count < @Length + 1))
        BEGIN

            /* Update substring value */
            SET @Substring = SUBSTRING(@String,1,@Count + 1)

            /* Increment @Count */
            SET @Count += 1

            /* Select the first matching code and get description */
            SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC

        END
    END

PRINT @Match
PRINT @Description
Matt
  • 183
  • 12