1

I am trying to create a piece of code in sql server 2008 that will grab specific values from each distinct string within my dbo table. The ultimate goal is to make a drop down box within Visual Studio so that one can choose all lines from the database that contain a specific product code (see definition of product code below). Example strings:

in_0314_95pf_500_w_0315

in_0314_500_95pf_0315_w

The part of these strings I am wishing to identify is the 3 digit numeric code (in this case let us call it product code) that appears once within each string. There are roughly 300 different product codes.

The problem is that these product code values do not appear in the same position within each unique string. Hence, I am having a hard time determining the product code because I can't use substring, charindex, like, etc.

Any ideas? Any help is MUCH appreciated.

StuartLC
  • 104,537
  • 17
  • 209
  • 285
Adam Meyer
  • 11
  • 4
  • 1
    If it's always 3 digits between 2 _, and it's safe to asume that there is nothing other the product code with this pattern then you can use patindex to do it – Zohar Peled Apr 23 '15 at 16:52
  • Do you have a table with all the valid product codes? – clweeks Apr 23 '15 at 16:57
  • In your example, you would want to return those strings if someone searched for "500," but not if someone searched for "314"? – APH Apr 23 '15 at 17:01
  • In my example, "0314" is the MMYY. People within my organization are well aware that the product code is 3 numeric characters. Dates are always four numeric characters, so I don't think this will be a problem... – Adam Meyer Apr 23 '15 at 18:53

3 Answers3

1

This can be done with PATINDEX:

DECLARE @s NVARCHAR(100) = 'in_0314_95pf_500_w_0315'
SELECT SUBSTRING(@s, PATINDEX('%[_][0-9][0-9][0-9][_]%', @s) + 1, 3)

Output:

500

If there are no underscores then:

SELECT SUBSTRING(@s, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', @s) + 1, 3)

This means 3 digits between any symbols that are not digits.

EDIT:

Apply to table like:

SELECT SUBSTRING(ColumnName, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', ColumnName) + 1, 3)
FROM TableName
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
0

One approach is to use a String splitting table function like this one which breaks the string up into its components. You can then filter the components based on your criteria:

SELECT Name
FROM dbo.splitstring('in_0314_95pf_500_w_0315', '_')
WHERE ISNUMERIC(Name) = 1 AND LEN(Name) = 3;

I've amended the function slightly to accept the delimiter as a parameter.

CREATE FUNCTION dbo.splitstring ( @stringToSplit VARCHAR(MAX), @delimiter VARCHAR(50))
RETURNS
 @returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN

 DECLARE @name NVARCHAR(255)
 DECLARE @pos INT

 WHILE CHARINDEX(@delimiter, @stringToSplit) > 0
 BEGIN
  SELECT @pos  = CHARINDEX(@delimiter, @stringToSplit)  
  SELECT @name = SUBSTRING(@stringToSplit, len(@delimiter), @pos-len(@delimiter))

  INSERT INTO @returnList 
  SELECT @name

  SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+LEN(@delimiter), 
         LEN(@stringToSplit)-@pos)
 END
INSERT INTO @returnList
 SELECT @stringToSplit

 RETURN
END

To apply this to your table, use CROSS APPLY (Single Delimiter):

SELECT mt.Name, x.Name AS ProductCode
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, '_') x
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3

Update, Multiple Delimiters

I guess the real underlying problem is that ultimately the product codes need to be normalized out of the composite key (e.g. add a distinct ProductId or ProductCode column to the same table), derived using a query like this, and then stored back in the table via an update. Reverse engineering the product codes out of the string appears to be a trial and error process.

Nonetheless, you can continue to keep passing the split strings through further splitting functions (one per each type of delimiter), before applying your final discriminating filter:

SELECT *
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, 'test') y -- First alias
      CROSS APPLY dbo.splitstring(y.Name, '_') x -- Reference the preceding alias
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3; -- Must reference the last alias (x)

Note that the stringsplit function has again been changed to accommodate multicharacter delimiters.

Community
  • 1
  • 1
StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • This will work fine for one string, but I think with alot of strings it will suffer slow performance. – Zohar Peled Apr 23 '15 at 17:00
  • @ZoharPeled sure, but the OP can run the above as a one-off exercise can then create a new column on his existing product table, or on a new table keyed by the composite string products, which will then serve future queries. The real problem of course is the use of a composite key, whereas the components should have been normalized. – StuartLC Apr 23 '15 at 17:04
  • I agree, if the OP has control over the table's structure. It's probably better to suffer the split string once and have the data stored in a normalized manner. – Zohar Peled Apr 23 '15 at 17:09
  • @StuartLC thank you for your help! Once I entered in my table information and column name as: select mt.DSIN, x.Name as ProductCode from dbo.CM_Summary mt CROSS APPLY dbo.splitstring(mt.DSIN, '_') x WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3 I got the following error: "Msg 8144, Level 16, State 3, Line 3 Procedure or function dbo.splitstring has too many arguments specified." Can you help? Thanks! – Adam Meyer Apr 23 '15 at 18:46
  • Make sure you use the splitstring function I've posted here, not the one I linked (to give credit to the author). I've amended the original one to accept a different delimiter (underscore in your case) – StuartLC Apr 23 '15 at 18:50
  • Uh oh. I found a problem, it appears not all product codes are not within two underscores... For example: ai_1213_75test500_9mslfix_0914 Is there a work-around for this? – Adam Meyer Apr 23 '15 at 19:06
  • I've updated - you can continue to cross apply multiple delimiters to the `chain` of split strings - if the delimiter isn't applicable, the whole string will be returned. You'll probably need to experiment with the order to prevent false positives being matched. Also note that the string split function has been amended. Good luck - afraid I'm off to bed :) – StuartLC Apr 23 '15 at 19:38
0

If you have a table (or can generate in inline view) of the product codes, you can join the list of long strings to the product codes with a like clause.

Create Table longcodes (
    longcode varchar(20)
    )

Create Table products (
    prodCode char(3)
    )

insert products values('100')
insert products values('111')
insert products values('123')

insert longcodes values ('abc_a_100_test')
insert longcodes values ('asdf_111_bob')
insert longcodes values ('in_0314_123_95pf')
insert longcodes values ('f_100_u')
insert longcodes values ('hihi_111_bye')
insert longcodes values ('in_123_0314_95pf')
insert longcodes values ('a_b__c_d_100_efg')

select * 
from products p 
     join longcodes l on l.longcode like '%_' + p.prodCode + '_%'

And they get aligned with the product codes like this:

prodCode    longcode
100         abc_a_100_test
100         f_100_u
100         a_b__c_d_100_efg
111         asdf_111_bob
111         hihi_111_bye
123         in_0314_123_95pf
123         in_123_0314_95pf

EDIT: Seeing the developments in the other answer, you can simplify the like clause to

like p.prodCode

and just deal with the fact that you have a much greater chance of a single composite string producing multiple matches.

clweeks
  • 856
  • 7
  • 31