0
With T as
( 
select 'Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC],DNA Polymerase Inhibitors [MoA],Nucleoside Analog [Chemical/Ingredient],Nucleoside Analog Antiviral [EPC]' CLASS 
FROM DUAL )

Need to pull strings with [EPC].

Desired Output:

Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC], Nucleoside Analog Antiviral [EPC]
DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60
  • 3
    Do it client-side, or write a CLR function (lots of examples online on how to do regex matching with those). Doing this in pure T-SQL is possible, but an exercise in suffering. (Unless you have SQL Server 2017, then `STRING_SPLIT` and `STRING_AGG` can do it quite easily.) – Jeroen Mostert Oct 03 '17 at 13:27
  • can you please provide T-SQL code using my example? – Nehal Patel Oct 03 '17 at 13:34

1 Answers1

1

You do not need true regexes, since your split string is a literal. On SQL Server 2017, this should work (not tested):

SELECT STRING_AGG([value], ',')
FROM STRING_SPLIT('Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC],DNA Polymerase Inhibitors [MoA],Nucleoside Analog [Chemical/Ingredient],Nucleoside Analog Antiviral [EPC]', ',')
WHERE [value] LIKE '%\[EPC\]%' ESCAPE '\'

On SQL Server 2016, we lack STRING_AGG, but we do have STRING_SPLIT, so this will work (tested):

SELECT STUFF((
    SELECT ',' + [value]
    FROM STRING_SPLIT('Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC],DNA Polymerase Inhibitors [MoA],Nucleoside Analog [Chemical/Ingredient],Nucleoside Analog Antiviral [EPC]', ',')
    WHERE [value] LIKE '%\[EPC\]%' ESCAPE '\'
    FOR XML PATH('')
), 1, 1, '')

On earlier versions, lack of decently performing native string splitting makes this a giant pain in the behind. Many questions deal with that, like this one. Using any of those solutions in combination with the FOR XML PATH trick for concatenation will work.

Alternatively, a CLR function can do this cleaner and faster (and with true regexes), but implementing those is more involved. This question is a good start for that.

Last but certainly not least: if you have to perform these sorts of operations in SQL, this is usually a sign that your database design is lacking. In particular, the comma-separated values should have been separate rows with an EPC BIT NOT NULL column (or more generally, a CategoryID INT REFERENCES Categories(ID) column), and you should consider ways to have your client code store data in a way that is amenable to efficient manipulation by your database. Normalization is the keyword here.

Jeroen Mostert
  • 27,176
  • 2
  • 52
  • 85