You do not need true regexes, since your split string is a literal. On SQL Server 2017, this should work (not tested):
SELECT STRING_AGG([value], ',')
FROM STRING_SPLIT('Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC],DNA Polymerase Inhibitors [MoA],Nucleoside Analog [Chemical/Ingredient],Nucleoside Analog Antiviral [EPC]', ',')
WHERE [value] LIKE '%\[EPC\]%' ESCAPE '\'
On SQL Server 2016, we lack STRING_AGG
, but we do have STRING_SPLIT
, so this will work (tested):
SELECT STUFF((
SELECT ',' + [value]
FROM STRING_SPLIT('Cytomegalovirus Nucleoside Analog DNA Polymerase Inhibitor [EPC],DNA Polymerase Inhibitors [MoA],Nucleoside Analog [Chemical/Ingredient],Nucleoside Analog Antiviral [EPC]', ',')
WHERE [value] LIKE '%\[EPC\]%' ESCAPE '\'
FOR XML PATH('')
), 1, 1, '')
On earlier versions, lack of decently performing native string splitting makes this a giant pain in the behind. Many questions deal with that, like this one. Using any of those solutions in combination with the FOR XML PATH
trick for concatenation will work.
Alternatively, a CLR function can do this cleaner and faster (and with true regexes), but implementing those is more involved. This question is a good start for that.
Last but certainly not least: if you have to perform these sorts of operations in SQL, this is usually a sign that your database design is lacking. In particular, the comma-separated values should have been separate rows with an EPC BIT NOT NULL
column (or more generally, a CategoryID INT REFERENCES Categories(ID)
column), and you should consider ways to have your client code store data in a way that is amenable to efficient manipulation by your database. Normalization is the keyword here.