0

I'm using SQL SERVER version 15.0.18390.0

I have dataframe with a column like so which has comma separated values of animals. How do i alter the dataframe to output a comma separated list of unique animals from the input?

input:

column1
dog,cat,dog,dog
dog,dog,bird,cat
panda,bird,cat

expected output:

column1
dog,cat
dog,bird,cat
panda,bird,cat
Eisen
  • 1,697
  • 9
  • 27
  • 1
    Add your SQL Server version to your question. – Dan Guzman Nov 15 '21 at 14:38
  • 2
    Normalize your schema. See ["Is storing a delimited list in a database column really that bad?"](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) (Spoiler: Yes, it is.). – sticky bit Nov 15 '21 at 14:38
  • Updated the question with version – Eisen Nov 15 '21 at 14:39
  • 4
    Comma separated columns aren't a thing you should EVER let happen. – Joel Coehoorn Nov 15 '21 at 14:49
  • 1
    If you're on an older version of SQL Server (I'm not sure where `15.0.18390.0` came from, but that's not a SQL Server version), see [this post](https://www.mssqltips.com/sqlservertip/4140/removing-duplicates-from-strings-in-sql-server/). But agree with the others - stop storing your data this way and your requirement magically comes infinitely easier to solve. – Aaron Bertrand Nov 15 '21 at 14:51
  • 15.0.18390.0 will be from SSMS @AaronBertrand . It's the version number of the component "SQL Server Management Studio". As many of us know, Microsoft do like to sometimes confuse people with their versioning. 15.0.18390.0 appears to be SSMS 18 (it is not tied to a specific version of SSMS 18, as both 18.10 and 18.9.2 show the same value for the component). – Thom A Nov 15 '21 at 14:56
  • 1
    @Larnu I mostly use ADS these days but I complained a long time ago that SSMS 18.x should be 18.0.xxxxx and was laughed out of the room. `/shrug` There's actually a technical reason (ties to Visual Studio versioning) but I still think there could be a private build number and a user-exposed version label. And maybe also some kind of warning like `no, this is NOT the version of the SQL Server you're connecting to!` – Aaron Bertrand Nov 15 '21 at 15:10
  • One day people will stop telling us they're "from the future" and using SQL Server Version 18, @AaronBertrand . ;) Maybe if they switch to ADS (which I also use as much as SSMS) they'll start telling us they're using SQL Server version 1 or 2... (please no...) – Thom A Nov 15 '21 at 15:14

1 Answers1

1

The real solution:

Fix your design.

The real question here is why are you storing delimited data in your database in the first place? That is your real problem, and so the real solution is to fix your design. Normalise your data, don't store delimited data. It breaks multiple basic rules in for (relational) data. Move to multiple tables with many to one/many relationships.


Saying that, however, you can do this with STRING_SPLIT and STRING_AGG. Note that as STRING_SPLIT doesn't return an ordinal value, the order the values are added back to the string are unlikely to follow the original ordinal positions.

I also assume you're using a fully supported version of SQL Server, without evidence to the contrary. If not, you'll need to use a user defined splitter function and the FOR XML PATH methods for the 2 functions respectively.


WITH DistinctValues AS(
    SELECT DISTINCT
           V.DenormalisedData,
           SS.[Value]
    FROM (VALUES('dog,cat,dog,dog'),
                ('dog,dog,bird,cat'),
                ('panda,bird,cat'))V(DenormalisedData)
         CROSS APPLY STRING_SPLIT(V.DenormalisedData,',') SS)
SELECT STRING_AGG(DV.[Value],',') AS RedenormalisedData
FROM DistinctValues DV
GROUP BY DenormalisedData;
Thom A
  • 88,727
  • 11
  • 45
  • 75