0

I have CSV data stored in a column in SQL Server 2008.

[PKID] [DataCSV]
1      1,2,4
2      3,5

The following will convert that data into rows...

; WITH [XmlTable] AS (
   SELECT [PKID], 
          CAST('<d><i>' + REPLACE([MyData],',','</i><i>') + '</i></d>' AS XML) AS [MyDataXml]
   FROM [MyTable]
)
SELECT [PKID],
  CX.value('.','int') AS [DataItem]
FROM [XmlTable]
  CROSS APPLY [MyDataXml].nodes('/d/i') AS CX(CX)

With the correct/expected/wanted result of...

[PKID] [DataItem]
1      1
1      2
1      4
2      3
2      5

My question is whether there is a better, more efficient, more accepted way of achieving the same thing without the use of CTE/XML?

freefaller
  • 19,368
  • 7
  • 57
  • 87
  • 1
    CSV refers to a way of formatting *files*. What you are asking has to do with storing/passing multiple values. You'll find dozens of duplicate questions on how to split them, and advice on *not* using such values in the first place. Such structures typically mean a design problem – Panagiotis Kanavos Sep 18 '15 at 08:29
  • Yes, Another way is to use user defined function `Split(',',ColName)` in your query. – Krishnraj Rana Sep 18 '15 at 08:29
  • 1
    Aaron Bertrand has a series of articles describing and comparing the various techniques of splitting strings in SQL Server [here](http://sqlperformance.com/2012/07/t-sql-queries/split-strings). Better to avoid such strings though - a table that has one row per value is far easier to query and process – Panagiotis Kanavos Sep 18 '15 at 08:32
  • @Panagiotis - *CSV refers to a way of formatting files* - I was unaware of that. There is nothing in the words "Comma Separated Values" that suggests it's purely files, more that there are values separated by commas. Yes, I agree it's not a good way to store data, however, that was not what I was aking... we all have to deal with things that are sub-optimal – freefaller Sep 18 '15 at 08:32
  • @freefaller If you search for SQL and CSV you *won't* find the answers that have to do with string splitting. As for suboptimal - part of an ETL process is to *fix* the suboptimal data and transform it to a schema suitable for querying. Otherwise you'll have serious performance and complexity problems when you try to use the values you extracted from the denormalized field. Filtering, aggregations will be a lot harder and slower – Panagiotis Kanavos Sep 18 '15 at 08:35
  • @Krishnraj - the problem with user defined functions are, well, they're user defined... so only giving me the function definition doesn't exactly help me much – freefaller Sep 18 '15 at 08:36
  • Read this: http://sqlperformance.com/2012/07/t-sql-queries/split-strings – Evaldas Buinauskas Sep 18 '15 at 08:37
  • Thanks @Evaldas - Panagiotis has already linked to that – freefaller Sep 18 '15 at 08:37
  • I think it's different link. – Evaldas Buinauskas Sep 18 '15 at 08:38
  • @Panagiotis - yes, I'm fully aware that it would be great to "fix the suboptimal data", but I am not in that position. The world would be a much better place if we could do exactly what we wanted, but we can't... which is why I'm asking the question in the first place. Please stop telling me what I should do in an ideal world, and instead focus on what I'm having to do in the real world – freefaller Sep 18 '15 at 08:40
  • 1
    @Evaldas - sorry, I meant the "hidden" link (under the "here" in his comment starting "Aaron Bertrand...") – freefaller Sep 18 '15 at 08:41
  • @freefaller I think you misunderstood what I said. Knowing what is the *actual* problem you are trying to solve, will help a lot too. You can extract the data to a temporary table that allows querying and indexing. An alternative is to create an indexed view on top of the denormalized table that splits the data and presents it as separate rows. Adding the index will store the split rows so queries over them will be fast – Panagiotis Kanavos Sep 18 '15 at 08:52
  • @Panagiotis - ah, my apologies - I did misunderstand. But you still need an effective way to *extract the data to a temporary table*, so I'm looking at the UDF's linked to in the SO question you've provided as possible duplicate – freefaller Sep 18 '15 at 08:58

0 Answers0