-1

I have a table that looks like

srvc_dt     fclity_id  clmt_key rfrsh_dt    srvc_cd
2013-01-16  000167404   2       2013-01-31  01204
2013-01-16  000167404   2       2013-01-31  11112
2011-01-11  000167404   2488    2013-02-28  11113
2013-01-08  000167404   6070    2013-01-31  11113
2013-01-08  000167404   28371   2013-01-31  01202
2013-01-08  000167404   28371   2013-01-31  11107
2013-01-08  000167404   28371   2013-01-31  11112
2013-01-08  000167404   28371   2013-01-31  11117
2013-01-08  000167404   28371   2013-01-31  12101
2012-07-17  000167404   40571   2013-01-31  11113
2012-11-19  000167404   40571   2013-01-31  11113
2013-01-14  000167404   44189   2013-01-31  27211
2010-07-06  000167404   46396   2013-02-28  01202
2010-07-06  000167404   46396   2013-02-28  02142
2010-07-06  000167404   46396   2013-02-28  11107
2010-07-06  000167404   46396   2013-02-28  11112
2013-01-03  000167404   113264  2013-01-31  01204
2013-01-03  000167404   113264  2013-01-31  23323
2013-01-07  000167404   113390  2013-01-31  01202
2013-01-07  000167404   113390  2013-01-31  02142

and I would like to transform it into something like

srvc_dt     fclity_id  clmt_key rfrsh_dt      srvc_cd
2010-07-06  000167404   46396   2013-02-28    01202,02142,11107,11112
2011-01-11  000167404   2488    2013-02-28    11113
2012-07-17  000167404   40571   2013-01-31    11113
2012-11-19  000167404   40571   2013-01-31    11113
2013-01-03  000167404   113264  2013-01-31    01204,23323
2013-01-07  000167404   113390  2013-01-31    01202,02142
2013-01-08  000167404   6070    2013-01-31    11113
2013-01-08  000167404   28371   2013-01-31    01202,11107,11112,11117,12101
2013-01-14  000167404   44189   2013-01-31    27211
2013-01-16  000167404   2       2013-01-31    01204,11112

what would be the most performant (for > 100 000 000 rows) way to accomplish this in SQL SERVER 2012?

PIVOT? FOR XML PATH? dynamic case statements? something else?

EDIT: while Simulating group_concat MySQL function in Microsoft SQL Server 2005? does display 1 way to accomplish this, I'm looking for performance comparisons between different methods

Community
  • 1
  • 1
Traceur
  • 33
  • 11
  • This has been answered around here and the rest of the internet hundreds if not thousands of times. PIVOT is NOT going to help because it converts the grouped data into columns which is not what you want. The FOR XML technique is what you want. – Sean Lange Jul 08 '16 at 15:51
  • Possible duplicate of [Simulating group\_concat MySQL function in SQL Server?](http://stackoverflow.com/questions/451415/simulating-group-concat-mysql-function-in-sql-server) – Sean Lange Jul 08 '16 at 15:52
  • @SeanLange I read this other question before posting and while FOR XML PATH WOULD accomplish this, I believe that it is a very costly technique. PIVOT could also realistically get this done, once you have your grouped data into columns, you could merge them – Traceur Jul 08 '16 at 16:16
  • What??? So you are going to PIVOT and then append them? I don't understand what you are saying. But yes, doing FOR XML can be costly. You are denormalizing your data so it isn't going to be super fast no matter how you slice it. – Sean Lange Jul 08 '16 at 16:19
  • To pivot your data would require a dynamic pivot and then more dynamic sql to concatenate each value back into a single column. My question is why do you need denormalized data for 100,000,000 rows? – Sean Lange Jul 08 '16 at 16:31
  • @SeanLange Pivot would give you a table with 1 column per srvc_cd right? you could select from that table but grouping the srvc_cd columns – Traceur Jul 08 '16 at 16:32
  • Right but you still would need to use FOR XML to shove them back into a single column. All you would be doing with PIVOT is turning the rows into columns. You want this as a single column. – Sean Lange Jul 08 '16 at 16:33
  • @SeanLange I need it like this for a dimensional model. and while I agree that the PIVOT method would require more code, I think it might end up executing faster – Traceur Jul 08 '16 at 16:35
  • @SeanLange you wouldn't need FOR XML, say your Pivot table gets the following columns, srvc_dt, fclity_id, clmt_key ,rfrsh_dt, 00001, 00002, 00003, etc. You could run the following statement SELECT srvc_dt, fclity_id, clmt_key ,rfrsh_dt, 00001 + ',' + 00002 + ',' + 00003, etc – Traceur Jul 08 '16 at 16:37
  • It isn't about more or less code. I have no fear of writing lots of code to make something fast. What you are describing would require dynamic sql to append your unknown number of columns into a delimited list. – Sean Lange Jul 08 '16 at 16:38
  • 1
    Instead of guessing about what may or may not be faster you should write it both ways and actually test it with your data. I would be interested to hear your findings. – Sean Lange Jul 08 '16 at 16:38
  • @SeanLange I don't know why you are so condescending, I simply thought someone with more experience/knowledge on the subject could point me in the right direction – Traceur Jul 08 '16 at 16:54
  • 1
    @Traceur If you are looking to see which one will perform faster, why not write both queries and test it? You've got all the info you need on your end including indexes, etc. Either way trying to do this for that many rows is most likely going to be costly. – Taryn Jul 08 '16 at 17:16
  • @bluefeet I still don't get why I can't ask the question though, I'm not the first to ask for the best way of doing something. If anyone knew off the bat, I wouldn't have to spend the time testing it – Traceur Jul 08 '16 at 17:51
  • 2
    @Traceur Because we'd be guessing as to which would be better on your server. We probably don't have that much data and **we don't know your server and databases**, IMO we don't have all the info needed to give a proper answer. You can best answer this yourself by testing both and then asking how to improve performance of one or the other. – Taryn Jul 08 '16 at 17:53
  • @bluefeet there might not be a clear-cut answer, but why can't I get an answer similar to what this question got [link](http://stackoverflow.com/questions/42648/best-way-to-get-identity-of-inserted-row?rq=1) – Traceur Jul 08 '16 at 18:17
  • @Traceur Because that type of question was asked in 2008 and "the best way to do something" questions are typically not well-received by the community when asked now. This question here is very open-ended and is difficult to answer. As I've said, your best bet is to do some testing on your end and then ask a specific question on how you could improve either a pivot based version or an XML version of the query. – Taryn Jul 08 '16 at 18:20
  • @bluefeet ok thanks, I didn't know these questions were not well-received anymore as I'm fairly new to this platform. Tim Lehner did provide an answer to the question though so it's not entirely hopeless – Traceur Jul 08 '16 at 18:30

1 Answers1

1

Grouped Concatenation in SQL Server by Aaron Bertrand offers a good look at what you want.

I normally don't like link-only answers, but since you're looking for the performance comparison of various techniques, there's no reason to duplicate a huge article here.

I also agree with the comments which say you'll ultimately have to test these solutions on your system with your data anyway. If performance really matters, you'll have to measure for yourself.

Community
  • 1
  • 1
Tim Lehner
  • 14,813
  • 4
  • 59
  • 76
  • thanks for your answer, I agree that ultimately, I'll have to test it myself, but the intent of my question was to get a better idea of what options I have and their advantages/drawbacks, which are explained in the link you provided – Traceur Jul 08 '16 at 18:19