0

I was able to find information on how to separate a comma delineated strings into columns here: How Do I Split a Delimited String in SQL Server Without Creating a Function? - but it doesn't quite solve my problem.

The data I am working with is a list of interests/activities for a given record. For instance, a record would have 'yoga, hiking' etc... At present we have 63 distinct activities total but we also need to accommodate for activities that do not yet exist.

What I've been tasked with is creating an automated process which:

  • Creates a new column in the table for each activity listed (and creates new columns as presently non-existent activities are added)
  • Have the columns reflect the name of the activity
  • If the record contains an activity, the column for that activity has a Y in it.

Ultimately, the outcome would look something like this...

id  | activities       | art  | yoga | music | hiking | ...
-----------------------------------------------------------
1   | yoga, hiking     | NULL | y    | NULL  | y      | ...
2   | art, music, yoga | y    | y    | y     | NULL   | ...

Thank you in advance for any help that can be provided, let me know if any more information is required.

Community
  • 1
  • 1
  • Show what you have tried thus far, we're here to help resolve after you've given this some effort. – Dave C Jun 16 '16 at 14:13
  • I have tried researching the answer on my own and uploaded the closest example of what I was looking for I could find. I have not written any syntax for this because I'm honestly not sure where to begin. – Heineko Kaze Jun 17 '16 at 15:05

1 Answers1

0

There are two concepts to learn here.

First, you have to split the string into rows. xquery is the fastest method on large recordsets, albeit more difficult to write.

Then you need to use a dynamic pivot to put the activities into new columns.

I've provided two outputs below, just pick which suites your needs best.

Xquery Example

Dynamic Pivot Example

Code sample:

if OBJECT_ID('tempdb..#tmp') is not null drop table #tmp
if OBJECT_ID('tempdb..#tmpsplit') is not null drop table #tmpsplit

declare @cols nvarchar(MAX), @sql nvarchar(MAX)
create table #tmp (id int identity(1,1), activities varchar(255))
insert into #tmp (activities)
values ('yoga, hiking'),
       ('art, music, yoga')

/* split activities into individual rows and store in #tmpsplit*/
SELECT DISTINCT A.ID,  
       ltrim(rtrim(Split.a.value('.', 'VARCHAR(max)'))) AS activity
into #tmpsplit     
FROM  (SELECT id,  
              CAST ('<M>' + REPLACE(CAST(activities AS VARCHAR), ',', '</M><M>') + '</M>' AS XML) AS String  
       FROM #tmp) AS A 
CROSS APPLY String.nodes ('/M') AS Split(a)
WHERE LEN(Split.a.value('.', 'VARCHAR(max)'))>1

/* create list of activities to pivot */
SET @cols = STUFF((SELECT distinct ',' + QUOTENAME(c.activity) 
            FROM #tmpsplit c
            FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') ,1,1,'')

/* create pivot: id and columns only*/
set @sql = 'select id, ' + @cols + ' 
              from (select s.*, ''y'' as val 
                    from #tmpsplit s) datatable
              pivot (max(val) for activity in (' + @cols + ')
            ) p '
print(@sql)
exec(@sql)

/* create pivot:  id, full activitites string, and columns*/
set @sql = 'select id, activities, ' + @cols + ' 
              from (select s.*, t.activities, ''y'' as val 
                    from #tmpsplit s 
                    join #tmp t on s.id=t.id) datatable
              pivot (max(val) for activity in (' + @cols + ')
            ) p '
print(@sql)
exec(@sql)
Dave C
  • 7,272
  • 1
  • 19
  • 30