1

I'm stuck on an aggregation problem that I can't get to the bottom of.

I have some data which is best summarized as follows

id |phraseId|seqNum|word
=========================
1  |1       |1     |hello
2  |1       |2     |world
3  |2       |1     |black
4  |2       |2     |and
5  |2       |3     |white

I'd like a query that gives back the following data:

phraseId|completePhrase
========================
1       |hello world
2       |black and white

Anyone?

EDIT

I notice all the provided solutions use FOR XML PATH. What is this magic?

spender
  • 117,338
  • 33
  • 229
  • 351

4 Answers4

2

One solution is to create an UDF using an FOR XML PATH expression.

  • the UDF handles the concatenation of one PhraseID
  • it can be used in a normal select

SQL Statement

SELECT  PhraseID, dbo.UDF_ConcatWord(PhraseID)
FROM    Phrases
GROUP BY PhraseID

Creating the UDF

CREATE FUNCTION dbo.UDF_ConcatWord(@phraseID INT) RETURNS VARCHAR(8000) AS
BEGIN  
  DECLARE @r VARCHAR(8000)
  SELECT @r = (
    SELECT  word + ', '
    FROM    Phrases
    WHERE   phraseID = @phraseID
    FOR XML PATH('')
  )
  IF LEN(@r) > 0 SET @r = SUBSTRING(@r, 1, LEN(@r)-1)
  RETURN @r
END
GO

edit

After revising some of the links myself, an even shorter solution is

SQL Statement

SELECT  DISTINCT p1.PhraseID
        , STUFF(( SELECT  ' ' + p2.word 
                  FROM    Phrases AS p2 
                  WHERE   p2.PhraseID = p1.PhraseID 
                  FOR XML PATH('')), 1, 1, '') AS completePhrase
FROM      Phrases AS p1
ORDER BY  p1.PhraseID
Lieven Keersmaekers
  • 57,207
  • 13
  • 112
  • 146
  • I saw the cross join answer & your UDF, so I modified my parent-child answer: http://stackoverflow.com/questions/2342811/multiple-results-for-one-field-in-a-joined-sql-query/2343166#2343166 Then I noticed your edit, which is almost the same as mine. I was checking the execution plans to see if there was a difference, and mine was slower because I sort when combining the strings & you don't. If the data is inserted out of order, your words will be in the wrong order. Adding a sort to your query makes it TotalSubtreeCost slower by 0.00000105, which must be the diff between GROUP BY vs DISTINCT. – KM. Mar 17 '10 at 12:57
  • OK. Thanks for all the answers. I've gone with this form, but they were all excellent. Thanks – spender Mar 17 '10 at 13:21
2

try this:

DECLARE @TableA  table (RowID int, phraseId varchar(5),seqNum int, word varchar(5))

INSERT INTO @TableA VALUES (1,1,1,'hello')
INSERT INTO @TableA VALUES (2,1,2,'world')
INSERT INTO @TableA VALUES (3,2,1,'black')
INSERT INTO @TableA VALUES (4,2,2,'and')
INSERT INTO @TableA VALUES (5,2,3,'white')

SELECT
    c1.phraseId
        ,STUFF(
                 (SELECT
                      ' ' + word
                      FROM @TableA  c2
                      WHERE c2.phraseId=c1.phraseId
                      ORDER BY c1.phraseId, seqNum
                      FOR XML PATH('') 
                 )
                 ,1,1, ''
              ) AS CombinedValue
    FROM @TableA c1
    GROUP BY c1.phraseId
    ORDER BY c1.phraseId

OUTPUT:

phraseId CombinedValue
-------- --------------------------
1        hello world
2        black and white

(2 row(s) affected)
KM.
  • 101,727
  • 34
  • 178
  • 212
1

I have cheated a bit by assuming that you have a table which holds the header record for each phrase. If this is missing, you could construct it by selecting a distinct list of phraseIDs from the table containing the words:

declare @words table
(id int
,phraseId int
,seqNum int
,word varchar(10)
)

insert @words
select 1,1,1,'hello'
union select 2,1,2,'world'
union select 3,2,1,'black'
union select 4,2,2,'and'
union select 5,2,4,'white'

declare @phrase table
(phraseId int)

insert @phrase
select 1
union select 2

select phraseID
       ,phraseText AS completePhrase
FROM @phrase AS p
CROSS APPLY (select word + ' ' as [text()]
             from @words AS w
             where w.phraseID = p.phraseID
             for xml path('')
            ) as phrases (phraseText)
Ed Harper
  • 21,127
  • 4
  • 54
  • 80
  • Good, but your assumption of a header record is unnecessary. For instance: select phraseID,phraseText AS completePhrase FROM (select distinct phraseId from @words) AS p CROSS APPLY (select word + ' ' as [text()] from @words AS w where w.phraseID = p.phraseID for xml path('') ) as phrases (phraseText) – spender Mar 17 '10 at 12:23
  • @spender - Isn't that exactly what I said at the start of my answer? – Ed Harper Mar 17 '10 at 13:52
  • Sorry... missed that. Scratch my last comment! – spender Mar 17 '10 at 14:46
0

In the end I used Lieven's second answer, but found that for certain string combinations, the FOR XML PATH('') trick causes problems to occur:

declare @phrases table
(
    id int
    ,phraseId int
    ,seqNum int
    ,word varchar(10)
)

insert 
    @phrases 
values
    (1,1,1,'hello'),
    (2,1,2,'world'),
    (3,2,1,'black'),
    (4,2,2,'and'),
    (5,2,3,'white')

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @phrases AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            FOR XML PATH('')
        ), 1, 1, '') AS completePhrase
FROM      
    @phrases AS p1
ORDER BY  
    p1.PhraseID

works fine, but if the example uses characters that would need escaping were they used in an XML, problems occur. For instance, running the following data through it:

insert 
    @words 
values
    (1,1,1,'hello>'), --notice the less than symbol
    (2,1,2,'world'),
    (3,2,1,'black')

Gives

hello> world

and also if the source table is declared out of order, an order by is required

A small mod to the original query fixes all:

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @words AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            ORDER BY
                p2.seqNum  --required
            FOR XML PATH(''),TYPE
        ).value('.','nvarchar(4000)'), 
        1, 
        1, 
        ''
    ) AS completePhrase
FROM      
    @words AS p1
ORDER BY  
    p1.PhraseID

(see FOR XML PATH(''): Escaping "special" characters)

Community
  • 1
  • 1
spender
  • 117,338
  • 33
  • 229
  • 351