3

I have some (untyped) XML being stored in SQL Server 2005 that I need to transform into a normalized structure. The structure of the document currently looks like so:

<wrapper>
 <parent />
 <node />
 <node />
 <node />

 <parent />
 <node />
 <node />
 <node />
<wrapper> 

I want to transform it to look like this:

<wrapper>
 <parent>
  <node />
  <node />
  <node />
 </parent>
 <parent>
  <node />
  <node />
  <node />
 </parent>
<wrapper> 

I can select the XML out into a relational structure if I need to, put the problem is there are no attributes linking the parent and the child nodes together, so order becomes an issue when using set-based operations. How can I use the .nodes()/.value()/other SQL Server XML APIs to transform this data? The transformation needs to run as part of a batch SQL script so extracting it into another tool/language is not a reasonable option for me.

jonathan.cone
  • 6,592
  • 2
  • 30
  • 30
  • 1
    I highly doubt you can do something like this with reasonable effort in SQL Server. Its XQuery engine is great for querying XML and extracting data from it - less so for doing heavier lifting on the manipulation side. I think you'd be better off grabbing that XML from SQL Server into e.g. C#/.NET and do the fixing there and store it back. – marc_s Jul 21 '11 at 19:19

1 Answers1

1

Actually - following code works (grouping here may be isn't very optimal, but anyway):

declare @xml xml = '
    <wrapper>
     <parent id="1" />
     <node id="1" />
     <node id="2" />
     <node id="3" />

     <parent id="2" />
     <node id="4" />
     <node id="5" />
     <node id="6" />
    </wrapper> 
'

;with px as
(
    select row_number() over (order by (select 1)) as RowNumber
        ,t.v.value('@id', 'int') as Id
        ,t.v.value('local-name(.)', 'nvarchar(max)') as TagName
    from @xml.nodes('//wrapper/*') as t(v)
)
select p.Id as [@id],
    (
        select n.Id as id
        from px n
        where n.TagName = 'node'
            and n.RowNumber > p.RowNumber
            and not exists
            (
                select null
                from px np
                where np.TagName = 'parent'
                    and np.RowNumber > p.RowNumber
                    and np.RowNumber < n.RowNumber
            )
        order by n.RowNumber
        for xml raw('node'), type
    )
from px p
where p.TagName = 'parent'
order by p.RowNumber
for xml path('parent'), root('wrapper')

But I don't recommend to use it. See here: http://msdn.microsoft.com/en-us/library/ms172038%28v=sql.90%29.aspx:

In SQLXML 4.0, document order is not always determined
So I'm not sure that we can rely on order of tags inside wrapper (and code above is more just for fun than for practical use).
oryol
  • 5,178
  • 2
  • 23
  • 18
  • Thanks for the input, I arrived at the same conclusion -- this is non-deterministic. I'll figure out something else. – jonathan.cone Jul 21 '11 at 23:51
  • @jonathan - You can use `position()` with a numbers table to enumerate the rows in a guaranteed deterministic way. Have a look at this answer. http://stackoverflow.com/questions/6472533/xquery-and-node-ids/6473228#6473228 – Mikael Eriksson Jul 22 '11 at 17:04
  • 1
    Great solution! I tried to use position() in value function but it didn't help (position can be used only inside predicates). – oryol Jul 24 '11 at 12:31