
I need help fine-tuning this code that I wrote. I am new to SQL Server and I believe there are better ways to do this, or perhaps some of the following code can be simplified or tuned for better performance or lower memory use.

Basically, I have this XML data:

<table_result id="001" subj_cd="cdaaa" grade="b" name="Phua Chu Kang"/>


and I want to create a table which looks like this from that XML data

field     data
id        001
subj_cd   cdaaa
grade     b
name      Phua Chu Kang

Please note the following points:

  • SplitThis is not a built-in function (see the code below).

  • The data can contain spaces but is delimited by double quotes (").

  • The XML data can have a varying number of field-data pairs for a given table (referred to as #dummy in the code below). For example, the XML above has 4 fields (id, subj_cd, grade, name), while the next XML record could have 5 (e.g. id, name, occupation, phone_no, address).

  • In the code below, #table_result is created to match the example XML data for easier demonstration. In other words, the table structures are known, so I can ignore the field names in the XML data and focus on extracting the data itself.

The code runs fine on SQL Server 2012 (you can copy, paste and run it directly) and I get the result above. I just need to fine-tune it, if possible. I have included test lines like this: -- test blabla. You can uncomment those and try them. I would welcome enhancements such as reducing the number of temp tables used, or any way to replace the use of row_number() in the code.

/* remove all temp tables */
declare @sql varchar(5000)
SELECT @sql = isnull(@sql+';', '') + 'drop table ' + SUBSTRING(t.name, 1, CHARINDEX('___', t.name)-1)
FROM tempdb..sysobjects AS t
WHERE t.name LIKE '#%[_][_][_]%'
AND t.id = OBJECT_ID('tempdb..' + SUBSTRING(t.name, 1, CHARINDEX('___', t.name)-1));
exec (@sql)
/* end */

/* function */
if object_id('splitthis') is not null
    drop function splitthis
go
create function splitthis(@separator char(1), @list varchar(max))
     returns @returntable table(item nvarchar(max))
as
begin
    declare @index int
    declare @newtext varchar(max) 
    if @list is null
        return
    set @index = charindex(@separator, @list)
    while not(@index = 0)
    begin
        set @newtext = rtrim(ltrim(left(@list, @index - 1)))
        set @list = right(@list, len(@list) - @index)
        insert into @returntable(item) values(@newtext)
        set @index = charindex(@separator, @list)
    end
    insert into @returntable(item) values(rtrim(ltrim(@list)))
    update @returntable set item='' where item is null 
    return
end
go
/* end of function */

/* create dummy tables */
create table #table_result 
(id nvarchar(max), subj_cd nvarchar(max), grade nvarchar(max), name nvarchar(max))

create table #dummy (name nvarchar(max), data nvarchar(max))
insert into #dummy 
values ('a', '<table_result id="001" subj_cd="cdaaa" grade="b" name="phua chu kang"/>');
--test : select * from #dummy

/* remove the first non-data opening tag */
declare @record nvarchar(max)
select @record = data from #dummy where name = 'a'
select *, null as temp into #tempb from splitthis(' ',@record)
select *, row_number() over (order by temp) count into #tempc from #tempb
select item into #tempd from #tempc where #tempc.count>1
-- test : select * from #tempd

/* get the actual field & data into a single column table */
declare @temp varchar(max)
set @temp = ''
select @temp = @temp + ' ' + item from #tempd
select *, null as temp into #tempe from splitthis('"',@temp)
select *, row_number() over (order by temp) count into #tempf from #tempe
select item, count into #tempg from #tempf
--test : select * from #tempg

/* prepare the data table */
select 
    case when #tempg.count % 2 = 0 
        then item
        else null
    end as data
into #temph
from #tempg 
select data, null as temp into #tempi from #temph
select data, row_number() over (order by temp) count into #data from #tempi
    where data is not null
--test : select * from #data

/* prepare the field table. */
select name, null as temp into #tempj 
from tempdb.sys.columns where object_id=object_id('tempdb..#table_result');
select *, row_number() over (order by temp) count into #field from #tempj
--test : select * from #field

/* get the final table */
select a.name as field, b.data from #field a
left join #data b on a.count=b.count
Raf
  • The splitter you posted is the absolute worst performing of any of the splitters available; sadly, it is also the most common. See this article for several alternatives that are far better for performance (a sketch of one such alternative follows these comments): http://sqlperformance.com/2012/07/t-sql-queries/split-strings – Sean Lange Feb 03 '16 at 19:58
    I think this might help you [Convert Xml to Table SQL Server](http://stackoverflow.com/questions/3989395/convert-xml-to-table-sql-server) – jthalliens Feb 03 '16 at 20:19
  • I think this is an [XY-Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You do not need help with your splitting code, but rather a different approach. If I am not reading your question wrong, it is: how can I get the attributes and their values as name-value pairs of an XML element, generically? – Shnugo Feb 03 '16 at 21:33
  • yeah, another approach is great as well. The answer posted below is a good one – Raf Feb 04 '16 at 03:19
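For reference, here is a minimal sketch of one of the better-performing alternatives the linked article compares: an inline table-valued function that shreds the list via XML. The name splitthis_xml is made up for illustration, and the cast assumes the input never contains characters that are special in XML (&, <, >), so treat it as a sketch rather than a drop-in replacement.

create function dbo.splitthis_xml(@separator char(1), @list varchar(max))
    returns table
as
return
(
    -- replace each separator with </i><i>, wrap the whole string in <i>...</i>,
    -- cast to xml and shred the resulting <i> nodes back into rows
    select item = ltrim(rtrim(x.i.value('.', 'nvarchar(max)')))
    from (select cast('<i>' + replace(@list, @separator, '</i><i>') + '</i>' as xml) as doc) as d
    cross apply d.doc.nodes('/i') as x(i)
);
go
--test : select item from dbo.splitthis_xml(' ', 'phua chu kang')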

2 Answers


Using XML methods, this is much easier!

Try this:

DECLARE @xml XML='<table_result id="001" subj_cd="cdaaa" grade="b" name="Phua Chu Kang"/>';

SELECT One.Attr.value('fn:local-name(.)','varchar(max)') AS field
      ,One.Attr.value('.','varchar(max)') AS data
FROM @xml.nodes('table_result/@*') AS One(Attr)

The result

field     data
id        001
subj_cd   cdaaa
grade     b
name      Phua Chu Kang

Now I'll try to imitate your table structure (I'd recommend storing the data as XML from the beginning! In that case you could omit the first CROSS APPLY with the CAST ... AS XML; a sketch of that variant follows the result below):

DECLARE @tbl TABLE(name VARCHAR(10),data VARCHAR(MAX));
INSERT INTO @tbl VALUES
 ('a','<table_result id="001" subj_cd="cdaaa" grade="b" name="Phua Chu Kang"/>') 
,('b','<Another test="test data" test2="test2 data"/>') 
,('c','<OneMore x="x data" y="y data" z="z data"/>'); 

SELECT tbl.name
      ,One.Attr.value('fn:local-name(..)','varchar(max)') AS element
      ,One.Attr.value('fn:local-name(.)','varchar(max)') AS field
      ,One.Attr.value('.','varchar(max)') AS data
FROM @tbl AS tbl
CROSS APPLY(SELECT CAST(tbl.data AS XML)) AS MyData(AsXml)
CROSS APPLY MyData.AsXml.nodes('*/@*') AS One(Attr)

The result

name  element       field    data
a     table_result  id       001
a     table_result  subj_cd  cdaaa
a     table_result  grade    b
a     table_result  name     Phua Chu Kang
b     Another       test     test data
b     Another       test2    test2 data
c     OneMore       x        x data
c     OneMore       y        y data
c     OneMore       z        z data
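
A minimal sketch of the variant mentioned above, assuming the data is stored in an XML column from the start (the table variable @tbl2 is just an illustrative name); the CAST and its CROSS APPLY are then unnecessary:

DECLARE @tbl2 TABLE(name VARCHAR(10),data XML);
INSERT INTO @tbl2 VALUES
 ('a','<table_result id="001" subj_cd="cdaaa" grade="b" name="Phua Chu Kang"/>')
,('b','<Another test="test data" test2="test2 data"/>');

SELECT tbl.name
      ,One.Attr.value('fn:local-name(..)','varchar(max)') AS element
      ,One.Attr.value('fn:local-name(.)','varchar(max)') AS field
      ,One.Attr.value('.','varchar(max)') AS data
FROM @tbl2 AS tbl
CROSS APPLY tbl.data.nodes('*/@*') AS One(Attr);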
Shnugo
  • This is a nice approach! However, how do we use the XML nodes() function with a variable? – Raf Feb 04 '16 at 03:22
  • @Cache, the first example shows this approach with a declared XML variable; the second example uses data stored in a table... In any case the variable / column must be a "real" XML and not just a VARCHAR containing XML. The last one is easy to cast (a short sketch follows these comments)... – Shnugo Feb 04 '16 at 08:36
  • I see. Thanks for the info! – Raf Feb 05 '16 at 08:29
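
A short sketch of the cast mentioned in the comment above, assuming the XML arrives in a plain VARCHAR variable (the variable names are illustrative):

DECLARE @raw VARCHAR(MAX) = '<table_result id="001" subj_cd="cdaaa" grade="b" name="Phua Chu Kang"/>';
DECLARE @asXml XML = CAST(@raw AS XML);   -- a VARCHAR holding well-formed XML casts directly

SELECT One.Attr.value('fn:local-name(.)','varchar(max)') AS field
      ,One.Attr.value('.','varchar(max)') AS data
FROM @asXml.nodes('table_result/@*') AS One(Attr);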

Now, I'm not at all very good with T-SQL XML, but can't you just do it like this:

create table #dummy (name nvarchar(max), data xml);

insert into #dummy 
values ('a', '<table_result id="001" subj_cd="cdaaa" grade="b" name="phua chu kang"/>');

select 'id' "field",
    elem.value('@id', 'nvarchar(50)') "data"
from #dummy
cross apply data.nodes('/table_result') tbl(elem)

union all

select 'subj_cd' "field",
    elem.value('@subj_cd', 'nvarchar(50)') "data"
from #dummy
cross apply data.nodes('/table_result') tbl(elem)

union all

select 'grade' "field",
    elem.value('@grade', 'nvarchar(50)') "data"
from #dummy
cross apply data.nodes('/table_result') tbl(elem)

union all

select 'name' "field",
    elem.value('@name', 'nvarchar(50)') "data"
from #dummy
cross apply data.nodes('/table_result') tbl(elem);

Notice that I changed the data type for #dummy.data to be xml. That's required to be able to use the XML functions.

Bacon Bits
  • The OP stated that a generic approach is needed (varying number and names of attributes)... As you can see in my answer, this is quite easy to achieve... – Shnugo Feb 03 '16 at 21:30
  • @Shnugo As I said, I'm not very good with T-SQL XML! If you know how to get the same results from a *table* instead of a variable without using the `CROSS APPLY` method I have above I'd like to see that, too. – Bacon Bits Feb 03 '16 at 21:41
  • Look at my answer, I edited it and added an example of how to do the same with table data... – Shnugo Feb 03 '16 at 21:54
  • One more point: You state in your comment `without using CROSS APPLY`. Why? There is a general limitation in accessing attributes of the root element. `CROSS APPLY` with `.nodes()` allows you to get a relative reference to the node and allows you to read these attributes. There, in the XPath, you may use the `*` instead of actual names which allows generic queries. – Shnugo Feb 03 '16 at 22:21
  • @Shnugo Because I was curious if that was the only way or the best way. I don't have anything against the `CROSS APPLY`. 90% of XML answers on StackOverflow use XML saved in variables, and that's essentially useless unless you're writing a sproc to shred XML one fragment at a time because you can't use it with set-based logic. I also don't find the XML doc on MSDN very easy to read, so it's hard for me to use that to learn what I *can* do, nevermind what I *should*. – Bacon Bits Feb 04 '16 at 13:21
  • As long as we speak about "real" XML (no difference between variable and table column), using `CROSS APPLY xyz.nodes()` is perfectly `set based`... The nodes-method will return a list of rows quite the same way as you would join another table. But maybe I did not understand your point... – Shnugo Feb 04 '16 at 13:27
  • @Shnugo Say you've got a table with 500,000 records with an xml column and you want to shred it. Is it going to be better to CROSS APPLY, or is it going to be better to create a cursor so you can put each field into an xml variable and shred the xml one row at a time? The former I would term set based. The latter I would term procedural. Guess which method makes up the vast majority (easily 90%) of answers on SE, SQL blogs, and MSDN doc? Guess which method performs better? Any answer that starts out with a literal or implied `DECLARE @xml xml` is only telling half the story. – Bacon Bits Feb 04 '16 at 13:53
  • Yes, I get the point... If there is a table with 500,000 rows each containing XML data I'd question the design actually :-) To get this clear: Loops are **evil**, multi-statement functions are **evil**, cursors are **very evil** and procedural thinking is **responsible for all the evil** :-) `CROSS APPLY` used in the right way is `set based` and therefore **not evil**. And yes: 500,000 rows with XML to shred **are evil too** :-)) – Shnugo Feb 04 '16 at 14:05