I have a XML file that I'm shredding into a table in SQL Server 2008. It goes several nodes deep and this is a first attempt for me to do this so I'm sure I'm missing the point somewhere.
I got it to work, but for a XML with 97 records I get each record 97 times, i.e. a total of 9409 rows in the result set! That takes about 37 seconds. If I use SELECT DISTINCT
I get the 97 rows but it (predictably) also takes 37 seconds.
First row of the XML:
<?xml version="1.0" encoding="utf-8"?>
<trmFileDataStream>
<getTransactionDetails>
<getTransactionDetailsResponse>
<messages>
<resultCode>Ok</resultCode>
<message>
<code>I00001</code>
<text>Successful.</text>
</message>
</messages>
<transaction>
<transId>4570599999</transId>
<submitTimeUTC>2012-08-12T20:52:05.01Z</submitTimeUTC>
<submitTimeLocal>2012-08-12T15:52:05.01</submitTimeLocal>
<transactionType>authCaptureTransaction</transactionType>
<transactionStatus>pendingSettlement</transactionStatus>
<responseCode>1</responseCode>
<responseReasonCode>1</responseReasonCode>
<responseReasonDescription>Approval</responseReasonDescription>
<AVSResponse>P</AVSResponse>
<batch>
<batchId>2007999999</batchId>
<settlementTimeUTC>2012-08-12T21:12:40.193Z</settlementTimeUTC>
<settlementTimeLocal>2012-08-12T16:12:40.193</settlementTimeLocal>
<settlementState>pendingSettlement</settlementState>
</batch>
<authAmount>99.04</authAmount>
<settleAmount>99.04</settleAmount>
<taxExempt>false</taxExempt>
<payment>
<bankAccount>
<routingNumber>XXXXCCCC</routingNumber>
<accountNumber>XXXXNNNN</accountNumber>
<nameOnAccount>Account Name</nameOnAccount>
<echeckType>WEB</echeckType>
</bankAccount>
</payment>
<customer>
<id>UWYN201H7C</id>
</customer>
<billTo>
<firstName>FirstName</firstName>
<lastName>LastName</lastName>
<address>123245 Some street.</address>
<city>Some City</city>
<state>WS</state>
<zip>36123</zip>
<country>USA</country>
<phoneNumber>1234567891</phoneNumber>
</billTo>
<recurringBilling>false</recurringBilling>
</transaction>
</getTransactionDetailsResponse>
My SELECT
statement:
SELECT distinct
transactions.value ('(transId/text())[1]','varchar(100)') AS transID,
Replace(Replace(transactions.value ('(submitTimeUTC/text())[1]','varchar(100)'),'T',' '),'Z',' ') AS submitTimeUTC,
Replace(transactions.value ('(submitTimeLocal/text())[1]','varchar(100)'),'T',' ') AS submitTimeLocal,
transactions.value ('(transactionType/text())[1]','varchar(100)') AS transactionType,
transactions.value ('(transactionStatus/text())[1]','varchar(100)') AS transactionStatus,
transactions.value ('(responseCode/text())[1]','varchar(100)') AS responseCode,
transactions.value ('(responseReasonCode/text())[1]','varchar(100)') AS responseReasonCode,
transactions.value ('(responseReasonDescription/text())[1]','varchar(100)') AS responseReasonDescription,
transactions.value ('(AVSResponse/text())[1]','varchar(100)') AS AVSResponse,
transactions.value ('(authAmount/text())[1]','decimal(10,2)') AS authAmount,
transactions.value ('(settleAmount/text())[1]','decimal(10,2)') AS settleAmount,
transactions.value ('(taxExempt/text())[1]','varchar(100)') AS taxExempt,
transactions.value ('(recurringBilling/text())[1]','varchar(100)') AS recurringBilling,
rootb.value ('(fileInformationLine/text())[1]','varchar(100)') AS FileInfo,
messagesb.value ('(resultCode/text())[1]','varchar(100)') AS resultCode,
messageb.value ('(code/text())[1]','varchar(100)') AS MsgCode,
messageb.value ('(text/text())[1]','varchar(100)') AS MsgText,
batch.value ('(batchId/text())[1]','varchar(100)') AS batchID,
batch.value ('(settlementTimeUTC/text())[1]','varchar(100)') AS settlementTimeUTC,
batch.value ('(settlementTimeLocal/text())[1]','varchar(100)') AS settlementTimeLocal,
batch.value ('(settlementState/text())[1]','varchar(100)') AS settlementState,
bankacc.value ('(routingNumber/text())[1]','varchar(100)') AS routingNumber,
bankacc.value ('(accountNumber/text())[1]','varchar(100)') AS accountNumber,
bankacc.value ('(nameOnAccount/text())[1]','varchar(100)') AS nameOnAccount,
bankacc.value ('(echeckType/text())[1]','varchar(100)') AS echeckType,
Customer.value ('(id/text())[1]','varchar(100)') AS customerID,
billTo.value ('(firstName/text())[1]','varchar(100)') AS firstName,
billTo.value ('(lastName/text())[1]','varchar(100)') AS lastName,
billTo.value ('(address/text())[1]','varchar(100)') AS address,
billTo.value ('(city/text())[1]','varchar(100)') AS city,
billTo.value ('(state/text())[1]','varchar(100)') AS state,
billTo.value ('(zip/text())[1]','varchar(100)') AS zip,
billTo.value ('(country/text())[1]','varchar(100)') AS country,
billTo.value ('(phoneNumber/text())[1]','varchar(100)') AS phoneNumber
FROM
xmlImportTempTable
/* Message branch */
CROSS APPLY
xml_data.nodes('//trmFileDataStream/getTransactionDetails/getTransactionDetailsResponse') AS tMsg(getTD)
OUTER APPLY getTD.nodes('messages') AS getTD(messagesb)
OUTER APPLY messagesb.nodes('message') AS messagesb(messageb)
/* Transaction branches */
CROSS APPLY xml_data.nodes('//trmFileDataStream') AS Txn(rootb)
OUTER APPLY rootb.nodes('getTransactionDetails') AS rootb(getTransDetl)
OUTER APPLY getTransDetl.nodes('getTransactionDetailsResponse') AS rootc(getTransDtlResp)
OUTER APPLY getTransDtlResp.nodes('transaction') AS rootd(transactions)
OUTER APPLY transactions.nodes('batch') AS btc(batch)
OUTER APPLY transactions.nodes('payment') AS pmt(payment)
OUTER APPLY payment.nodes('bankAccount') AS bacc(bankacc)
OUTER APPLY transactions.nodes('customer') AS cust(customer)
OUTER APPLY transactions.nodes('billTo') AS billing(billTo)
ORDER BY transID
I CAN live with the performance (this will be ongoing imports and the xml files can be 6-7 times bigger than this one ) since I can extract the correct result set. But I would really like to be able to figure out exactly how to do this right! I just read here (http://stackoverflow.com/questions/61233/the-best-way-to-shred-xml-data-into-sql-server-database-columns) that adding the schema would help tremendously with the performance and I will try that tomorrow. The multiple rows are the perplexing issue for me.
Many thanks in advance:)
Arnor Baldvinsson, Icetips Alta LLC