1

I have a SQL Server 2014 database that stores 2 million XML files. The XML file looks like:

<?xml version='1.0' encoding='UTF-16'?>
<PROJECTS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    <row>
        <APPLICATION_ID>7975883</APPLICATION_ID>
        <ACTIVITY>N01</ACTIVITY>
        <ADMINISTERING_IC>HL</ADMINISTERING_IC>
        <APPLICATION_TYPE xsi:nil="true"/>
        <ARRA_FUNDED>N</ARRA_FUNDED>
        <PIS>
           <PI>
              <PI_NAME>MICHEL, MARY Q</PI_NAME>
              <PI_ID>3704353</PI_ID>
           </PI>
           <PI>
              <PI_NAME>SMITH, ROBERT B</PI_NAME>
              <PI_ID>3704354</PI_ID>
           </PI>
           <PI>
               <PI_NAME>DOE, JOHN A</PI_NAME>
               <PI_ID>3704353</PI_ID>
            </PI>
        </PIS>
        <ORG_DUNS>600044978</ORG_DUNS>
        <ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
        <ORG_DISTRICT>08</ORG_DISTRICT>
        <ORG_ZIPCODE>208523003</ORG_ZIPCODE>
    </row>
</PROJECTS> 

My problem is that I want to pull all of the PI values based upon the ORG_DUNS numbers in a stored procedure. So the code that I have is:

SELECT 
    APPLICATION_ID,     
    nref.value('.','varchar(max)') TERM 
INTO 
    ADMIN_MUSC_RePORTER_TERMS                   
FROM 
    [ADMIN_Grant_Exporter_Files_XML] 
CROSS APPLY  
    XMLData.nodes('//PIS/PI') AS R(nref)
WHERE
    RECID = 1

And that work fine when I use a WHERE cause based up another field in the database but if I need to reference a node in the xml file that's where I'm having a problem. I need to pull all the the XML files that have ORG_DUNS equal to 600044978 and I know that the nref.value('ORG_DUNS[1]', 'varchar(max)') does not exist because of the cross apply.

SELECT 
    APPLICATION_ID,     
    nref.value('.','varchar(max)') TERM 
INTO 
    ADMIN_MUSC_RePORTER_TERMS                   
FROM 
    [ADMIN_Grant_Exporter_Files_XML] 
CROSS APPLY
    XMLData.nodes('//PIS/PI') as R(nref)
WHERE
    nref.value('ORG_DUNS[1]', 'varchar(max)') = '600044978'

So how can I get all of the PI nodes using the ORG_DUNS as my WHERE? Thanks

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Bill
  • 1,423
  • 2
  • 27
  • 51

2 Answers2

2

Change your Cross Apply statement to include the filter logic in the XPath:

CROSS APPLY XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)

To explain, //PIS[../ORG_DUNS/text() = ''600044978'']/PI says:

  • //PIS - find all elements called PIS
  • [...] - filter the returned elements for those matching this condition
  • ../ORG_DUNS/text() = ''600044978'' - Condition: the PIS's parent element's ORG_DUNS's text value equals 600044978.
  • Then return the child PI elements of any matching PIS elements.

Update per comments

Full SQL, including PI and PI_ID as separate values:

SELECT 
    APPLICATION_ID     
    , nref.value('(./PI_NAME/text())[1]','varchar(max)') PI 
    , nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO 
    ADMIN_MUSC_RePORTER_TERMS                   
FROM 
    [ADMIN_Grant_Exporter_Files_XML] 
CROSS APPLY 
    XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)
WHERE
    RECID = 1

Notes:

  • ./PI_NAME - from the currently selected element (i.e. the one refered to by the nref column; which is pointing at a PI element), take its child element, PI_NAME.
  • /text() - from the PI_NAME element, take its child text element (strictly this is not required, since when pulling back the value & converting to an varchar we'd get the same result; but I like to be explicit).
  • (...)[1] - return a singleton. i.e. we only want 1 value back, even if there were multiple PI_NAME elements under the current PI. By putting brackets around our expression we're saying "for all values returned by this expression"; and [1] says take the first result (since XPATH uses one-based indexes rather than zero-based as most other languages would).

Filtering with a variable

Anticipating your next issue; i.e. "How do you change the number at runtime without building dynamic SQL?", the answer's to use the sql:variable function:

declare @DunningNumber int = 600044978 --9 digit code http://www.dnb.com/duns-number.html; so can easily hold in an int: https://learn.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql

SELECT 
    APPLICATION_ID     
    , nref.value('(./PI_NAME/text())[1]','varchar(max)') PI 
    , nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO 
    ADMIN_MUSC_RePORTER_TERMS                   
FROM 
    [ADMIN_Grant_Exporter_Files_XML] 
CROSS APPLY 
    XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@DunningNumber")]/PI') AS R(nref)
WHERE
    RECID = 1
JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
  • 1
    This works great and is fast but if I need to change my select statement so I get both the PI and the PI_ ID as seperate values like SELECT APPLICATION_ID, nref.value('.','varchar(max)') PI, nref.value('.','varchar(max)') PI_ID. How can I do it?. I know that was not in the original question but I screwed up – Bill Jun 10 '17 at 14:10
  • No worries; almost exactly what you've got already, only tweaked slightly: `nref.value('(.\PI_ID\text())[1]','varchar(max)') TermId, nref.value('(.\Name\text())[1]','varchar(max)') Term` – JohnLBevan Jun 10 '17 at 14:49
  • 1
    Yeah yeah- that was my next question(Smile) but here is one more how about searching for a PI's Name with only a portion of the node. Such as the node is "Alberg, Albert J" and I want to search for "ALBERG" – Bill Jun 10 '17 at 15:43
  • `Contains`: https://stackoverflow.com/a/12488839/361842 \ https://learn.microsoft.com/en-us/sql/xquery/functions-on-string-values-contains. `XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@DunningNumber")]/PI[contains(./PI_NAME/text(), sql:variable("@PartialName")]')` – JohnLBevan Jun 10 '17 at 16:03
  • 1
    Thank you very much- this will help the researchers at the Cancer Center at the Medical University of SC-- thanks again – Bill Jun 10 '17 at 16:07
  • Sorry still need a little help-- I'm getting a syntax error and don't know how to tweak it. XQuery [ADMIN_Grant_Exporter_Files_XML.XMLData.nodes()]: ',' or ')' expected – Bill Jun 10 '17 at 16:22
  • I think that's my mistake, sorry; looks like I missed a close parenthesis above: `XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@DunningNumber")]/PI[contains(./PI_NAME/text()‌​, sql:variable("@PartialName"))]')` – JohnLBevan Jun 10 '17 at 16:24
  • 1
    Thanks for all of your help – Bill Jun 10 '17 at 16:40
1

The trick here is to first grab multiple levels in the document using multiple CROSS APPLY clauses. So first, starting from the root grab all the '/PROJECTS/row' and then use a relative path from each of those 'PIS/PI'.

Like this:

declare @t table(id int identity, APPLICATION_ID int default (2), XmlData xml)

insert into @t(XmlData) values (
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
      <PI>
        <PI_NAME>MICHEL, MARY Q</PI_NAME>
        <PI_ID>3704353</PI_ID>
      </PI>
<PI>
        <PI_NAME>SMITH, ROBERT B</PI_NAME>
        <PI_ID>3704354</PI_ID>
      </PI>
<PI>
        <PI_NAME>DOE, JOHN A</PI_NAME>
        <PI_ID>3704353</PI_ID>
      </PI>

</PIS>
<ORG_DUNS>600044978</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> '),(
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
      <PI>
        <PI_NAME>MICHEL, MARY Q</PI_NAME>
        <PI_ID>3704353</PI_ID>
      </PI>
<PI>
        <PI_NAME>SMITH, ROBERT B</PI_NAME>
        <PI_ID>3704354</PI_ID>
      </PI>
<PI>
        <PI_NAME>DOE, JOHN A</PI_NAME>
        <PI_ID>3704353</PI_ID>
      </PI>

</PIS>
<ORG_DUNS>600044979</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> ')

select APPLICATION_ID,      
pinode.value('PI_NAME[1]','varchar(max)') PI_NAME,
pinode.value('PI_ID[1]','varchar(max)') PI_ID           
FROM @t 
cross apply XMLData.nodes('/PROJECTS/row') as r(rownode)
cross apply rownode.nodes('PIS/PI') as p(pinode)
where rownode.value('ORG_DUNS[1]','varchar(max)') = '600044978'
David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67