0

I have a few XSD files that define a report, wondering if there's a way load this into a table that I can query.

The elements XSD defines the required fields, it looks like:

  <xs:simpleType name="NHSNumberStatusIndicatorCode_Withheld_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:length value="2"/>
        <xs:enumeration value="01"/>
        <xs:enumeration value="02"/>
        <xs:enumeration value="03"/>
        <xs:enumeration value="04"/>
        <xs:enumeration value="05"/>
        <xs:enumeration value="06"/>
        <xs:enumeration value="07"/>
        <xs:enumeration value="08"/>
    </xs:restriction>
</xs:simpleType>

I need to convert it to

     Field_Name         Type                    Length        enumeration     MaxLen      MinLen
     NHSNumber...       AlphaNumeric_Type        2                 01
     NHSNumber...       AlphaNumeric_Type        2                 02
     NHSNumber...       AlphaNumeric_Type        2                 03

and so on

Some fields don't have enumeration, they might look like:

         <xs:simpleType name="CDSOrganisationIdentifier_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:minLength value="3"/>
        <xs:maxLength value="5"/>
    </xs:restriction>
</xs:simpleType>

OR

   <xs:simpleType name="Number_Type">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
</xs:simpleType>

This returns Null for all the fields: enter image description here

Ariox66
  • 620
  • 2
  • 9
  • 29

1 Answers1

1

You can load it into an XML variable and then use various SQL Server XML methods to select the nodes of interest and extract the values. See xml Data Type Methods and Path Expressions (XQuery) for more information.

In the following, I wrapped your XSD fragment in a "schema" element that also defined the "xs:" namespace prefix.

DECLARE @xml XML = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="NHSNumberStatusIndicatorCode_Withheld_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:length value="2"/>
        <xs:enumeration value="01"/>
        <xs:enumeration value="02"/>
        <xs:enumeration value="03"/>
        <xs:enumeration value="04"/>
        <xs:enumeration value="05"/>
        <xs:enumeration value="06"/>
        <xs:enumeration value="07"/>
        <xs:enumeration value="08"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema >'

SELECT
    N.node.value('(../../@name)[1]', 'nvarchar(max)') AS Field_Name,
    N.node.value('(../@base)[1]', 'nvarchar(max)') AS Type,
    N.node.value('(../xs:length/@value)[1]', 'int') AS Length,
    N.node.value('@value[1]', 'nvarchar(max)') AS enumeration
FROM @xml.nodes('//xs:enumeration') N(node)

The .nodes() method selects all of the enumeration nodes from the supplied xml. The leading // selects nodes at any depth. The N(nodes) provide arbitrary table and column aliases to the selected results. The .value() method is then used to select attributes values from either the selected node, parent node, or sibling node in the case of Length. The @ prefix on a name refers to an attribute instead of a child node. Because many XPath selectors could potentially select more than one value, you often need to use [1] or (...)[1] to select just the first.

Result:

Field_Name Type Length enumeration
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 01
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 02
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 03
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 04
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 05
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 06
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 07
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 08

See this db<>fiddle.

The XSD can also be loaded into an XSD typed column of a table and accessed in a manner similar to the above by replacing @xml with the column reference.

The above is coded for this specific enumeration type scenario. If your actual XSD contains other type definitions that don't follow this pattern, you may need to handle those cases separately.

UPDATE: To accommodate the following additional cases from your amended post ...

  <xs:simpleType name="CDSOrganisationIdentifier_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:minLength value="3"/>
        <xs:maxLength value="5"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Number_Type">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>

a different approach is needed. After an initial .nodes() call to get the <xs:simpleType ...> nodes, additional calls are made to access selected subordinate nodes, some of which may or may not exist. Finally, the .value() calls in the select list have been updated to pull data relative to one of the several selected nodes.

SELECT
    S.Node.value('(@name)[1]', 'nvarchar(max)') AS Field_Name,
    R.Node.value('(@base)[1]', 'nvarchar(max)') AS Type,
    R.Node.value('(xs:length/@value)[1]', 'int') AS Length,
    E.Node.value('@value[1]', 'nvarchar(max)') AS enumeration,
    R.Node.value('(xs:maxLength/@value)[1]', 'int') AS MaxLength,
    R.Node.value('(xs:minLength/@value)[1]', 'int') AS MinLength,
    R.Node.value('(xs:pattern/@value)[1]', 'nvarchar(max)') AS Pattern
FROM @xml.nodes('//xs:simpleType') S(Node)
OUTER APPLY S.Node.nodes('xs:restriction') R(Node) -- Likely always one occurrence
OUTER APPLY R.Node.nodes('xs:enumeration') E(Node) -- Zero or more occurrences

An OUTER APPLY is like a LEFT JOIN to a subselect, except that no ON condition used.

The result is:

Field_Name Type Length enumeration MaxLength MinLength Pattern
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 01 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 02 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 03 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 04 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 05 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 06 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 07 null null null
NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 08 null null null
CDSOrganisationIdentifier_Type ns:AlphaNumeric_Type null null 5 3 null
Number_Type xs:string null null null null [0-9]{10}

See this updated db<>fiddle.

T N
  • 4,322
  • 1
  • 5
  • 18
  • Great, thanks so much. Any idea How to add Type and Length? this is my first time dealing with XMLs and XSDs – Ariox66 Jan 23 '23 at 02:20
  • 1
    I revised my answer above with additional results and a more detailed explaination. It's worth learning XPath syntax so that you can code up similar cases, but I realized after my original post that the sibling lookup case for Length was a bit different from the rest. – T N Jan 23 '23 at 02:33
  • Sorry to bug you, I found out not all fields follow the same pattern. some don't have enumeration and weren't picked up. I updated the question. – Ariox66 Jan 23 '23 at 04:52
  • In the above example, what do you do if there are more than one and you want to get all of them (we don't know how many might be) – Ariox66 Jan 30 '23 at 01:26
  • Is there a way to say return all values instead of Value[1],Value[2], ..? – Ariox66 Jan 30 '23 at 01:27
  • 1
    Yes. That would likely be another OUTER APPLY using the .node() function similar to what was done with enumeration values. Just we aware that mapping a nested multi-dimensional XML structure into a single 2D rows and columns result may not be an easy task. You might needs to redesign your results across multiple tables, such as one for the general type information and scalar constraints, and others for multi-valued properties such as enumeration values, patterns, and more. Even then, the XSD may have nested and recursive representations that complicate things further. – T N Jan 30 '23 at 01:45
  • I am dealing with a similar situation, trying to flatten a structure file that has nested tags, there are multiple field groups, every group has few elements, some elements could be a group themselves. trying to flatten it all in one table getting null parent group names for elements are parent themselves. – Ariox66 Jan 30 '23 at 04:44