0

I have the following problem. I need to write an application in C# that will read a given XML and prepare data for me to load into the database. In XML, the structure of which I have no influence, the main data is placed in CDATA. I have verified that the structure of this data is ordered in the correct XML structure.

I've searched hundreds of posts and can't find any solution from them. Below is the XML file from which I need to extract the data from the CDATA section. Maybe one of you can help me?

<Docs>
  <Doc>
  <Content>
    <![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="1" Description="Desc1"></Header>
        <Poss>
          <Pos Id="1" Name="Pos1"></Pos>
          <Pos Id="2" Name="Pos2"></Pos>
        </Poss>
      </Doc>
    ]]>
  </Content>
  </Doc>
  <Doc>
  <Content>
    <![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="2" Description="Desc2"></Header>
        <Poss>
          <Pos Id="3" Name="Pos3"></Pos>
          <Pos Id="4" Name="Pos4"></Pos>
        </Poss> 
      </Doc>
    ]]>
  </Content>
  </Doc>
</Docs>

For me, the most important are the fields contained in the Content section and I have to load them as data into the database.

Yong Shun
  • 35,286
  • 4
  • 24
  • 46
jeffers
  • 145
  • 3
  • 13
  • Does this answer your question? [How does one parse XML files?](https://stackoverflow.com/questions/55828/how-does-one-parse-xml-files) – user2414092 Jan 23 '23 at 13:37
  • I understand that I have to parse text from CDATA to xml right? – jeffers Jan 23 '23 at 14:11
  • 1
    Is your data in CDATA a valid XML? Cause I see the `` doesn't have closing tag. – Yong Shun Jan 23 '23 at 14:31
  • Can you call a stored procedure (SP) from the c#? It will pass your XML as a parameter to the stored procedure. The SP with shred that XML parameter, and insert data into an appropriate target table. – Yitzhak Khabinsky Jan 23 '23 at 16:57
  • By the way, the XML sample is not well-formed. The `` XML element doesn't have a closing tag. It needs to be fixed. – Yitzhak Khabinsky Jan 23 '23 at 17:09
  • How are you currently parsing the XML? With `XmlSerializer`, LINQ to XML or something else? – dbc Jan 23 '23 at 17:14

2 Answers2

2

To extract the data from the CData part,

  1. Construct the classes.
public class Doc
{
    public Header Header { get; set; }
    
    [XmlArrayItem(typeof(Pos), ElementName = "Pos")]
    public List<Pos> Poss { get; set; }
}

public class Header
{
    [XmlAttribute]
    public int DocNumber { get; set; }
    
    [XmlAttribute]
    public string Description { get; set; }
}

public class Pos
{
    [XmlAttribute]
    public int Id { get; set; }
    
    [XmlAttribute]
    public string Name { get; set; }
}
  1. Implement the extraction logic.

    2.1. Read the XML string as XDocument via XDocument.Parse().

    2.2. Select the DescendantNodes for the XPath of "/Docs/Doc/Content".

    2.3. Convert the nodes to XCData type.

    2.4. With XmlSerializer to deserialize the value of XCData to Doc type.

using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Serialization;
using System.IO;

XmlSerializer xmlSerializer = new XmlSerializer(typeof(Doc)); 
XDocument xDoc = XDocument.Parse(xml);
        
var cdataSections = xDoc.XPathSelectElements("/Docs/Doc/Content")
    .DescendantNodes()
    .OfType<XCData>()
    .Select(x => (Doc)xmlSerializer.Deserialize(new StringReader(x.Value)))
    .ToList();

Demo @ .NET Fiddle

Yong Shun
  • 35,286
  • 4
  • 24
  • 46
1

Here is implementation based on a stored procedure with the XML parameter like in my comments.

I had to remove the <Poss> XML element to make CData section a well-formed XML.

SQL

DECLARE @xml XML = 
N'<Docs>
    <Doc>
        <Content><![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="1" Description="Desc1"></Header>
          <Pos Id="1" Name="Pos1"></Pos>
          <Pos Id="2" Name="Pos2"></Pos>
      </Doc>
    ]]>
        </Content>
    </Doc>
    <Doc>
        <Content><![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="2" Description="Desc2"></Header>
          <Pos Id="3" Name="Pos3"></Pos>
          <Pos Id="4" Name="Pos4"></Pos>
      </Doc>
    ]]>
        </Content>
    </Doc>
</Docs>';

--INSERT INTO <targetTable>
SELECT h.value('(Header/@DocNumber)[1]', 'INT') AS DocNumber
    , h.value('(Header/@Description)[1]', 'VARCHAR(256)') AS DocDescription
    , d.value('@Id', 'INT') AS posID
    , d.value('@Name', 'VARCHAR(256)') AS posName
FROM @xml.nodes('/Docs/Doc/Content/text()') AS t(c)
CROSS APPLY (SELECT TRY_CAST(c.query('.').value('.', 'NVARCHAR(MAX)') AS XML)) AS t1(x)
CROSS APPLY x.nodes('/Doc') AS t2(h)
CROSS APPLY h.nodes('Pos') AS t3(d);

Output

DocNumber DocDescription posID posName
2 Desc2 3 Pos3
2 Desc2 4 Pos4
1 Desc1 1 Pos1
1 Desc1 2 Pos2
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21