I am trying to convert the XML below, which contains HTML markup, to JSON using Newtonsoft Json.NET in C#:

  <Content>
   <richtext><![CDATA[ <p> <strong>This is sample richtext content </strong> </p> ]]></richtext>
   <htmlcontent> <![CDATA[<p> <strong>This is html content </strong> </p> ]]></htmlcontent>
   <others> sample </others>
  </Content>

My C# code is

string xmlContent = @"<Content><richtext><![CDATA[ <p> <strong>This is sample richtext content </strong> </p> ]]></richtext><htmlcontent> <![CDATA[<p> <strong>This is html content </strong> </p> ]]></htmlcontent><others> sample </others></Content>";
// Requires: using System.Xml; using Newtonsoft.Json;
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlContent);
string jsonText = JsonConvert.SerializeXmlNode(doc, Newtonsoft.Json.Formatting.Indented);
Console.WriteLine("JSON is :" + jsonText);

My output is

{
  "Content": {
    "richtext": {
      "#cdata-section": " <p> <strong>This is sample richtext content </strong> </p> "
    },
    "htmlcontent": {
      "#cdata-section": "<p> <strong>This is html content </strong> </p> "
    },
    "others": " sample "
  }
}

My expected output is

{
  "Content": {
    "richtext": "<p> <strong>This is sample richtext content </strong> </p>",
    "htmlcontent": "<p> <strong>This is html content </strong> </p>",
    "others": " sample "
  }
}

Is there any way to remove the #cdata-section element during the XML-to-JSON conversion?

  • 1
    Your XML is invalid. In that XML, `<p>` isn't HTML; it's an XML node `p`. If you want to embed HTML in XML, you need to use CDATA.

    – Liam Nov 02 '17 at 11:58
  • 1
    Possible duplicate of [How do I store html page in a xml file?](https://stackoverflow.com/questions/5978316/how-do-i-store-html-page-in-a-xml-file) – Liam Nov 02 '17 at 11:59
  • Thanks Liam. I verified this by wrapping the HTML content in CDATA, but I still get the "#cdata-section" node, which I don't want in the output. Can you tell me how to remove it from the output? – user3113876 Nov 02 '17 at 13:08

1 Answer


Remove the CDATA nodes from the document and add their content back as plain text. The HTML tags will be escaped when the text is inserted.

Let's use LINQ to XML instead of XmlDocument. It is more convenient.

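using System;
using System.Linq;
using System.Xml.Linq;
using Newtonsoft.Json;

string xmlContent = @"<Content><richtext><![CDATA[ <p> <strong>This is sample richtext content </strong> </p> ]]></richtext><htmlcontent> <![CDATA[<p> <strong>This is html content </strong> </p> ]]></htmlcontent><others> sample </others></Content>";
var doc = XElement.Parse(xmlContent);

// Snapshot the CDATA nodes first so the tree can be modified while looping.
var cdata = doc.DescendantNodes().OfType<XCData>().ToList();
foreach (var cd in cdata)
{
    // Re-add the CDATA content as ordinary text, then drop the CDATA node itself.
    cd.Parent.Add(cd.Value);
    cd.Remove();
}

// Print the XML: the former CDATA content now appears as escaped text.
Console.WriteLine(doc);

string jsonText = JsonConvert.SerializeXNode(doc, Newtonsoft.Json.Formatting.Indented);
Console.WriteLine(jsonText);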
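
If you would rather keep the XmlDocument from the question, the same idea can probably be applied there as well. The following is only a minimal sketch under that assumption: the class name CDataFlattener and the method ReplaceCDataWithText are hypothetical helpers introduced for illustration and are not part of Json.NET. It swaps every CDATA section for a plain text node before serializing.

```csharp
using System;
using System.Linq;
using System.Xml;
using Newtonsoft.Json;

public static class CDataFlattener
{
    // Hypothetical helper (not a Json.NET API): replaces each CDATA section
    // in the document with an ordinary text node holding the same characters.
    public static void ReplaceCDataWithText(XmlDocument doc)
    {
        // Collect the CDATA children of every element up front, so the tree
        // can be modified safely inside the loop below.
        var sections = doc.SelectNodes("//*")
                          .Cast<XmlNode>()
                          .SelectMany(e => e.ChildNodes.OfType<XmlCDataSection>())
                          .ToList();

        foreach (var section in sections)
        {
            XmlText text = doc.CreateTextNode(section.Value);
            section.ParentNode.ReplaceChild(text, section);
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(@"<Content><richtext><![CDATA[ <p> <strong>This is sample richtext content </strong> </p> ]]></richtext><others> sample </others></Content>");

        ReplaceCDataWithText(doc);

        // Formatting is fully qualified because System.Xml also defines a Formatting type.
        string jsonText = JsonConvert.SerializeXmlNode(doc, Newtonsoft.Json.Formatting.Indented);
        Console.WriteLine(jsonText);
    }
}
```

Note that the whitespace inside each CDATA section is carried over unchanged. To match the expected output in the question exactly, you would likely need to pass section.Value.Trim() to CreateTextNode instead.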
Alexander Petrov
  • Thank you very much @Alexander Petrov & Liam for your responses, it helped me. – user3113876 Nov 02 '17 at 14:02
  • 1
    One small issue I am facing after serialization: it introduces \n characters in the output if my CDATA contains content spread over several lines. Can you tell me how to resolve this? – user3113876 Nov 09 '17 at 14:03
  • @user3113876 - Don't ask in comments. Edit your question or ask a new one. Show us sample content that spans several lines. – Alexander Petrov Nov 09 '17 at 16:37