My original issue is that I am trying to serialize a string containing html tags to an XML element.
hello <a href="world.php">World</a>, this
is
a nice
test
<ul>
<li>to demonstrate my issue</li>
<li>and find a solution</li>
</ul>
However, I have 2 issues
- Serializing HTML to XML: I did not succeed in defining the Serializable class to correctly serialize with XmlSerialze, so I decided that, using CDATA sections might be the better way. This is however not correctly deserialized by the target tool (that I have no influence on). What I need is plain and correct html (XHMTL?) within the xml output file.
2. The string looks e.g. as above, but is not fully correct html (no
<p>
tags, no <br>
tags).
Now I would like to replace the newlines by a p or br tag. I have had a look here and used the suggested solution:
string result = "<p>" + text
.Replace(Environment.NewLine + Environment.NewLine, "</p><p>")
.Replace(Environment.NewLine, "<br />")
.Replace("</p><p>", "</p>" + Environment.NewLine + "<p>") + "</p>";
However, this does not in all cases generate valid html. In the example above, it would create <br />
s between the <li>
tags or cause <ul>
tags within <p>
tags - which is both not allowed.
Target would be to have a result like the following (line breaks are only for better readability and don't matter here)
<p>hello <a href="world.php">World</a>, this</p>
<p>is<br/>
a nice<br/>
test<br/></p>
<ul>
<li>to demonstrate my issue</li>
<li>and find a solution</li>
</ul>
Do you have any suggestion how to solve this either with a string.Replace, Regex, or better solution (HtmlDocument)?
Please note: I have no influence on deserialization, the XML output is evaluated by I tool I have no influence on, and it has to be UTF-8 encoded.
Thank you!
EDIT: Clearly separated the 2 issues
EDIT2: No influence on deserialization
EDIT3: Added target output