Suppose I have an XML document that looks something like this (it basically represents an HTML report):
<html>
  <head>...</head>
  <body>
    <div>
      <table>
        <tr>
          <td>Stuff</td>
        </tr>
        <tr>
          <td>More stuff<br /><br />More stuff on another line and some whitespace... </td>
        </tr>
        <tr>
          <td> Some leading whitespace before this stuff<br />Stuff</td>
        </tr>
      </table>
    </div>
  </body>
</html>
Using C#, I want to convert this document into a plain text string that looks something like:
Stuff
More stuff
More stuff on another line and some whitespace...
Some leading whitespace before this stuff
Stuff
It should be smart enough to convert table rows into new lines and to insert a new line wherever an inline br tag appears within a cell. It should also keep any whitespace in the table cells intact. I tried loading the document with the XmlDocument class and reading the InnerText property of the body node, but that doesn't produce the output I'm looking for (the newlines and whitespace are not preserved). Is there a simple way to do this? One approach I know of would be to extract the HTML as a single string and run several regular expressions over it to handle the newlines and whitespace. Thanks!
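To make the desired behavior concrete, here is a minimal sketch of one way it could be done without regular expressions: walk the body node recursively, copy text nodes verbatim, and emit a newline for each br and after each tr. The class name ReportText and the method ToPlainText are hypothetical names made up for illustration. The sketch assumes the document is well-formed XML with no namespace on the html element, and it leaves PreserveWhitespace at its default of false so that the whitespace-only nodes between tags are dropped.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class ReportText
{
    // Flattens the <body> of the report into plain text.
    public static string ToPlainText(string xml)
    {
        // PreserveWhitespace stays false (the default), so whitespace-only
        // nodes between tags are discarded while the whitespace inside
        // text such as " Some leading..." is kept as-is.
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var sb = new StringBuilder();
        // Assumption: no XML namespace, so a plain XPath is enough.
        Append(doc.SelectSingleNode("/html/body"), sb);
        return sb.ToString();
    }

    private static void Append(XmlNode node, StringBuilder sb)
    {
        if (node == null)
        {
            return;
        }

        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text || child.NodeType == XmlNodeType.CDATA)
            {
                // Copy the text verbatim, including leading/trailing spaces.
                sb.Append(child.Value);
            }
            else if (child.NodeType == XmlNodeType.Element)
            {
                if (child.LocalName == "br")
                {
                    // Each inline <br /> becomes one newline.
                    sb.Append('\n');
                }
                else
                {
                    Append(child, sb);
                    if (child.LocalName == "tr")
                    {
                        // Each table row ends with a newline.
                        sb.Append('\n');
                    }
                }
            }
        }
    }
}
```

Calling ReportText.ToPlainText(xml) on the sample above would give one line per row, with every br producing its own newline (so two consecutive br tags leave a blank line). A cell that contains only whitespace would come out empty with this sketch, because that whitespace is dropped at load time. Swap '\n' for Environment.NewLine if platform line endings are needed.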