I have this XML:
<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
<channel>
<wp:wxr_version>1.2</wp:wxr_version>
<item>
<title type="html">
<![CDATA[ <h1 class="title">“Title with special character”</h1> ]]>
</title>
<content:encoded type="html">
<![CDATA[ <div class="content clearfix">
<p>Content Example Text</p>
</div> ]]>
</content:encoded>
<wp:post_id>0</wp:post_id>
<wp:post_date>2000-09-30T10:22:00.001Z</wp:post_date>
</item>
</channel>
</rss>
Inside the HTML of the title element there is the Unicode character U+0007 (BEL).
Why is the XML invalid?
I'm using CDATA. Isn't that supposed to make it valid?
How can I detect which characters are invalid and remove them before constructing the XML?
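
For context, here is a minimal sketch of the kind of filter I have in mind, assuming Python and assuming the document is XML 1.0. As far as I can tell, the XML 1.0 `Char` production only allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and a CDATA section only turns off markup parsing, so its content still has to consist of those characters. The function names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are my own, not from any library.

```python
import re

# Matches any character outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [
        (m.start(), "U+%04X" % ord(m.group()))
        for m in _INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text):
    """Return text with every character XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical title string with a BEL (U+0007) embedded in it.
    title = '<h1 class="title">\u201cTitle\u0007 with special character\u201d</h1>'
    print(find_invalid_xml_chars(title))
    print(strip_invalid_xml_chars(title))
```

The idea would be to run each string through `strip_invalid_xml_chars` before wrapping it in CDATA, and to use `find_invalid_xml_chars` only for logging which characters were dropped.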