I have an XML with invalid hexadecimal characters. I've read this, this and this and any other links given but failed to make it work.
I'm using XmlReader
- XmlDocument
, XDocument
and XmlTextReader
are not my options, because there are XML files with more than 500GB size and 500 million in volume. XMLReader is my best choice because of its "forward" approach, and not loading into the memory all of the XML details. Also, because of this, I can't have the XML file recreated or loaded just to replace the invalid characters.
Here's the code that I'm working on:
case XmlNodeType.Element:
if (xmlReader.Name.Equals("ROW"))
{
DataRow dataRow = xmlDataTable.NewRow();
XmlReader row = XmlReader.Create(xmlReader.ReadSubtree(), new XmlReaderSettings { CheckCharacters = false
, ValidationType = ValidationType.None });
// iterate on elements inside ROW
// these are the column items
if (row != null)
{
while (row.Read())
{
if (row.IsStartElement())
{
if (!row.Name.Equals("ROW"))
{
string columnName = row.Name;
//row = XmlReader.Create(CleanInvalidXmlChars(row.ReadInnerXml()));
row.Read();
string value = CleanInvalidXmlChars(row.Value.ToString());
// all other logics ...
The exception raises on the row.Read();
statement. Here's a sample XML file I'm reading:
<?xml version="1.0" encoding="UTF-8"?>
<MFAINSBRP>
<ROW>
<INSTITUTION_CODE>828 </INSTITUTION_CODE>
<BRANCH_CODE>GJ102</BRANCH_CODE>
<BRANCH_NAME> </BRANCH_NAME>
<BRANCH_NAME_FRENCH> </BRANCH_NAME_FRENCH>
<LANGUAGE_CODE>E</LANGUAGE_CODE>
<ADDR_NO>815412</ADDR_NO>
<FAX_AREA>0</FAX_AREA>
<FAX_PHONE>0</FAX_PHONE>
<AREA_CODE>0</AREA_CODE>
<PHONE_NO>0</PHONE_NO>
<STATUS>A</STATUS>
<PHONE_EXT>0</PHONE_EXT>
</ROW>
<!--ALL OTHER RECORDS-->
</MFAINSBRP>
Right now, I'm stuck on making this work.
EDIT:
The sample XML file is the record that makes my code break. I copied at pasted it here from Notepad++ but it doesn't show the invalid characters. Here's the image of how it looks in Notepad++:
How I create the xmlReader
object is just this simple statement:
using (xmlReader = XmlReader.Create(filePath, new XmlReaderSettings { CheckCharacters = false }))