0

I am dying trying to import a large xml file with tons of special characters that keep throwing errors in my app. I get through about 76 records in the XML file before something causes a Token error. {"There was an error parsing the query. [ Token line number = 1,Token line offset = 725,Token in error = S ]"}

I have put my code below.

public void ReadXMLIntoDB()
        {
            OpenFileDialog ofd;
            ofd = new OpenFileDialog();
            ofd.Filter = "Xml Files|*.xml|AllFiles|*.*";
            ofd.ShowDialog();

            StreamReader sr = new StreamReader(ofd.FileName);
            XmlTextReader xr = new XmlTextReader(sr);
            int iNumRows = 0;
            while (xr.ReadToFollowing("row") != false)
            {
                string sFacName, sFacID, sAddress, sCity, sState, sZip, sOwnerID, sOwnerName, sStreetNum, sStreetName, sStreetType, sPostDirectional, sPhone, sProgramCat, sProgramCatDes, sInspectionDate, sInspectionType, sInspectionDes, sSerialNum, sActionCode, sActionDes, sResultCode, sResultDes, sViolationCode, sViolationDes, sInspectionMemo;
                xr.ReadToFollowing("value");
                sFacName = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sFacID = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sAddress = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sCity = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sState = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sZip = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sOwnerID = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sOwnerName = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sStreetNum = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sStreetName = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sStreetType = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sPostDirectional = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sPhone = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sProgramCat = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sProgramCatDes = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sInspectionDate = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sInspectionType = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sInspectionDes = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sSerialNum = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sActionCode = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sActionDes = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sResultCode = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sResultDes = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sViolationCode = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sViolationDes = xr.ReadInnerXml();

                xr.ReadToFollowing("value");
                sInspectionMemo = xr.ReadInnerXml();
                Regex.Replace(sInspectionMemo, @"[^\w\&#.@-]", "");

                SqlCeConnection con;
                SqlCeCommand cmd;
                string cstr, sql;

                cstr = @"Data Source=|DataDirectory|\foodDB.sdf";
                sql = String.Format("Insert Into food(FacilityName,FacilityID,SiteAddress,City,State,ZipCode,OwnerID,OwnerName,StreetNumber,StreetName,StreetType,PostDirectional,Phone,ProgramCategory,ProgramCategoryDescription,InspectionDate,InspectionType,InspectionDescription,SerialNumber,ActionCode,ActionDescription,ResultCode,ResultDescription,ViolationCode,ViolationDescription,InspectionMemo) values ('{0}','{1}','{2}','{3}','{4}','{5}','{6}','{7}','{8}','{9}','{10}','{11}','{12}','{13}','{14}','{15}','{16}','{17}','{18}','{19}','{20}','{21}','{22}','{23}','{24}','{25}')", sFacName, sFacID, sAddress, sCity, sState, sZip, sOwnerID, sOwnerName, sStreetNum, sStreetName, sStreetType, sPostDirectional, sPhone, sProgramCat, sProgramCatDes, sInspectionDate, sInspectionType, sInspectionDes, sSerialNum, sActionCode, sActionDes, sResultCode, sResultDes, sViolationCode, sViolationDes, sInspectionMemo);
                con = new SqlCeConnection(cstr);
                cmd = new SqlCeCommand(sql, con);
                con.Open();
                cmd.ExecuteNonQuery();
                con.Close();

                ++iNumRows;

                Action ac = delegate()
                {
                    txrows.Text = iNumRows.ToString();
                };
                Dispatcher.BeginInvoke(ac);
            }

        }

        private void Button_Click_1(object sender, RoutedEventArgs e)
        {
            Thread t = new Thread(ReadXMLIntoDB);
            t.Start();
        }

From what I can tell, the only obvious thing I can see in that record that might cause an error is a 's where a ' should be. Can someone please help me?

  • 2
    If you don't have a text editor that can jump to line offset 725, run your XML through an XML formatter to add line breaks and indents. You'll then get a line number and offset with the error, and it will be much easier to view. – dthorpe Oct 23 '12 at 04:08
  • 1
    Some basic debugging advice. You are having trouble with the parsing of the xml. So get rid of everything else. Hard code the file name, remove the threading, and remove the database stuff. My guess is that every time you run this you have to manually select the file and then run it, then wait around for a bit, etc.. That is just wasting time that can be spent troubleshooting. And the threading will just make it harder to debug. Without the threads visual studio should break automatically on the error. You can always add it back in later. – nickles80 Oct 23 '12 at 04:48
  • 1
    unfortunately, the xml file I have is from the city website and is about 110Mb in size... i've attempted to run it through an online formatter, but I'm afraid it will take too long. Any free formatter that you can point me towards to download to run locally would be appreciated. – Matthew Garcia Oct 23 '12 at 04:58
  • 2
    Try notepad++. There are plugins that allow you to do this. http://stackoverflow.com/questions/3961217/how-to-format-xml-in-notepad – nickles80 Oct 23 '12 at 05:04
  • excellent tool! unfortunately when I look at the row that is in error, I see a red line to the left in notepad++ but it isn't obvious what is causing the error with that particular data in the XML document. I went line by line through the entire record and the syntax looks correct. – Matthew Garcia Oct 23 '12 at 05:19
  • nevermind, i'm an idiot and didn't realize the red line to the left is simply showing you which record you are currently looking at. The XML record that creates a break is no different in format or number of values than any of the other rows. I'm not sure where to go from here... – Matthew Garcia Oct 23 '12 at 05:29
  • 1
    Show us the XML that is causing the problem. Edit your question and paste a few lines of XML on either side of the error into the question. Someone here might be able to spot an issue. If the data contains personal information, redact the content by replacing a-z with *. Don't modify the XML tags or attribute names. Also, keep an eye out for symbols in the content like ">". If a ">" in the content is not properly escaped, it can make parsing the XML "very difficult". – dthorpe Oct 23 '12 at 07:15

2 Answers2

0

Your xml document may be not valid. Anyway you can set breakpoint that will occur on 76th iteration and check an error.

pS: Set breakpoint somewhere in while statement. Right click on breakpoint and select HitCount.

opewix
  • 4,993
  • 1
  • 20
  • 42
  • I didn't realize you can do that. I'm trying it now. Thank you! – Matthew Garcia Oct 23 '12 at 04:21
  • ok, so it is actually at 77, but as soon as I click on continue after the breakpoint pauses the progress, I get the same error parsing the query I had before. I'm extremely new to programming and working with XML, so I appreciate any help you can give me. – Matthew Garcia Oct 23 '12 at 04:24
  • Do not press continue, press F11. When you will get exception, click on `View details` hyperlink and look for `Inner exception`. There may be more detailed information about this error. – opewix Oct 23 '12 at 04:26
  • 1
    You said you have a large xml with tons of special characters. Try to change XmlTextReader's encoding http://stackoverflow.com/questions/961699/how-to-change-character-encoding-of-xmlreader – opewix Oct 23 '12 at 05:15
  • 1
    I tried that to no avail. Maybe some sleep and a fresh set of eyes will help me finish it up tomorrow. THank you for your help! – Matthew Garcia Oct 23 '12 at 05:49
0

I wrote an XML with XSD validator in C# a while ago that can easily be adapted to skip XSD and only check if the XML is at least well formed. If you have an XSD schema, you can also check if the XML is valid.

Well formed XML adheres to some XML rules
Valid XML is well formed XML that also satisfy a schema (XSD can be used to formulate such a schema)

The tool will show you in which position of the XML document it is not well formed or not valid.

The easiest to get this tool to work is just download the whole bo-library source code tree, then load the bo-Library.sln solution into Visual Studio, select the ValidateXmlWithXsd.csproj and go from there.

Jeroen Wiert Pluimers
  • 23,965
  • 9
  • 74
  • 154