This is a continuation of another question. I get the following error when I try to parse my XML file:
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 12; Content is not allowed in trailing section.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at convert.ExcelXmlReader.getAndParseFile(ExcelXmlReader.java:55)
at convert.ExcelXmlReader.main(ExcelXmlReader.java:24)
The "lineNumber: 68; columnNumber: 12;" part points at the very last '>' in my XML file. Deleting the whitespace after it does not make the error go away, and an XML validator did not report any problems with the file. I also tried suggestions from other Stack Overflow questions (checking the file for stray characters after the closing root tag, making sure all the tags are closed), but none of them worked for me.
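For reference, one way to look for invisible characters after the last '>' is to print whatever follows it as code points instead of relying on what an editor displays. This is only a rough sketch: TrailingDump and describeTrailing are names made up for this illustration, and it inspects a String that has already been read, not the bytes handed to the parser.

```java
class TrailingDump {
    // Returns the code points that follow the last '>' in the text,
    // formatted as U+XXXX values so that invisible characters show up.
    // If the text contains no '>', every character is reported.
    static String describeTrailing(String text) {
        int end = text.lastIndexOf('>');
        StringBuilder sb = new StringBuilder();
        for (int i = end + 1; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A NUL character and a Windows line ending after the root element.
        String sample = "<a></a>\u0000\r\n";
        System.out.println(describeTrailing(sample)); // prints "U+0000 U+000D U+000A"
    }
}
```

An empty result means nothing at all follows the last '>' in that string.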
Does anybody have a hint about where to look next? This is the XML file being parsed:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
Here is the current parsing code. Most of the other code is in the previous question linked above.
private static void getAndParseFile() throws Exception {
System.out.println("getAndParseFile");
String fileName="C:\\Users\\windowsUserName\\Downloads\\F7BAH1P_List.xml";
File file = new File(fileName);
removeLineFromFile(file.getAbsolutePath());
System.out.println("Finished Removing Lines");
String fileContent = IOUtils.toString(new FileInputStream(file));
fileContent = fileContent.substring(0, fileContent.lastIndexOf('>')+1);
fileContent = fileContent.replaceAll("&#","");
PrintWriter pw = null;
pw = new PrintWriter(new FileWriter("C:\\Users\\windowsUserName\\Downloads\\tempfile.txt"));
pw.println(fileContent);
pw.flush();
ByteArrayInputStream bis = new ByteArrayInputStream(Charset.forName("UTF-16").encode(fileContent).array());
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandler handler = new SAXHandler();
parser.parse(bis, handler);
}
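One thing that may be worth checking in getAndParseFile is the ByteBuffer.array() call. According to the ByteBuffer documentation, array() returns the buffer's whole backing array, which can be longer than the encoded content between position and limit. If that happens here, the parser would see unused bytes after the final '>'. The sketch below only compares the lengths and copies the exact bytes. EncodeCheck and exactBytes are names made up for this illustration, and this is a possible cause, not a confirmed diagnosis.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EncodeCheck {
    // Copies only the bytes between position and limit, ignoring any
    // unused capacity at the end of the buffer's backing array.
    static byte[] exactBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        buffer.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><a/>";
        ByteBuffer encoded = StandardCharsets.UTF_16.encode(xml);

        // If the first number is larger than the second, array() contains
        // extra bytes that are not part of the encoded document.
        System.out.println("backing array length: " + encoded.array().length);
        System.out.println("encoded byte count:   " + encoded.remaining());
        System.out.println("exact copy length:    " + exactBytes(encoded).length);
    }
}
```

If the lengths differ, passing exactBytes(...) (or fileContent.getBytes(StandardCharsets.UTF_16)) to the ByteArrayInputStream instead of array() would be the thing to try.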
removeLineFromFile removes the first two and the last two <Row></Row> elements of the XML file; they are either blank or contain only title/counter data.
private static void removeLineFromFile(String file) {
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return;
}
br = new BufferedReader(new FileReader(file));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new FileReader(file));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
//Delete the original file
if (!inFile.delete()) {
System.out.println("Could not delete original file");
return;
}
//Rename the new file to the filename the original file had.
if (!tempFile.renameTo(inFile))
System.out.println("Could not rename temp file");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
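For comparison, the same row trimming can be sketched on a list of lines that is already in memory, which avoids the counters and the second pass over the file. This is a simplified illustration, not a drop-in replacement: RowTrimmer and dropOuterRows are made-up names, and like the method above it assumes that each <Row> and </Row> tag sits on its own line and that <Row> carries no attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RowTrimmer {
    // Returns a copy of the lines without the first n and the last n
    // <Row>...</Row> blocks. All other lines are kept in order.
    static List<String> dropOuterRows(List<String> lines, int n) {
        // Record the [start, end] line indexes of every row block.
        List<int[]> rows = new ArrayList<>();
        int start = -1;
        for (int i = 0; i < lines.size(); i++) {
            String lower = lines.get(i).toLowerCase(Locale.ROOT);
            if (lower.contains("<row>")) {
                start = i;
            }
            if (lower.contains("</row>") && start >= 0) {
                rows.add(new int[] {start, i});
                start = -1;
            }
        }

        // Mark every line that belongs to one of the outer row blocks.
        boolean[] drop = new boolean[lines.size()];
        for (int r = 0; r < rows.size(); r++) {
            if (r < n || r >= rows.size() - n) {
                for (int i = rows.get(r)[0]; i <= rows.get(r)[1]; i++) {
                    drop[i] = true;
                }
            }
        }

        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!drop[i]) {
                kept.add(lines.get(i));
            }
        }
        return kept;
    }
}
```

Calling dropOuterRows(lines, 2) on the lines of the file shown below would keep only the header row inside <Table>.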
Here is the XML file before it goes through removeLineFromFile:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">List for Work Request(F7BAH1P)</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Count: 244</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>