I am trying to parse an XML file using StAX, but I get this error:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[414,47]
Message: The reference to entity "R" must end with the ';' delimiter.

The parser gets stuck on line 414 of the XML file, which contains P&R. The code I use to parse it is:

public List<Vild> getVildData(File file){
    XMLInputFactory factory = XMLInputFactory.newFactory();
    try {
        ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
        XMLStreamReader reader = factory.createXMLStreamReader(byteArrayInputStream, "iso8859-1");
        List<Vild> vild = saveVild(reader);
        reader.close();
        return vild;
    } catch (IOException e) {
        e.printStackTrace();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    }
    return Collections.emptyList();
}
private List<Vild> saveVild(XMLStreamReader streamReader) {
    List<Vild> vildList = new ArrayList<>();
    try{
        Vild vild = new Vild();
        while (streamReader.hasNext()) {
            streamReader.next();
            //Creating list with data
        }
    }catch(XMLStreamException | IllegalStateException ex) {
        ex.printStackTrace();
    }
    // Return the populated list; an empty list here would discard the parsed data
    return vildList;
}

I read online that a bare & is invalid in XML, but I don't know how to change it before the parser throws this error inside the saveVild method. Does anyone know how to do this efficiently?

MrAndre

2 Answers

Change the question: you're not trying to parse an XML file, you're trying to parse a non-XML file. For that, you need a non-XML parser, and to write such a parser you need to start with a specification of the language you are trying to parse, and you'll need to agree the specification of this language with the other partners to the data interchange.

How much work you could all save by conforming to standards!

Treat broken XML arriving in your shop the way you would treat any other broken goods coming from a supplier: return it to sender marked "unfit for purpose".

Michael Kay

The problem, as you mention, is that the parser finds the & and treats it as the start of an entity reference, so it expects a terminating ; after the name.

This is fixed by escaping the character, so that the parser finds &amp; instead.

Take a look here for further reference
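
If the file cannot be fixed at the source, the escaping described above can be applied to the text before it reaches the StAX reader. The following is only a minimal sketch: the class name AmpersandFixer and its helper methods are hypothetical, and it assumes the whole document fits in memory as a String. It replaces every & that is not already the start of a named, decimal, or hexadecimal reference with &amp;. Note that it does not understand CDATA sections or comments, so an & inside those would be rewritten as well, and a reference to an undeclared entity such as &R; would be left alone and still fail.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original question's code.
final class AmpersandFixer {

    // Matches '&' that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Escapes bare ampersands and leaves existing references untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Parses the given XML with StAX and concatenates all character data.
    static String readAllText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String broken = "<name>P&R</name>";          // same shape as line 414 in the question
        String fixed = escapeBareAmpersands(broken); // bare '&' becomes "&amp;"
        System.out.println(fixed);
        System.out.println(readAllText(fixed));      // parses without the entity error
    }
}
```

In getVildData from the question, the same idea would mean decoding the bytes with the file's charset, passing the resulting String through escapeBareAmpersands, and handing a StringReader to createXMLStreamReader. This is a workaround for malformed input rather than a substitute for having the producer emit well-formed XML.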

doper