For my customer I have to unmarshal an XML file (received from an external service) into Java entities and save them to a database. For that I am using a simple JAXB method that does the job.

I have an issue with the XML file I received: I don't understand why the acute accent character is not displayed correctly in it. The file is encoded in UTF-8 with Unix (LF) line endings. This is how the acute accent is displayed in the file:

[screenshot of the file; the character appears as "XB4"]

When I copy it and paste it into a new file, it is displayed correctly.

The problem is that when JAXB processes the file, I get this error:

org.springframework.dao.DataAccessResourceFailureException: Error reading XML stream; nested exception is javax.xml.stream.XMLStreamException: ParseError at [row,col]:[14642,669]
Message: The element type "Nm" must be terminated by the matching end-tag "</Nm>".

It's not an end-tag issue; the element is correctly closed. When I replace this "XB4" character with another one, it works properly.

The Java file encoding is UTF-8.

Does anyone have an idea?

Thanks a lot.

Dupuis David
  • The file seems to be encoded in `ISO-8859-1`, what does the `<?xml ?>` processing instruction say? – Piotr P. Karwasz Jan 13 '21 at 18:14
  • Does this answer your question? [Is there a standard naming convention for XML elements?](https://stackoverflow.com/questions/442529/is-there-a-standard-naming-convention-for-xml-elements) – JosefZ Jan 13 '21 at 18:24
  • Hello @PiotrP.Karwasz , unfortunately it is – Dupuis David Jan 13 '21 at 18:49
  • @JosefZ This does not concern my issue. Thanks – Dupuis David Jan 13 '21 at 18:58
  • Can you explain what path the XML file takes in your system? You receive it in text form and the error occurs while you transform it to a POJO? The image you published is the file _before_ it enters your system (in Notepad++ or similar)? That looks like a file partially encoded in UTF-8 and partially in Latin1. – Piotr P. Karwasz Jan 13 '21 at 19:24
  • [*Element names can contain letters, digits, hyphens, underscores, and periods*](https://stackoverflow.com/a/31130882/3439404). I think that an acute accent character does not match these criteria… – JosefZ Jan 13 '21 at 20:02
  • @PiotrP.Karwasz In fact I opened it in Notepad++ and saw what I posted in my initial post. The error occurred without my intervention. The file is uploaded by someone to a secure server, then my application takes this file and tries to transform it into a POJO. – Dupuis David Jan 13 '21 at 20:05
  • @JosefZ Indeed, but it is not an element name. It's the content of the element. – Dupuis David Jan 13 '21 at 20:05
  • In any case, you should include a [mcve] (pictures are unwanted for obvious reasons). – JosefZ Jan 13 '21 at 20:08
  • It seems your XML file is just badly encoded (the acute accent is not encoded in UTF-8), however the parsing message is strange. What implementation of StAX are you using? – Piotr P. Karwasz Jan 13 '21 at 21:38
  • @PiotrP.Karwasz Yes, that is exactly what I said to my boss today, haha. I will check with the team that creates this file. Thanks. – Dupuis David Jan 13 '21 at 22:24
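
The diagnosis reached in the comments (a byte `0xB4` that is not valid UTF-8) can be checked before the file reaches the XML parser. The following is only a minimal sketch, assuming the file content is available as a byte array. The class name `Utf8Check`, the method `isValidUtf8`, and the sample `<Nm>` content are hypothetical and not taken from the original application. It relies on a strict `CharsetDecoder`, which reports malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original application.
class Utf8Check {

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative content: an acute accent (U+00B4) inside an <Nm> element.
        String sample = "<Nm>a\u00B4b</Nm>";

        // In ISO-8859-1 the accent is the single byte 0xB4, which is a
        // continuation byte in UTF-8 and therefore malformed on its own.
        byte[] latin1 = sample.getBytes(StandardCharsets.ISO_8859_1);

        // In UTF-8 the same character is the two-byte sequence 0xC2 0xB4.
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

Running such a check on the received file (for example on the result of `Files.readAllBytes`) would show whether the content really matches the UTF-8 encoding it is supposed to have. If it does not, the fix belongs with the team that produces the file.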

0 Answers