Somebody sent me an XML 1.0 file. The file has illegal characters in it like , and I cannot do anything about that; this initial condition is part of the problem statement.
Java parsers (dom4j-1.6.1.jar) of course fail. I tried changing the XML version to 1.1 in the header, but that doesn't work. Or maybe it is a parser version problem, I don't know.
I wonder what the best possible solutions are.
My workaround at the moment: strip the illegal characters with a regexp before parsing.
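For reference, the regexp workaround looks roughly like the sketch below. It assumes the whole document fits in memory as a String and that the offending characters appear as raw characters rather than as numeric character references. The class and method names (XmlSanitizer, stripIllegalXmlChars, parse) are made up for illustration, and it uses the JDK's built-in DOM parser instead of dom4j so that it is self-contained.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of dom4j or any other library.
final class XmlSanitizer {
    // Matches every code point outside the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern ILLEGAL_XML10 = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Removes the illegal characters; everything else is left untouched.
    static String stripIllegalXmlChars(String raw) {
        return ILLEGAL_XML10.matcher(raw).replaceAll("");
    }

    // Cleans the text first, then hands it to the JDK DOM parser.
    static Document parse(String rawXml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String cleaned = stripIllegalXmlChars(rawXml);
        return builder.parse(new InputSource(new StringReader(cleaned)));
    }
}
```

The cleaned string could presumably be passed to dom4j through a StringReader in the same way. Note that this silently drops data from the attribute values, which is part of why I am not happy with it.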
Is that really the only solution? Is there any schema or external entity (?) definition I could use? Or another parser? The illegal characters are in the attributes, so I think CDATA will not work.
It's really a nasty problem.
The XML is generated by a Windows web service framework; I don't know which one. I'm not aware of any simple fix that could be made on the generation side. It would have to be really simple, otherwise the web service provider will not implement it.