I'm trying to parse an XML file with Java.
Before I start parsing, I need to replace (encode) some text between the <code> and </code> tags.
Therefore I read the contents of the file into a String:
File xml = new File(this.xmlFileName);
final BufferedReader reader = new BufferedReader(new FileReader(xml));
final StringBuilder contents = new StringBuilder();
while (reader.ready()) {
    contents.append(reader.readLine());
}
reader.close();
final String stringContents = contents.toString();
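One detail of the loop above: BufferedReader.readLine() returns each line without its line terminator, so the appended lines end up joined with no line breaks between them. A minimal sketch of reading the file in one step instead, which keeps newlines and tabs untouched, could look like this. The class name XmlFileReader and the UTF-8 charset are assumptions for illustration, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XmlFileReader {
    // Reads the whole file into a String, keeping line breaks and tabs as they are.
    // UTF-8 is an assumption; use the encoding the XML file actually declares.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file, read it back, and print it unchanged.
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<root>\n\t<code>a < b</code>\n</root>\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp));
        Files.delete(tmp);
    }
}
```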
After reading the XML into the string, I encode the values using Pattern and Matcher:
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("<code>(.*?)</code>", Pattern.DOTALL);
Matcher m = p.matcher(stringContents);
while (m.find()) {
    // Encode text between <code> and </code> tags
    String valueFromTags = m.group(1);
    byte[] decodedBytes = valueFromTags.getBytes();
    new Base64();
    String encodedBytes = Base64.encodeBase64String(decodedBytes);
    m.appendReplacement(sb, "<code>" + encodedBytes + "</code>");
}
m.appendTail(sb);
String result = sb.toString();
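For comparison, here is a sketch of the same encoding step as a self-contained method. It swaps in java.util.Base64 (available from Java 8) in place of the Commons Codec call used above, fixes the charset to UTF-8 instead of the platform default, and passes the replacement through Matcher.quoteReplacement. The class and method names are hypothetical, and the charset choice is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeTagEncoder {
    // DOTALL lets the content between the tags span several lines.
    private static final Pattern CODE = Pattern.compile("<code>(.*?)</code>", Pattern.DOTALL);

    // Replaces the text inside every <code>...</code> pair with its Base64 form.
    static String encodeCodeTags(String xml) {
        Matcher m = CODE.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String encoded = Base64.getEncoder()
                    .encodeToString(m.group(1).getBytes(StandardCharsets.UTF_8));
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement("<code>" + encoded + "</code>"));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Standard Base64 output contains neither '$' nor '\', so the quoting is a precaution here and matters mainly if the replacement text ever changes.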
After the replacements are done, I try to read this String into the XML parser:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(result);
doc.getDocumentElement().normalize();
But then I get this error: java.net.MalformedURLException: no protocol: <root> <application> <interface>...
As you can see, after I read the File into a String, for some reason a lot of spaces are added where there were newlines or tabs in the original file. So I think that's why I get this error. Is there any way I can solve this?
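The "no protocol" message points at a different cause than whitespace: DocumentBuilder.parse(String) treats its argument as a URI to load, not as XML text, so the whole document is being interpreted as a URL. A minimal sketch of parsing XML held in a String, by wrapping it in an InputSource, might look like this. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlStringParser {
    // Parses XML text that is already in memory.
    static Document parseXml(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse(String) expects a URI; an InputSource over a StringReader
        // makes the parser read the characters of the String itself.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().normalize();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml("<root><code>YTxi</code></root>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "root"
    }
}
```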
and tags then? I can't parse it before encoding it, because it contains special characters like < and >, and the parser gives errors because of that. But note that the parser's failure to parse the XML in my example has something to do with the way I read it into a String using BufferedReader. The spaces are already there before the regex replacement. – Kaj May 15 '14 at 23:25