I want to remove an `image` element from an SVG file when that element contains a base64-encoded image. I load the SVG with Batik:
```java
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory fac = new SAXSVGDocumentFactory(parser);
Document svgDoc = fac.createDocument(inFile.toURI().toString());
```
To delete those `image` elements, I do this:
```java
import java.util.regex.Pattern;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public static final Pattern BASE64_PATTERN = Pattern.compile("^.*data:image/(?:png|jpg|jpeg);base64,.*");

private static void deleteBase64Images(Document doc) {
    NodeList nodes = doc.getElementsByTagName("image");
    int length = nodes.getLength();
    // The NodeList is live, so iterate backwards while removing elements
    for (int i = length - 1; i >= 0; i--) {
        Element image = (Element) nodes.item(i);
        String attribute = image.getAttribute("xlink:href");
        if (BASE64_PATTERN.matcher(attribute).matches()) {
            image.getParentNode().removeChild(image);
        }
    }
}
```
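To make it clear which `xlink:href` values the `BASE64_PATTERN` above accepts, here is a small check against a few sample strings. The class name `Base64PatternCheck`, the helper `isBase64Image`, and the sample values are made up for illustration; the regular expression is copied unchanged from the question.

```java
import java.util.regex.Pattern;

public class Base64PatternCheck {
    // Same regular expression as BASE64_PATTERN in the question
    static final Pattern BASE64_PATTERN =
            Pattern.compile("^.*data:image/(?:png|jpg|jpeg);base64,.*");

    // matches() succeeds only if the whole string matches the pattern
    static boolean isBase64Image(String href) {
        return BASE64_PATTERN.matcher(href).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "data:image/png;base64,iVBORw0KGgo=",
            "data:image/jpeg;base64,/9j/4AAQ",
            "data:image/gif;base64,R0lGODlh",
            "images/logo.png"
        };
        for (String s : samples) {
            // prints true, true, false, false (in this order)
            System.out.println(isBase64Image(s) + "  " + s);
        }
    }
}
```

Only PNG and JPEG data URIs match; other embedded types such as `image/gif` are kept. Because the pattern starts with `^.*`, a data URI is also accepted when other text precedes it in the attribute value.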
Here is a minimal SVG containing an `image` element with an `xlink:href` attribute:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:ev="http://www.w3.org/2001/xml-events"
     xmlns:v="http://schemas.microsoft.com/visio/2003/SVGExtensions/" width="51.1939in"
     height="23.6013in"
     viewBox="0 0 3685.96 1699.29" xml:space="preserve" color-interpolation-filters="sRGB"
     class="st38">
  <image x="0" y="50" width="64" height="64" image-rendering="optimizeSpeed"
         xlink:href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAYAAADED76LAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAA9SURBVChTfY8xDgAgCMT4/3cZbzhzRI0o2IVAu2AAWKG7u9PmnlhS8wlOKVJwS6GDSooIOinii04K+0kAHIw2/SFJpU9IAAAAAElFTkSuQmCC">
  </image>
</svg>
```
However, `image.getAttribute("xlink:href")` returns an empty string. What is strange is that `image.getAttributes().item(i)` does return an attribute named `xlink:href` (for some index `i`).
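One way to inspect that attribute more closely is to print each attribute's qualified name together with its namespace URI and local name. The sketch below (hypothetical class name `AttributeDump`) uses the JDK's built-in parser, not Batik: a namespace-aware DOM reports `xlink:href` with namespace `http://www.w3.org/1999/xlink` and local name `href`, while a DOM built without namespace support reports `null` for both. Passing the Batik `image` element to the same `describeAttributes` method would show how Batik stored the attribute (not verified here).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class AttributeDump {
    // One line per attribute: qualified name, namespace URI and local name
    static String describeAttributes(Element element) {
        StringBuilder sb = new StringBuilder();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            sb.append(attr.getName())
              .append(" ns=").append(attr.getNamespaceURI())
              .append(" local=").append(attr.getLocalName())
              .append('\n');
        }
        return sb.toString();
    }

    // Parses XML with or without namespace support and returns the first <image> element
    static Element firstImage(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagName("image").item(0);
    }

    public static void main(String[] args) throws Exception {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\""
                + " xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
                + "<image xlink:href=\"logo.png\"/></svg>";
        // prints: xlink:href ns=http://www.w3.org/1999/xlink local=href
        System.out.print(describeAttributes(firstImage(svg, true)));
        // prints: xlink:href ns=null local=null
        System.out.print(describeAttributes(firstImage(svg, false)));
    }
}
```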
Compare this code, which uses `javax.xml.parsers.DocumentBuilder` instead:
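```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

private static String preprocessSVG(File originalFile) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // not namespace-aware by default
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(originalFile);
        deleteBase64Images(document);
    } catch (Exception e) {
        // Don't swallow parse errors silently
        throw new IllegalStateException("Could not parse " + originalFile, e);
    }
    return null; // placeholder: the cleaned document is not returned yet
}
```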
In this case, `image.getAttribute("xlink:href")` returns the expected value. Why does the `Document` parsed by `org.apache.batik.dom.svg.SAXSVGDocumentFactory` behave so strangely?
What am I missing here?
The `Document` implementation is `SVGOMDocument`.
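A version of the question's `deleteBase64Images` that does not depend on how the parser keys the `xlink:href` attribute is sketched below. It rests on an assumption not confirmed in the question: that Batik builds a namespace-aware DOM and looks attributes up by namespace URI plus local name, so `getAttribute("xlink:href")` misses the attribute even though its node name is `xlink:href`. The JDK's `DocumentBuilderFactory`, by contrast, is not namespace-aware unless `setNamespaceAware(true)` is called. The sketch reads the attribute with the standard DOM Level 2 method `Element.getAttributeNS` and falls back to the qualified name. The class name `Base64ImageStripper` and the helpers `getXlinkHref` and `parse` are hypothetical, and the demo runs on the JDK parser in both modes rather than on Batik.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Base64ImageStripper {
    // Namespace URI bound to the xlink: prefix in the SVG
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    static final Pattern BASE64_PATTERN =
            Pattern.compile("^.*data:image/(?:png|jpg|jpeg);base64,.*");

    // Looks the attribute up by (namespace URI, local name) first and falls back
    // to the qualified name for DOMs built without namespace support.
    static String getXlinkHref(Element element) {
        String href = element.getAttributeNS(XLINK_NS, "href");
        return href.isEmpty() ? element.getAttribute("xlink:href") : href;
    }

    // Removes <image> elements whose xlink:href is a base64 PNG/JPEG data URI
    // and returns how many were removed.
    static int deleteBase64Images(Document doc) {
        NodeList nodes = doc.getElementsByTagName("image");
        int removed = 0;
        // The NodeList is live, so walk it backwards while removing
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Element image = (Element) nodes.item(i);
            if (BASE64_PATTERN.matcher(getXlinkHref(image)).matches()) {
                image.getParentNode().removeChild(image);
                removed++;
            }
        }
        return removed;
    }

    // Parses an XML string with the JDK parser, optionally namespace-aware
    static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\""
                + " xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
                + "<image xlink:href=\"data:image/png;base64,iVBORw0KGgo=\"/>"
                + "<image xlink:href=\"logo.png\"/></svg>";
        for (boolean nsAware : new boolean[] {true, false}) {
            Document doc = parse(svg, nsAware);
            int removed = deleteBase64Images(doc);
            int left = doc.getElementsByTagName("image").getLength();
            // prints removed=1 left=1 in both modes
            System.out.println("namespaceAware=" + nsAware + " removed=" + removed + " left=" + left);
        }
    }
}
```

The fallback keeps the method working on DOMs built without namespace support, such as the one produced by the question's `preprocessSVG`.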