I want to remove an image element from an SVG file if the element embeds a base64-encoded image. I load the document with Batik:

String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory fac = new SAXSVGDocumentFactory(parser);

Document svgDoc = fac.createDocument(inFile.toURI().toString());

To delete the image tag I do this:

public static final Pattern BASE64_PATTERN = Pattern.compile("^.*data:image/(?:png|jpg|jpeg);base64,.*");

private static void deleteBase64Images(Document doc) {
    NodeList nodes = doc.getElementsByTagName("image");

    int length = nodes.getLength();
    for (int i = length - 1; i >= 0; i--) {
        Element image = (Element)nodes.item(i);
        String attribute = image.getAttribute("xlink:href");
        if (BASE64_PATTERN.matcher(attribute).matches()) {
            image.getParentNode().removeChild(image);
        }
    }
}

This is a minimal SVG that has an image with an attribute "xlink:href":

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:v="http://schemas.microsoft.com/visio/2003/SVGExtensions/" width="51.1939in"
height="23.6013in"
viewBox="0 0 3685.96 1699.29" xml:space="preserve" color-interpolation-filters="sRGB"
class="st38">   

<image x="0" y="50" width="64" height="64" image-rendering="optimizeSpeed"
xlink:href="">        
</image>
</svg>

However, the line image.getAttribute("xlink:href"); returns an empty String. What is strange is that image.getAttributes().item(i) returns an attribute with the name "xlink:href" (for some i).

Consider this code that uses javax.xml.parsers.DocumentBuilder instead:

private static String preprocessSVG(File originalFile) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder;
    try {
        builder = factory.newDocumentBuilder();
        Document document = builder.parse(originalFile);
        deleteBase64Images(document);
    } catch (Exception e) {
    }
    return null;
}

In this case, the line image.getAttribute("xlink:href"); returns the correct attribute value. Why does the Document parsed by org.apache.batik.dom.svg.SAXSVGDocumentFactory behave so strangely? What am I missing here?

The Document Implementation is SVGOMDocument.

Sadık
  • The regEx doesn't match the code for the image. The dot matches any single character, except a line break. In your code you have 2 line breaks – enxaneta Mar 03 '23 at 14:07
  • @enxaneta The value of "xlink:href" doesn't have any line breaks. "attribute" should only hold the string "data:image/...QmCC". Even if there were line breaks in it, attribute should not be empty. – Sadık Mar 03 '23 at 14:38
  • you'd need to call getAttributeNS to get an attribute from the xlink namespace. – Robert Longson Mar 03 '23 at 14:58
  • @RobertLongson you mean `image.getAttributeNS("xlink", "href");`? Returns an empty string, too. – Sadık Mar 03 '23 at 15:19
  • @RobertLongson Thanks for the hint. This means that this question is a duplicate of another question. The provided answer works for me. – Sadık Mar 03 '23 at 15:21
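
For readers landing here, a minimal sketch of what the getAttributeNS hint amounts to. The first argument of getAttributeNS is the namespace URI that the xlink prefix is bound to (http://www.w3.org/1999/xlink in the SVG above), not the prefix "xlink" itself, which is why image.getAttributeNS("xlink", "href") still comes back empty. The sketch does not use Batik: as a stand-in it parses with a namespace-aware JDK DocumentBuilder, on the assumption that the Batik document is namespace-aware in the same way. The class name XlinkHrefLookup, the parse helper, the inline SVG, and the "AAAA" payload are all made up for illustration, and the pattern is slightly tightened compared to the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XlinkHrefLookup {
    // Namespace URIs as declared on the <svg> root element in the question.
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // DOTALL so that line breaks inside the payload would not prevent a match.
    static final Pattern BASE64_PATTERN =
            Pattern.compile("^data:image/(?:png|jpg|jpeg);base64,.*", Pattern.DOTALL);

    // Removes every SVG <image> whose xlink:href is a base64 data URI.
    // Returns the number of removed elements.
    static int deleteBase64Images(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(SVG_NS, "image");
        int removed = 0;
        // The NodeList is live, so walk it backwards while removing.
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Element image = (Element) nodes.item(i);
            // Look the attribute up by namespace URI + local name, not by prefix.
            String href = image.getAttributeNS(XLINK_NS, "href");
            if (BASE64_PATTERN.matcher(href.trim()).matches()) {
                image.getParentNode().removeChild(image);
                removed++;
            }
        }
        return removed;
    }

    // Hypothetical helper: namespace-aware JDK parser standing in for Batik.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: one embedded image (placeholder payload) and one linked image.
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\" "
                + "xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
                + "<image xlink:href=\"data:image/png;base64,AAAA\"/>"
                + "<image xlink:href=\"logo.png\"/>"
                + "</svg>";
        Document doc = parse(svg);
        System.out.println("removed=" + deleteBase64Images(doc));
        System.out.println("remaining="
                + doc.getElementsByTagNameNS(SVG_NS, "image").getLength());
    }
}
```

Since deleteBase64Images only relies on org.w3c.dom interfaces, the same method should in principle accept the Document returned by SAXSVGDocumentFactory, though that part is not verified here. It may also be relevant that DocumentBuilderFactory is not namespace-aware by default, which would fit the observation that the plain getAttribute("xlink:href") lookup works with the DocumentBuilder variant in the question.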
