
I have an SVG file and I want to remove every <g> tag, together with the <image> tag inside it, except one: the <g> whose image href contains "artworks". I use PHP, and for convenience the content of the SVG file is below. Note that the newline characters will be stripped from the SVG before matching, to keep the regex simpler.

My regex so far is:

(<g transform="(?:.*)?"><\/image><\/g>)  

which matches all of the <g> tags and the <image> tag inside each of them.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 1879 2053" xml:space="preserve"><desc>Created with Fabric.js 1.6.2</desc>
<defs></defs>
<g transform="translate(939.5 1026.5)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/filename1462961406.jpg" x="-939.5" y="-1026.5" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="1879" height="2053" preserveAspectRatio="none"></image>
</g>
<g transform="translate(939.51 1026.5) scale(2.59 2.59)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/overlay1462961406.png" x="-362.22" y="-395.76" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="724.44" height="791.52" preserveAspectRatio="none"></image>
</g>
<rect x="-362" y="-395.5" rx="0" ry="0" width="724" height="791" style="stroke: none; stroke-width: 1; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-opacity: 0; fill-rule: nonzero; opacity: 1;" transform="translate(940.23 1027.12) scale(2.59 2.59)"/>
<g transform="translate(938.93 1025.83) scale(2.59 2.59)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/filename1462961406.jpg" x="-362" y="-395.5" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="724" height="791" preserveAspectRatio="xMinYMin slice"></image>
</g>
<g transform="translate(784.5 1177.09) scale(1.5 1.5)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/artworks/filename1453713655.jpg" x="-240" y="-179.875" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="480" height="359.75" preserveAspectRatio="none"></image>
</g>
<g transform="translate(938.93 1025.83)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/overlay1462961406.png" x="-938.9347705562002" y="-1025.8251429695501" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="1877.8695411124004" height="2051.6502859391003" preserveAspectRatio="none"></image>
</g>
</svg>
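If a regex really is required, one minimal sketch is to match each <g>...</g> group and let a callback decide whether to keep it. It assumes the newlines have already been stripped, that <g> elements are never nested (true for this Fabric.js output, not for SVG in general), and that "artworks" appears only inside the group to keep, since the callback checks the whole group text. keepArtworkGroups is a hypothetical helper name.

```php
<?php
// Sketch: keep only the <g>...</g> group(s) that mention "artworks".
// Assumptions: newlines already stripped, <g> elements never nested.
function keepArtworkGroups(string $svg): string
{
    // <g\b[^>]*>  an opening <g> tag with any attributes
    // .*?         the shortest run of characters up to the next </g>
    return preg_replace_callback(
        '~<g\b[^>]*>.*?</g>~s',
        function (array $m): string {
            // Keep the whole group if it mentions "artworks", otherwise drop it.
            return strpos($m[0], 'artworks') !== false ? $m[0] : '';
        },
        $svg
    );
}

$svg = '<svg><g transform="a"><image xlink:href="/x/filename1.jpg"></image></g>'
     . '<g transform="b"><image xlink:href="/x/artworks/filename2.jpg"></image></g></svg>';
echo keepArtworkGroups($svg), "\n";
// prints <svg><g transform="b"><image xlink:href="/x/artworks/filename2.jpg"></image></g></svg>
```

The lazy .*? is what stops each match at the first </g>; a greedy .* would swallow everything from the first <g> to the last </g>.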
Yvon Huynh

1 Answer


Although XPath lets you select elements in an XML document far more reliably than regex, this problem can be solved just as easily with DOMDocument by iterating over all of the <image> nodes.

The code below uses getElementsByTagName() to find all of the <image> nodes, then inspects each node's href attribute and checks whether it contains "artworks". If it doesn't, the enclosing <g> node is removed, using parentNode to step back up from the <image> tag to the <g>.

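$xml = new DOMDocument();
$xml->loadXML($data);
// getElementsByTagName() returns a live list, so walk it from the end.
$images = $xml->getElementsByTagName("image");
for ($i = $images->length - 1; $i >= 0; $i--) {
    $image = $images->item($i);
    // The link is stored in the XLink namespace (xlink:href); "" if missing.
    $href = $image->getAttributeNS("http://www.w3.org/1999/xlink", "href");
    if (strpos($href, "artworks") === false) {
        // Remove the enclosing <g> together with its <image>.
        $g = $image->parentNode;
        $g->parentNode->removeChild($g);
    }
}
echo $xml->saveXML();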

This assumes $data holds the SVG markup as a string. To read the SVG directly from a file instead, use $xml->load($filename) in place of $xml->loadXML($data).
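A sketch of the file-based variant, using hypothetical input and output paths in the system temp directory. To stay self-contained it first writes a tiny sample SVG, a step a real script would skip because the file already exists.

```php
<?php
// Hypothetical paths, used only for this demonstration.
$in  = sys_get_temp_dir() . '/design.svg';
$out = sys_get_temp_dir() . '/design-artworks.svg';

// Write a tiny sample so the sketch runs on its own.
file_put_contents($in,
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">'
    . '<g><image xlink:href="/x/filename1.jpg"/></g>'
    . '<g><image xlink:href="/x/artworks/filename2.jpg"/></g></svg>');

$xml = new DOMDocument();
$xml->load($in);                 // load() reads a file; loadXML() parses a string

$images = $xml->getElementsByTagName('image');
for ($i = $images->length - 1; $i >= 0; $i--) {
    $image = $images->item($i);
    $href  = $image->getAttributeNS('http://www.w3.org/1999/xlink', 'href');
    if (strpos($href, 'artworks') === false) {
        // Drop the enclosing <g> along with its <image>.
        $g = $image->parentNode;
        $g->parentNode->removeChild($g);
    }
}

$xml->save($out);                // save() writes the document back to a file
```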

One thing that may look odd is that the loop goes through the <image> nodes backwards. getElementsByTagName() returns a live DOMNodeList, so removing a node shifts the positions of every node after it, and a forward loop would skip elements. Working from the end avoids this, because each removal only affects positions that have already been visited.
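The effect of the live list can be seen in a small self-contained sketch that runs the same removal loop forwards and backwards over four <image> elements (removeAll is a hypothetical helper written only for this illustration).

```php
<?php
// Runs the removal loop in either direction and reports how many
// <image> elements survive.
function removeAll(bool $backwards): int
{
    $doc = new DOMDocument();
    $doc->loadXML('<svg><image/><image/><image/><image/></svg>');
    $images = $doc->getElementsByTagName('image'); // live DOMNodeList

    if ($backwards) {
        for ($i = $images->length - 1; $i >= 0; $i--) {
            $node = $images->item($i);
            $node->parentNode->removeChild($node);
        }
    } else {
        // After removing item 0, the old item 1 becomes item 0 and is skipped,
        // while $images->length shrinks on every pass.
        for ($i = 0; $i < $images->length; $i++) {
            $node = $images->item($i);
            $node->parentNode->removeChild($node);
        }
    }
    return $doc->getElementsByTagName('image')->length;
}

echo removeAll(false), "\n"; // 2: the forward loop skips every other image
echo removeAll(true), "\n";  // 0: the backward loop removes them all
```

If a forward loop reads more naturally, another option is to copy the live list into a plain array first, for example with iterator_to_array($images), and then remove nodes from that snapshot.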

With the example SVG from the question, the result is:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 1879 2053" xml:space="preserve"><desc>Created with Fabric.js 1.6.2</desc>
<defs/>


<rect x="-362" y="-395.5" rx="0" ry="0" width="724" height="791" style="stroke: none; stroke-width: 1; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-opacity: 0; fill-rule: nonzero; opacity: 1;" transform="translate(940.23 1027.12) scale(2.59 2.59)"/>

<g transform="translate(784.5 1177.09) scale(1.5 1.5)">
<image xlink:href="/home/printplusprod/public_html/media/pdp/images/artworks/filename1453713655.jpg" x="-240" y="-179.875" style="stroke: none; stroke-width: 0; stroke-dasharray: none; stroke-linecap: butt; stroke-linejoin: miter; stroke-miterlimit: 10; fill: rgb(0,0,0); fill-rule: nonzero; opacity: 1;" width="480" height="359.75" preserveAspectRatio="none"/>
</g>

</svg>
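For comparison, the XPath route mentioned at the start can be sketched as follows (keepArtworks is a hypothetical helper name). Because the SVG elements live in a default namespace, both the SVG and XLink namespaces have to be registered with a prefix before querying.

```php
<?php
// Select every <g> that has an <image> child whose xlink:href does not
// contain "artworks", then remove those groups.
function keepArtworks(string $data): string
{
    $xml = new DOMDocument();
    $xml->loadXML($data);

    $xpath = new DOMXPath($xml);
    // Elements in a default namespace need a prefix inside XPath expressions.
    $xpath->registerNamespace('svg', 'http://www.w3.org/2000/svg');
    $xpath->registerNamespace('xlink', 'http://www.w3.org/1999/xlink');

    $groups = $xpath->query('//svg:g[svg:image[not(contains(@xlink:href, "artworks"))]]');
    // query() returns a static list, so a forward foreach is safe here.
    foreach ($groups as $g) {
        $g->parentNode->removeChild($g);
    }
    return $xml->saveXML();
}
```

Calling keepArtworks($data) on the question's SVG should remove the same four <g> groups as the DOMDocument loop above.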
Nigel Ren