
I am creating charts on the fly as SVGs using d3.js. These charts are dynamically generated based on the selections of authenticated users. Once these charts are generated, the user has the option to download the generated SVG as a PNG or PDF.

The current workflow is the following:

```javascript
// JavaScript: serialize the generated chart and POST it to the server
function exportChart(output_format) {  // output_format can be either "pdf" or "png"
    // Get the element containing the generated SVG
    var svg = document.getElementById("chart-container");

    // Extract the data as an SVG/XML text string
    var svg_xml = new XMLSerializer().serializeToString(svg);

    // Fill in the hidden <form> and submit it to the server
    var form = document.getElementById("svgform");
    form['output_format'].value = output_format;
    form['data'].value = svg_xml;
    form.submit();
}
```

The FORM element is a hidden form, used to POST the data:

```html
<form id="svgform" method="post" action="conversion.php">
  <input type="hidden" id="output_format" name="output_format" value="">
  <input type="hidden" id="data" name="data" value="">
</form>
```

The PHP file saves the provided SVG data as a temporary file:

```php
// check for valid session, etc - omitted for brevity

$xmldat = $_POST['data'];  // serialized XML representing the SVG element
if (simplexml_load_string($xmldat) === FALSE) { die; } // reject invalid XML

$fileformat = $_POST['output_format'];  // chosen format for output: PNG or PDF
if ($fileformat != "pdf" && $fileformat != "png") { die; } // limited options for format
$fileformat = escapeshellarg($fileformat); // escape shell arguments that might have snuck in

// generate temporary file names with tempnam() - omitted for brevity

$handle = fopen($infile, "w");
fwrite($handle, $xmldat);
fclose($handle);
```

A conversion utility then reads the temporary file (`$infile`) and writes a new file (`$outfile`) in the requested `$fileformat` (PDF or PNG). The new file is returned to the browser, and both temporary files are deleted:

```php
// headers etc generated - omitted for brevity
readfile($outfile);

unlink($infile);   // delete temporary infile
unlink($outfile);  // delete temporary outfile
```

I have investigated converting the SVG to a PNG using JavaScript (canvg(), then toDataURL, then document.write), and may use this for generating the PNGs, but it doesn't allow for conversion to PDF.

So: How can I best sanitize or filter the SVG data which is provided to conversion.php, before it's written to a file? What's the current state of SVG sanitization? What's available within PHP? Should I go with a whitelist-based approach to sanitizing the SVG data provided to conversion.php, or is there a better way?

(I do not know XSLT, though I could try to learn it; I hope to keep the sanitization within PHP as much as possible. I'm using Windows Server 2008, so any solution that relies on external tools needs to be available within that ecosystem.)

Ale Exc
  • I asked a [similar question](http://stackoverflow.com/questions/9654664/security-implications-of-letting-users-render-own-svg-files) earlier this year, but didn't get many bites. You could validate against the 1.1 spec if you're not using 1.2 (or any extensions as per an Inkscape document); see my other [question here](http://stackoverflow.com/questions/9651493/validating-svg-file-in-php-with-xmlreader). – halfer Dec 20 '12 at 17:56
  • From a security perspective, if you are handling potentially tainted SVG files, the main thing is to strip XML entities. I don't think they serve any useful purpose, but [can be used maliciously](http://blog.jondh.me.uk/2012/09/inkscape-xml-entity-vulnerabilities/). – halfer Dec 20 '12 at 17:58
  • @halfer - Thanks, but darn! I had hoped someone would pull back a curtain to reveal `SVGpurifier` or a comparable Christmas miracle. – Ale Exc Dec 21 '12 at 19:01
  • Heh, that would be good! I keep meaning to go back to that project, but since it is spare-time F/OSS, it's very much on the back burner. Ping me here if you get any luck on it, I should be interested to see what you come up with. – halfer Dec 22 '12 at 18:39
  • I'd like to see a solution as well, but currently have no real-world use for it though. If no one answers with a solution in a month I'll open a bounty on it. I've starred the question so will keep an eye on it. – kittycat Jan 14 '13 at 01:44
  • Yes, I'm sure it's a bug. You'd better put all defs elements before all use elements. – cuixiping Jan 26 '13 at 07:03
  • @AleExc From what I know, being vulnerable to attack when converting SVG to a drawing would have to do with displaying files on the server, or fetching them from another server. Unless I'm missing something, you can replace every `file://` with '' to prevent the attack, and if you also want to prevent pulls from external sites, replace `http://` with '' as well. It's a quick and dirty way to prevent attacks. – Jon Feb 24 '13 at 11:10

3 Answers


You need to sanitize SVG using an XML parser plus a whitelist.

Because SVG already has multiple ways to execute code and future extensions may add additional methods, you simply cannot blacklist "known dangerous" constructs. Whitelisting safe elements and attributes does work as long as you correctly handle all the XML corner cases (e.g. XSLT stylesheets, entity expansions, external entity references).
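A minimal sketch of the parser-plus-whitelist approach using PHP's DOM extension (PHP 7.1+). The function names, the allowed element and attribute lists, and the rule that `href` and `url(...)` values may only point at `#fragment` IDs in the same document are illustrative assumptions, not a vetted SVG profile; real chart output may need more (for example `style`, left out here because CSS can carry `url(...)` references).

```php
<?php
// Sketch only: the allow-lists below are assumptions, tailor them to your charts.
const SVG_NS = 'http://www.w3.org/2000/svg';
const XLINK_NS = 'http://www.w3.org/1999/xlink';
const ALLOWED_ELEMENTS = ['svg', 'g', 'defs', 'title', 'desc', 'path', 'rect', 'circle',
    'ellipse', 'line', 'polyline', 'polygon', 'text', 'tspan', 'use', 'clipPath',
    'linearGradient', 'radialGradient', 'stop'];
const ALLOWED_ATTRIBUTES = ['version', 'id', 'class', 'x', 'y', 'x1', 'y1', 'x2', 'y2',
    'cx', 'cy', 'r', 'rx', 'ry', 'dx', 'dy', 'width', 'height', 'viewBox', 'd', 'points',
    'transform', 'fill', 'fill-opacity', 'stroke', 'stroke-width', 'stroke-opacity',
    'opacity', 'font-family', 'font-size', 'text-anchor', 'offset', 'stop-color',
    'clip-path', 'href'];

// References are only allowed to point at fragments inside the same document.
function isSafeValue(string $name, string $value): bool
{
    $fragment = '#[A-Za-z_][\w.-]*';
    if ($name === 'href') {
        return preg_match('/^' . $fragment . '$/', $value) === 1;
    }
    if (stripos($value, 'url(') !== false) {
        return preg_match('/^\s*url\(\s*' . $fragment . '\s*\)\s*$/i', $value) === 1;
    }
    return true;
}

function sanitizeAttributes(DOMElement $el): void
{
    // Copy to a plain list first; removing from the live map while iterating skips items.
    foreach (iterator_to_array($el->attributes, false) as $attr) {
        $name = $attr->localName;
        $ns = $attr->namespaceURI ?? '';
        $allowed = ($ns === '' && in_array($name, ALLOWED_ATTRIBUTES, true))
            || ($ns === XLINK_NS && $name === 'href');
        if (!$allowed || !isSafeValue($name, $attr->value)) {
            $el->removeAttributeNode($attr); // drops onload, onclick, style, ...
        }
    }
}

function sanitizeChildren(DOMNode $node): void
{
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMElement
            && $child->namespaceURI === SVG_NS
            && in_array($child->localName, ALLOWED_ELEMENTS, true)) {
            sanitizeAttributes($child);
            sanitizeChildren($child);
        } elseif (!($child instanceof DOMText)) {
            // Unknown elements (with their subtree), comments, PIs, entity references.
            $node->removeChild($child);
        }
    }
}

function sanitizeSvg(string $xml): ?string
{
    if ($xml === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    try {
        // LIBXML_NONET forbids network access; LIBXML_NOENT is deliberately not set.
        $loaded = $doc->loadXML($xml, LIBXML_NONET);
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    // Entities can only be declared in a DOCTYPE, so refuse any document that has one.
    if (!$loaded || $doc->doctype !== null) {
        return null;
    }
    $root = $doc->documentElement;
    if ($root === null || $root->namespaceURI !== SVG_NS || $root->localName !== 'svg') {
        return null;
    }
    sanitizeAttributes($root);
    sanitizeChildren($root);
    // Serializing only the root also drops document-level PIs such as <?xml-stylesheet?>.
    $out = $doc->saveXML($root);
    return $out === false ? null : $out;
}
```

In `conversion.php`, the result of `sanitizeSvg($_POST['data'])` would be written to `$infile` instead of the raw POST data, with a `null` result treated like the existing `die` branches.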

Example implementations: https://github.com/alnorris/SVG-Sanitizer/blob/master/SvgSanitizer.php (MIT license) or https://github.com/darylldoyle/svg-sanitizer (GPL v2 license)

More information about attack vectors that you have to consider while selecting which features you want to support:

Mikko Rantalainen
  • Remember to test entity handling, too. For example: ` ]> &test;` – Mikko Rantalainen Aug 28 '18 at 07:56
  • Please don't just post some tool or library as an answer. At least demonstrate [how it solves the problem](http://meta.stackoverflow.com/a/251605) in the answer itself. – Zoe Aug 28 '18 at 14:32
  • Please update svg-sanitizer's license, as per https://github.com/darylldoyle/svg-sanitizer/blob/master/LICENSE – Shmack Oct 08 '20 at 19:27

I work with XML and PHP, but I'm not at all sure about your question. Please take this as an idea or suggestion, nothing more.

SimpleXML uses libxml to load the XML content: http://www.php.net/manual/en/simplexml.requirements.php

You can disable loading of external entities by calling `libxml_disable_entity_loader(true)` (http://www.php.net/manual/en/function.libxml-disable-entity-loader.php) before loading your file with SimpleXML.
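A sketch of that loading step, assuming PHP 7.1+; `loadUntrustedSvg()` is a hypothetical helper name. `libxml_disable_entity_loader()` is deprecated as of PHP 8.0, where external entity loading is already disabled by default, so the sketch only calls it on older versions. As an extra assumption beyond this answer, it also refuses any document containing a DOCTYPE, since that is where entities are declared.

```php
<?php
// Hypothetical helper: parse untrusted SVG text with SimpleXML.
function loadUntrustedSvg(string $xml): ?SimpleXMLElement
{
    // No DOCTYPE means no internal or external entity declarations.
    if ($xml === '' || stripos($xml, '<!DOCTYPE') !== false) {
        return null;
    }
    // Deprecated on PHP 8.0+, where external entities are not loaded by default.
    if (PHP_VERSION_ID < 80000) {
        libxml_disable_entity_loader(true);
    }
    $previous = libxml_use_internal_errors(true);
    // LIBXML_NONET: never fetch anything over the network while parsing.
    $svg = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($svg === false || $svg->getName() !== 'svg') {
        return null; // not well-formed XML, or the root element is not <svg>
    }
    return $svg;
}
```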

Then you could validate the document against an SVG schema or DTD:

http://us3.php.net/manual/en/domdocument.schemavalidate.php or http://us3.php.net/manual/en/domdocument.validate.php

The only concern I see is that the SVG could contain a `script` element: http://www.w3.org/TR/SVG/script.html#ScriptElement

There is information on the SVG 1.1 DTD here: http://www.w3.org/Graphics/SVG/1.1/DTD/svg-framework.mod and http://www.w3.org/TR/2003/REC-SVG11-20030114/REC-SVG11-20030114.pdf

You could provide an SVG DTD with a modified version of the script element, or loop through the elements to make sure no script element is present.
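The "loop through the elements" idea can be sketched with `DOMXPath`; `stripScripts()` is a hypothetical name. In line with "it won't be perfect", this is a blacklist: it removes only `script` elements and `on*` event-handler attributes, so references to external resources and entity tricks are not handled by it.

```php
<?php
// Hypothetical helper: remove <script> elements and on* event-handler attributes.
function stripScripts(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // local-name() matches <script> whatever namespace prefix it uses.
    foreach (iterator_to_array($xpath->query('//*[local-name()="script"]')) as $script) {
        $script->parentNode->removeChild($script);
    }
    // Attributes whose name starts with "on" in any letter case (onload, onClick, ...).
    $query = '//@*[starts-with(translate(local-name(), "ON", "on"), "on")]';
    foreach (iterator_to_array($xpath->query($query)) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }
}

// Usage sketch
$doc = new DOMDocument();
$doc->loadXML('<svg xmlns="http://www.w3.org/2000/svg" onload="alert(1)"><script>alert(2)</script><rect width="5" height="5"/></svg>', LIBXML_NONET);
stripScripts($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```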

It won't be perfect, but at least better than nothing.

Bertrand

You can use the SVG Sanitize package: https://packagist.org/packages/enshrined/svg-sanitize

It had 500k installs at the time this answer was written.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader (assumes the package was installed via Composer)

use enshrined\svgSanitize\Sanitizer;

// Create a new sanitizer instance
$sanitizer = new Sanitizer();

// Load the dirty svg
$dirtySVG = file_get_contents('filthy.svg');

// Pass it to the sanitizer and get it back clean
$cleanSVG = $sanitizer->sanitize($dirtySVG);

// Now do what you want with your clean SVG/XML data
```
Lucas Bustamante