I am creating charts on the fly as SVGs using d3.js. These charts are dynamically generated based on the selections of authenticated users. Once these charts are generated, the user has the option to download the generated SVG as a PNG or PDF.
The current workflow is the following:
// JavaScript
// get the element containing generated SVG
var svg = document.getElementById("chart-container");
// Extract the data as SVG text string
var svg_xml = new XMLSerializer().serializeToString(svg);
// Submit the <FORM> to the server.
var form = document.getElementById("svgform");
form['output_format'].value = output_format; // can be either "pdf" or "png"
form['data'].value = svg_xml;
form.submit();
The FORM element is a hidden form, used to POST the data:
<form id="svgform" method="post" action="conversion.php">
<input type="hidden" id="output_format" name="output_format" value="">
<input type="hidden" id="data" name="data" value="">
</form>
The PHP file saves the provided SVG data as a temporary file:
// check for valid session, etc - omitted for brevity
$xmldat = $_POST['data']; // serialized XML representing the SVG element
if(simplexml_load_string($xmldat)===FALSE) { die; } // reject invalid XML
$fileformat = $_POST['output_format']; // chosen format for output; PNG or PDF
if ($fileformat != "pdf" && $fileformat != "png" ){ die; } // limited options for format
$fileformat = escapeshellarg($fileformat); // escape shell arguments that might have snuck in
// generate temporary file names with tempnam() - omitted for brevity
$handle = fopen($infile, "w");
fwrite($handle, $xmldat);
fclose($handle);
A conversion utility then reads the temporary file ($infile) and writes a new file ($outfile) in the requested $fileformat (PDF or PNG). That new file is returned to the browser, and both temporary files are deleted:
// headers etc generated - omitted for brevity
readfile($outfile);
unlink($infile); // delete temporary infile
unlink($outfile); // delete temporary outfile
I have investigated converting the SVG to a PNG using JavaScript (canvg(), then toDataURL, then document.write), and may use this for generating the PNGs, but it doesn't allow for conversion to PDF.
So: How can I best sanitize or filter the SVG data which is provided to conversion.php, before it's written to a file? What's the current state of SVG sanitization? What's available within PHP? Should I go with a whitelist-based approach to sanitizing the SVG data provided to conversion.php, or is there a better way?
(I do not know XSLT, though I could try to learn it; I hope to keep the sanitization within PHP as much as possible. The server runs Windows Server 2008, so any solution that relies on external tools would need to be available within that ecosystem.)
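To make the whitelist idea concrete, here is a rough sketch of the kind of filter I have in mind, using PHP's DOM extension. The function name sanitize_svg() and both whitelists are my own placeholders, not part of any existing library, and the lists would need to be extended to cover whatever the d3 charts actually emit (inline style attributes and xlink:href, for example, are dropped here). I have not verified that this is sufficient against every SVG-based attack:

```php
<?php
// Sketch only: sanitize_svg() and the whitelists below are hypothetical placeholders.
// Returns the sanitized SVG markup as a string, or null if the input is rejected.
function sanitize_svg($xml)
{
    $svgNs = 'http://www.w3.org/2000/svg';
    $allowedElements = array('svg', 'g', 'path', 'rect', 'circle', 'ellipse', 'line',
        'polyline', 'polygon', 'text', 'tspan', 'title', 'desc', 'defs', 'clipPath');
    $allowedAttributes = array('id', 'class', 'x', 'y', 'x1', 'y1', 'x2', 'y2', 'cx', 'cy',
        'r', 'rx', 'ry', 'd', 'points', 'width', 'height', 'viewBox', 'transform', 'fill',
        'stroke', 'stroke-width', 'opacity', 'font-family', 'font-size', 'text-anchor',
        'dx', 'dy', 'clip-path', 'version');

    // Refuse empty input and anything carrying a DTD or entity declarations.
    if (!is_string($xml) || $xml === '' || stripos($xml, '<!DOCTYPE') !== false
        || stripos($xml, '<!ENTITY') !== false) {
        return null;
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $loaded = $doc->loadXML($xml, LIBXML_NONET);  // LIBXML_NONET: no network access while parsing
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $loaded ? $doc->documentElement : null;
    if ($root === null || $doc->doctype !== null
        || $root->localName !== 'svg' || $root->namespaceURI !== $svgNs) {
        return null;
    }

    // Snapshot the nodes first: removing from a live DOMNodeList while iterating skips entries.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//node()'), false) as $node) {
        if ($node instanceof DOMText) {
            continue; // plain text and CDATA are kept; they are escaped again on output
        }
        $isAllowedElement = $node instanceof DOMElement
            && $node->namespaceURI === $svgNs
            && in_array($node->localName, $allowedElements, true);
        if (!$isAllowedElement) {
            // Drops <script>, <foreignObject>, comments, processing instructions, etc.
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
            continue;
        }
        foreach (iterator_to_array($node->attributes, false) as $attr) {
            $value = strtolower(preg_replace('/\s+/', '', $attr->value));
            $bad = !in_array($attr->nodeName, $allowedAttributes, true)
                || strpos($value, 'javascript:') !== false
                || preg_match('/url\((?!["\']?#)/', $value) === 1; // only same-document url(#id)
            if ($bad) {
                $node->removeAttributeNode($attr);
            }
        }
    }

    return $doc->saveXML($root);
}
```

In conversion.php this would replace the simplexml_load_string() check: call sanitize_svg($_POST['data']), die if it returns null, and write the returned string (not the raw POST data) to $infile.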