I have to process user-provided markup for a specific kind of embed, usually a <script> tag with a src attribute. Many different <script> embeds can appear here, and they vary from one to the next. To avoid potential XSS attacks, we've decided to strip out anything between the opening and closing tags while keeping the tag and its attributes.

<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny"); //This should be sanitized out</script>

DOMDocument doesn't give us an easy way to alter the inner HTML. I have seen a few approaches, but none seem to address keeping the attributes intact if the tag is destroyed. Am I missing a better approach, or is there an easier way to go about this?

icy
Cameron Kilgore

2 Answers

This code removes all child nodes from the <script> node, which in this example is the document element:

<?php
$xml = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';

$doc = new DOMDocument();
// loadXML() expects well-formed XML, so the <script> tag becomes the root element
$doc->loadXML($xml);

$scriptNode = $doc->documentElement;

// Remove children one at a time until the element is empty; attributes are untouched
while ($scriptNode->hasChildNodes()) {
    $scriptNode->removeChild($scriptNode->lastChild);
}

echo $doc->saveXML();

Output is:

<?xml version="1.0"?>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"/>
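
If the result goes back into an HTML page, the self-closing form above can be a problem: it is XML syntax, and HTML parsers ignore the trailing slash on a <script> tag. A minimal sketch of one way around that, assuming the same input as above: pass the node and the LIBXML_NOEMPTYTAG option to saveXML(), which writes an explicit closing tag and, because a node is given, leaves out the XML declaration. The variable name $out is only for illustration.

```php
<?php
$xml = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$scriptNode = $doc->documentElement;

// Empty the element; its attributes stay in place
while ($scriptNode->hasChildNodes()) {
    $scriptNode->removeChild($scriptNode->lastChild);
}

// Serialize only this node, with <script ...></script> instead of <script .../>
$out = $doc->saveXML($scriptNode, LIBXML_NOEMPTYTAG);
echo $out;
```

This still relies on loadXML(), so it only works when the submitted markup is well-formed XML.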
Sean Bright

A simple method is to make a shallow clone of the node by calling cloneNode() without the optional parameter. A shallow clone copies the element and its attributes but not its child nodes.

The following goes through the loaded document and replaces each script node with its shallow clone:

$html = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';

$doc = new DOMDocument();
// loadHTML() accepts a fragment and wraps it in a full HTML document
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName("script") as $script) {
    // Swap each <script> for a copy that has the same attributes but no children
    $script->parentNode->replaceChild($script->cloneNode(), $script);
}
echo $doc->saveHTML();

gives...

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script></head></html>
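
The output above includes the doctype and the <html><head> wrapper that loadHTML() adds. If only the sanitized tags are wanted, one option is to serialize each shallow clone on its own with saveHTML($node). This is a rough sketch under that assumption, not a drop-in sanitizer; the function name emptyScriptTags() and the sample file names are made up for the example.

```php
<?php
/**
 * Return every <script> tag found in $markup with its attributes kept
 * and its contents dropped. (Hypothetical helper, for illustration only.)
 */
function emptyScriptTags(string $markup): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        // Shallow clone: element plus attributes, no child nodes.
        // Passing a node to saveHTML() serializes just that node.
        $tags[] = $doc->saveHTML($script->cloneNode());
    }

    return $tags;
}

$tags = emptyScriptTags('<script src="a.js">evil()</script><p>hi</p><script src="b.js"></script>');
echo implode("\n", $tags), "\n";
```

Because nothing in the document is replaced here, the node list is only read, never modified, while it is being iterated.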
Nigel Ren