0

I have basic editor which allow the user to enter notes. I am using the https://quilljs.com/ API for the editor. The content of the editor will be saved in a database, but before that persisting the data, I want to sanitize the HTML content, to remove all possible JavaScript events in Oracle PL/SQL. I am not able to get a regular expression to sanitize the HTML content before saving.

Example: <p>This is <a href="http://www.test.com" rel="noopener noreferrer" target="_blank">www.test.com</a></p><p>ffffff</p><p><br></p><p><br></p><p>Review at <a href="http://www.1159pm.com" rel="noopener noreferrer" target="_blank" **onclick="alert()" ondblclick="alert()" onmouseover="alert()" onkeypress="alert()"**>www.1159PM.com</a> </p><p>fffffff</p>'

Expected Result: <p>This is <a href="http://www.test.com" rel="noopener noreferrer" target="_blank">www.test.com</a></p><p>ffffff</p><p><br></p><p><br></p><p>Review at <a href="http://www.1159pm.com" rel="noopener noreferrer" target="_blank">www.1159PM.com</a> </p><p>fffffff</p>'

All Js events removed. Other scripts and styles should be removed as well. Please help me with the Oracle RegEx to solve this problem.

user3422517
  • 93
  • 2
  • 9

1 Answers1

0

If your HTML is restricted to valid XHTML (i.e. it has a single root element and each of the opened tags is closed) then you can use:

INSERT INTO table_name (value) VALUES (
  XMLQuery(
    'copy $i := $p1
    modify (delete nodes ($i//@onclick, $i//@ondblclick, $i//@onmouseover, $i//@onkeypress))
    return $i'
    PASSING XMLTYPE('<html><p>This is <a href="http://www.test.com" rel="noopener noreferrer" target="_blank">www.test.com</a></p><p>ffffff</p><p><br /></p><p><br /></p><p>Review at <a href="http://www.1159pm.com" rel="noopener noreferrer" target="_blank" onclick="alert()" ondblclick="alert()" onmouseover="alert()" onkeypress="alert()">www.1159PM.com</a> </p><p>fffffff</p></html>')
      AS "p1"
    RETURNING CONTENT
  ).getClobVal()
)

Which, for the table:

CREATE TABLE table_name (value CLOB);

Then the inserted value is:

VALUE
<html><p>This is <a href="http://www.test.com" rel="noopener noreferrer" target="_blank">www.test.com</a></p><p>ffffff</p><p><br/></p><p><br/></p><p>Review at <a href="http://www.1159pm.com" rel="noopener noreferrer" target="_blank">www.1159PM.com</a></p><p>fffffff</p></html>

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Hi MT0, Is it possible to use regExp in place of ($i//@onclick, $i//@ondblclick, $i//@onmouseover, $i//@onkeypress)? What that means is that, since there are hundreds of js events it is not possible to list them all and I cannot predict which the bad guy will use. So, it will preferrable to use a RegExp to delete the js events. – user3422517 Mar 14 '22 at 21:05
  • @user3422517 It is impossible to reliably parse arbitrary HTML with a regular expression; you should [use a proper parser](https://stackoverflow.com/a/1732454/1509264). It is even more difficult to parse HTML with a regular expression in Oracle because Oracle does not support the full range of syntax that some other regular expression languages do (i.e. Oracle does not support look-ahead, look-behind, word boundaries or non-capturing groups). – MT0 Mar 14 '22 at 21:25
  • 1
    Additionally, if you are worried about hundreds of JS events and not being able to list them all then you are going to have exactly the same problem of having to list them all with regular expressions. Don't use regular expressions; use a proper parser (that could be an XML parser or it could be using Java embedded within the database to parse the HTML or something else). Someone else may give you a regular expression solution that may possibly work for a limited set of attributes but I think that would be a flawed way to approach the problem. – MT0 Mar 14 '22 at 21:25