
Main Question: Should escaped backslashes also be stored in the database for JavaScript, and how well would that play with PHP's regex engine?

Details

I have a number of regex patterns which can be used to classify strings into various categories. An example is as below:

(^A)|(\(A)

This matches an "A" at the start of the string, or an "A" immediately after an opening bracket (, but not an "A" anywhere else in the string. Both of the following strings match:

  • DBC(ABC)AA

  • ABC(DBC)AA
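As a quick check of the pattern above, here is a minimal sketch using a JavaScript regex literal, where no string escaping is involved. The third test string is an extra, hypothetical example (not from the question) added to show a non-match.

```javascript
// Regex literal: the backslash reaches the regex engine untouched.
const pattern = /(^A)|(\(A)/;

// The two sample strings from above, plus a hypothetical non-matching one.
const samples = ["DBC(ABC)AA", "ABC(DBC)AA", "DBCAA"];

// Test each sample against the pattern.
const results = samples.map((s) => pattern.test(s));
console.log(results);
```

The first string matches through the `\(A` branch and the second through the `^A` branch; the third has an "A" only in other positions.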

My project uses these regex patterns in two languages: PHP and JavaScript.

I want to store these patterns in a MySQL database and since there is no datatype for regex, I thought I could store it as VARCHAR or TEXT.

The issue arises when I write the pattern directly as a string literal in JavaScript: \( is read as just ( because the backslash is the string escape character. If the resulting string is passed to new RegExp, it throws an error:

Uncaught SyntaxError: unterminated parenthetical

For example:

let regexstring = "(^A)|(\(A)";

console.log(regexstring); // outputs => "(^A)|((A)"

let regex = new RegExp(regexstring); // Gives Uncaught SyntaxError: unterminated parenthetical

Based on this answer in StackOverflow, the solution is to escape the backslashes like:

let regexstring = "(^A)|(\\(A)";

console.log(regexstring); // Outputs => "(^A)|(\(A)"

let regex = new RegExp(regexstring);

The question is therefore: should the escaped backslashes also be stored in the database, and how well would that play with PHP's regex engine?

Coola

1 Answer


I would store the raw regular expression.

The additional escape character is not actually part of the regex. It is there so that JS parses the string literal correctly, because \ has a special meaning inside string literals. You only need it when writing the pattern as "hardcoded" text in source code. The same form also works on the PHP side if you use the same assignment technique there (PHP happens to leave an unrecognized escape such as \( intact in a double-quoted string, but doubling the backslash is the unambiguous way to write it):

$regexstring = "(^A)|(\\(A)";
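To illustrate that the extra backslash belongs to the source-code literal and not to the pattern, here is a small sketch comparing the escaped literal with `String.raw`, a standard JavaScript tag that leaves backslashes untouched. The variable names are only illustrative.

```javascript
// Two ways to write the same 10-character pattern in JS source code.
const escaped = "(^A)|(\\(A)";       // backslash doubled for the string literal
const raw = String.raw`(^A)|(\(A)`;  // String.raw leaves the backslash alone

// Both literals produce the identical string value.
console.log(escaped === raw);
console.log(escaped.length);

// The compiled regex carries the raw pattern, with a single backslash.
const regex = new RegExp(escaped);
console.log(regex.source);
```

A value read from a database, a file, or an AJAX response never passes through the string-literal parser, so it needs no doubling at all.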

You could also get rid of it if you changed the way you initialize regexstring in your JS:

<?php
...
$regexstring = $results[0]["regexstring"];
?>

let regexstring = decodeURIComponent("<?=rawurlencode($regexstring);?>");
console.log(regexstring);

Another option is to add the escaping backslashes on the PHP side:

<?php
...
$regexstring = $results[0]["regexstring"];
// Double every backslash so the JS string literal yields the raw pattern.
$escapedRegexstring = str_replace('\\', '\\\\', $regexstring);
?>

let regexstring = "<?=$escapedRegexstring;?>";

However, regardless of escaping, you should note that there are other differences in syntax between PHP's regex engine and the one used by JS, so you may end up having to maintain two copies anyway.
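As a rough illustration of such differences, the sketch below checks whether the JavaScript engine accepts a pattern at all. The helper name `compilesInJs` is hypothetical, and the two rejected patterns use possessive quantifiers and atomic groups, which PCRE (the engine behind PHP's preg_* functions) supports but JavaScript does not.

```javascript
// Report whether the JavaScript engine accepts a given pattern string.
function compilesInJs(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    // Invalid patterns raise SyntaxError; rethrow anything unexpected.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The pattern from the question is accepted.
console.log(compilesInJs(String.raw`(^A)|(\(A)`));

// Possessive quantifier and atomic group: valid in PCRE, rejected by JS.
console.log(compilesInJs("A++"));
console.log(compilesInJs("(?>A+)B"));
```

A check like this only catches patterns that fail to compile; constructs that compile in both engines but behave differently would still need separate testing.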

Lastly, if these regular expressions are meant to be provided by users, keep in mind that outputting them as-is into JS code is very dangerous, as it can easily cause an XSS vulnerability. The first method, passing the value through rawurlencode (on the PHP side) and decodeURIComponent (on the JS side), should eliminate this risk.
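The following sketch shows why percent-encoding helps. Since the new examples here are in JavaScript, `rawUrlEncode` is a hypothetical stand-in that approximates PHP's rawurlencode by extending `encodeURIComponent`; the hostile input is made up for illustration.

```javascript
// Approximation of PHP's rawurlencode(): encodeURIComponent plus the
// five characters (! ' ( ) *) that encodeURIComponent leaves unencoded.
function rawUrlEncode(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Hypothetical hostile "pattern" that tries to break out of a JS string literal.
const hostile = '";alert(1);//';
const encoded = rawUrlEncode(hostile);

// No quotes, slashes, or semicolons survive in the encoded form.
console.log(encoded);

// Decoding restores the original text as plain data, not as code.
console.log(decodeURIComponent(encoded) === hostile);
```

Because the encoded text contains no quote characters, it cannot terminate the surrounding string literal in the rendered page.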

obe
  • Thanks. So if I understand correctly, I should store the strings in the database as the first case i.e. `(^A)|(\(A)`. Also regarding your security issue, would the issue still exist if my PHP application JSON encodes the list of Regex and sends it to my JS as JSON? It's a laravel application by the way. – Coola Aug 01 '20 at 17:38
  • I would store without the extra escaping. I would also ask a developer to do the same if I code-reviewed their work, because I think it's a cleaner design. But technically speaking either way can work. Regarding security - if the JSON is received over AJAX and not actually rendered in the HTML itself then it should be fine. The act of "putting" a string in a JSON structure is, in itself, generally not enough to protect against XSS. – obe Aug 01 '20 at 18:46
  • It's a good idea to aspire to keep "raw" data in storage. If you keep the data with the added escaping, then you are carrying over a concern specific to a (specific) JS implementation into your data structure. Generally speaking, storing and processing data closer to its "essential" form is a good practice because it gives flexibility and makes the system more maintainable as it grows and the data needs to be accessed/manipulated/transported/used in different ways and in different languages. – obe Aug 01 '20 at 18:53
  • Thanks that was what I was worried about as well i.e. making it too specific for my implementation. It is JSON received over AJAX, and not rendered in the HTML. – Coola Aug 01 '20 at 19:29
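For the JSON-over-AJAX setup described in these comments, a minimal sketch of the round trip is shown below. The payload shape (`patterns` array) is an assumption, and `JSON.stringify` stands in for whatever the server actually sends.

```javascript
// Stand-in for the JSON body a server would send; in a real app this
// text would come from the AJAX response, not from JSON.stringify.
const payload = JSON.stringify({ patterns: [String.raw`(^A)|(\(A)`] });

// Inside the JSON text the backslash is doubled by the JSON encoder.
console.log(payload);

// JSON.parse undoes that escaping, yielding the raw pattern again.
const { patterns } = JSON.parse(payload);
const compiled = patterns.map((p) => new RegExp(p));

console.log(patterns[0]);
console.log(compiled[0].test("ABC(DBC)AA"));
```

The escaping is added and removed by the JSON encoder and parser, so the stored pattern can stay raw on both ends.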