
I am currently developing a JavaScript application that will let users add "rules" to it using a small, self-defined programming language. To achieve this, I need to parse strings and extract the necessary information. Here are some examples of my language:

Example 1:

SET backgroundColor
MODULE someModule
TO red
WHEN someVariable == 1

Example 2:

SET textSize
TO someObject.size
WHEN 5 + 5 == 10

Example 3:

REMOVE backgroundColor
MODULE someModule
ID 5

Note that although I am making use of newlines in my examples, these rules can also just be formatted to one long string without any newlines.

As you can see, it is an SQL-like language with capitalized keywords. As in SQL, several combinations of keywords are possible, but it is definitely not a big language. After each keyword, the user can write any simple JavaScript expression. This is important. I know one should usually write a parser, but in this case I do not think it is appropriate to reinvent the wheel by writing a parser that can parse JavaScript. Since the language is rather simple and limited apart from these JavaScript expressions, a simpler approach to this problem would be ideal.

I have already implemented functions that take the necessary information as parameters and add the rule to my system. What is left is to fill the gap: how do I efficiently verify that the syntax is valid and extract all the required information from the string I receive, so that I can call a function like this:

addRuleToTheSystem('backgroundColor', 'someModule', 'red', 'someVariable == 1')
Astarno
  • a) define a grammar b) write or generate a parser – Bergi Apr 23 '23 at 22:22
  • I edited the question to clarify why writing a parser might not be the answer I am looking for, except if there are no alternatives. In short, my language is very simple/short in essence and accepts javascript between the keywords. As such, writing a parser from scratch that understands javascript might not be worth it if there are simpler alternatives for my use-case. – Astarno Apr 24 '23 at 00:26
  • You could integrate an existing parser for javascript. However it sounds like you want to define "*simple Javascript expressions*" as some arbitrary JavaScript subset, or sloppy superset where you don't actually validate the JavaScript but accept anything close enough. Either way, you need to come up with the grammar first. – Bergi Apr 24 '23 at 00:40
  • It is indeed a superset. I have updated the question with a formalized grammar using Jison (saw that was popular). – Astarno Apr 24 '23 at 13:18
  • This is going to be tricky when the JavaScript expression assigns to a variable named `WHEN`. – trincot Apr 24 '23 at 13:54
  • Are multiple statements allowed to be on the same line? – trincot Apr 24 '23 at 14:05
  • Every example I provided counts as one "program"; there will never be multiple "programs" glued together. Within one program, everything can be on the same line (e.g. `SET textSize TO someObject.size WHEN 5 + 5 == 10`). In fact, this will be the most common format in which it arrives at my parser. As for your other comment, the syntax of my language can still be changed if that makes things significantly easier. – Astarno Apr 24 '23 at 14:14

1 Answer


First, I'm legally required to warn you about the security dangers of using eval. With that out of the way:

Option 1: forbid JS expressions from containing your keywords

If you can force users to never use your language keywords in their expressions, your parsing becomes very straightforward. If I read the grammar correctly, it can even be parsed by a regular expression. Here's an example of parsing a SET MODULE TO WHEN rule:

// \s+ accepts spaces or newlines between clauses; without the g flag, exec() can safely be reused on other rules
const regexp = /^SET\s+(?<set>.+?)\s+MODULE\s+(?<module>.+?)\s+TO\s+(?<to>.+?)\s+WHEN\s+(?<when>.+)$/s;
const userRule = "SET backgroundColor MODULE someModule TO red WHEN someVariable == 1";
regexp.exec(userRule).groups
// Object { set: "backgroundColor", module: "someModule", to: "red", when: "someVariable == 1" }

This is more restrictive than your intended language (e.g. SET a MODULE MODULE.name TO "WHEN" will parse incorrectly), but your uppercase keywords help against simple mistakes, and you can catch more mistakes by ensuring at most one of each keyword exists.
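To cover the other keyword combinations (such as REMOVE ... ID) and to enforce "at most one of each keyword", one possible sketch splits the rule on the keywords instead of matching one fixed pattern. It assumes the keyword list SET, REMOVE, MODULE, TO, WHEN, ID taken from the question's examples; `splitRule` is a hypothetical helper name, and it does not check which keyword combinations are valid for your language.

```javascript
// Keyword list assumed from the question's examples.
const KEYWORDS = ["SET", "REMOVE", "MODULE", "TO", "WHEN", "ID"];

// Hypothetical helper: turns a rule into { KEYWORD: "expression text" }.
// Like the regex above, it assumes no expression contains a keyword as a whole word.
function splitRule(rule) {
  // The capturing group makes split() keep each keyword in the result, so the
  // array alternates: [textBeforeFirstKeyword, kw1, value1, kw2, value2, ...].
  const parts = rule.split(new RegExp(`\\b(${KEYWORDS.join("|")})\\b`));
  if (parts[0].trim() !== "") {
    throw new SyntaxError(`Unexpected text before the first keyword: "${parts[0].trim()}"`);
  }
  const clauses = {};
  for (let i = 1; i < parts.length; i += 2) {
    const keyword = parts[i];
    const value = parts[i + 1].trim();
    // Each keyword may appear at most once and must be followed by an expression.
    if (keyword in clauses) throw new SyntaxError(`Keyword ${keyword} used twice`);
    if (value === "") throw new SyntaxError(`Missing expression after ${keyword}`);
    clauses[keyword] = value;
  }
  return clauses;
}
```

For a SET rule, the result c could then be passed on as addRuleToTheSystem(c.SET, c.MODULE, c.TO, c.WHEN), assuming addRuleToTheSystem accepts undefined for a clause the rule omits (e.g. MODULE in Example 2).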

Option 2: limited Javascript parsing

What syntax can you ignore? Operators, declarations, control flow, etc. What syntax can't you ignore? Anything that might accidentally contain your keyword (names, literal strings, comments, regexps).

This is harder than it sounds once you take into account all the different ways to declare strings, escape sequences, optional whitespace, etc. But it's doable, and doesn't require parsing the full JS syntax.

To reduce false positives, and to give users an escape hatch, consider ignoring keywords that appear inside balanced parentheses, braces or brackets, as in:

SET backgroundColor MODULE (MODULE.name) TO (MODULE.color)

This allows the user to have a JS variable "MODULE" without conflicting with your keyword.
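A minimal sketch of such a limited scanner: it skips quoted string literals (honouring backslash escapes) and tracks bracket depth, so a keyword only counts when it is a whole word outside every string and bracket. It deliberately does not handle comments, regular-expression literals, or `${...}` inside template literals; those would need extra cases. The keyword list and the name `splitTopLevel` are assumptions for illustration.

```javascript
// Keyword list assumed from the question's examples.
const KEYWORDS = new Set(["SET", "REMOVE", "MODULE", "TO", "WHEN", "ID"]);
const isIdentChar = (ch) => /[A-Za-z0-9_$]/.test(ch);

// Hypothetical helper: splits a rule into { KEYWORD: "expression text" },
// ignoring keywords inside string literals and (), [], {}.
function splitTopLevel(rule) {
  const clauses = {};
  let keyword = null; // keyword whose expression is currently being collected
  let valueStart = 0;
  let depth = 0;
  let i = 0;

  // Store the text collected for the current keyword, ending at offset `end`.
  const finishClause = (end) => {
    if (keyword === null) {
      if (rule.slice(0, end).trim() !== "") throw new SyntaxError("Rule must start with a keyword");
      return;
    }
    const value = rule.slice(valueStart, end).trim();
    if (value === "") throw new SyntaxError(`Missing expression after ${keyword}`);
    if (keyword in clauses) throw new SyntaxError(`Keyword ${keyword} used twice`);
    clauses[keyword] = value;
  };

  while (i < rule.length) {
    const ch = rule[i];
    if (ch === '"' || ch === "'" || ch === "`") {
      // Skip the whole string literal; a backslash also skips the next character.
      i++;
      while (i < rule.length && rule[i] !== ch) i += rule[i] === "\\" ? 2 : 1;
      if (i >= rule.length) throw new SyntaxError("Unterminated string literal");
      i++; // past the closing quote
    } else if ("([{".includes(ch)) {
      depth++;
      i++;
    } else if (")]}".includes(ch)) {
      if (--depth < 0) throw new SyntaxError(`Unbalanced ${ch}`);
      i++;
    } else if (isIdentChar(ch)) {
      // Read a whole word so that e.g. MY_ID or TOTAL never matches ID or TO.
      let j = i;
      while (j < rule.length && isIdentChar(rule[j])) j++;
      const word = rule.slice(i, j);
      if (depth === 0 && KEYWORDS.has(word)) {
        finishClause(i);
        keyword = word;
        valueStart = j;
      }
      i = j;
    } else {
      i++;
    }
  }
  if (depth !== 0) throw new SyntaxError("Unbalanced brackets");
  finishClause(rule.length);
  return clauses;
}
```

With this sketch, the rule above yields MODULE as "(MODULE.name)" and TO as "(MODULE.color)", while the earlier problem case SET a MODULE MODULE.name TO "WHEN" is rejected with a "Missing expression after MODULE" error instead of being split in the wrong place.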

Option 3: use an existing Javascript parser

Acorn looks like a good option, and has a handy function:

parseExpressionAt(input, offset, options) will parse a single expression in a string, and return its AST. It will not complain if there is more of the string left after the expression.

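Every node Acorn returns carries start and end character offsets (enabling the locations option additionally attaches line/column information), so you can alternate between looking for one of your keywords and handing the rest of the string to Acorn to find an expression. Then look for a keyword after the end of that expression, and repeat.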

BoppreH
  • Thank you! Since this is a small-scale research-project I am working on and I can easily forbid the usage of certain words in my expressions, I went for the first option for now. I can always expand later. – Astarno Apr 24 '23 at 23:14