0

I would like to parse the following string:

this OR (this AND (this OR (that AND this))) OR this

Into this:

  [
    "this", 
    "OR", 
    [
      "this", 
      "AND", 
      [
        "this", 
        "OR", 
        [
          "that", 
          "AND",
          "this"
        ]
      ]
    ], 
    "OR", 
    "this"
  ]

However, I also need to support things like "this is a text" OR this which parses as ["this is a text", "OR", "this"] where the quotes represent the whole string which cancels the space-breaking between words.

But this is not an issue since it's not recursive like brackets, this can be done with regex.

After some research, this can not be done with ES regex, but can be done with PHP's regex. Is there a way to do it with javascript's regex? If not, what are the best alternatives to use, maybe a pre-built library for this?

Snirka
  • 602
  • 1
  • 5
  • 19
Ben Beri
  • 1,101
  • 4
  • 23
  • 64
  • How would you do that with PHP regex (I'm not a php guy)? Because theoretically, a regex cannot ensure that every opening bracket is matched with the correct closing bracket, if the depth of the expression tree is not limited. You would need a pushdown automata for that. And even if the depth is limited, the regex would be quite complicated. Of course regex in various languages do have some extensions to the pure regular expression concept how it is defined in Computer Science ... – derpirscher Jun 28 '21 at 10:34
  • @derpirscher Hello, sorry I wasn't really clear, I didn't mean recursively, because it doesn't do it. But it knows how to find the bracket that closes itself without taking any inner bracket: `(\(([^()]|(?R))*\))`. As you see it uses `Subroutine token`, it is not supported in other languages. But if you do know how to solve this like the PHP regex but in javascript, that'd solve all of my issues because I can recursively run the regex – Ben Beri Jun 28 '21 at 13:35
  • *tsk tsk* for sneaking in a library recommendation request at the end of your question. but here you go: See [Parsing in JavaScript: Tools and Libraries](https://tomassetti.me/parsing-in-javascript/) – Wyck Jun 28 '21 at 19:22
  • @Ben Beri, _"But this is not an issue since it's not recursive like brackets, this can be done with regex."_ - How It could be done with regex ? – Nur Jul 03 '21 at 17:00
  • @Nur Grouping all values by quotes is pretty simple task for regex, since it's not recursive, unlike brackets where you can have cases like ( ( ( ) ) ). And the regex for quotes I used was `(["'])(?:(?=(\\?))\2.)*?\1` – Ben Beri Jul 03 '21 at 17:24

1 Answers1

5

String is just a stream of character.

  • At first we need a lexical analyzer. It take stream of character and generate token. Example:
Input  = `"Hello world" (!)`
Output =  Lex(Input) -> ["Hello world", "(" , "!", ")"] 
  • We can use a recursive function. where we can create a stack on top and return new array.
Tokens = ["Hello world", "(" , "!", ")"]
Result = [ "Hello world", ["!"] ]

let input = `this OR (this AND (this OR (that AND this))) "this is a text" OR this`

function* Lex(characters) {
    let token = "", char;
    while (char = characters.shift()) switch (char) {
        case '(': case ')':
            if (token) { yield token; token = '' }
            yield char
            break
        case '"':
            let index = characters.indexOf(char);
            token = characters.splice(0, index).join('');
        case ' ':
            if (token) yield token
            token = ''
            break
        default: token += char
    }
}
function fnName(tokens) {
    let result = [], item;
    while (item = tokens.shift()) {
        if (item == ')') return result
        result.push(item == '(' ? fnName(tokens) : item)
    }
    return result
}
let tokens = [...Lex(input.split(''))];
console.log(fnName(tokens))
Nur
  • 2,361
  • 2
  • 16
  • 34