I'm using TDD and I have to pass a set of tests to implement a new library:

public function providerEdgesParser()
{
    return array(
        array('.edges=(user)', false), // 0
        array('edges=test', false),
        array('another:chars', false),
        array('pl-ouf', false),
        array('test', array('test')),
        array('lang,lang', array('lang', 'lang')), // 5
        array('quest,ans', array('quest', 'ans')),
        array('q.edges=(a)', array('q' => array('a'))),
        array('e.edges=(lang,et.edges=(lang)),ans', array('e' => array('lang', 'et' => array('lang')), 'ans')),
    );
}

This is a PHPUnit data provider. In each array, the first element is the argument passed to my function and the second element is what the function must return. Here is the function I've come up with:

public function edgesParser($urlEdges)
{
      // Check if edges syntax is valid
      if (!preg_match('#^((?:(?:[a-z]+(?:\.edges\=\(\1\))?)\,?)+)$#ui', $urlEdges)) {
            throw new \Exception('Edges syntax is wrong');
      }

      // Then, use a recursive function to build the array
      // ...
      // ...
}

The only purpose of that regular expression is to detect bad syntax in the $urlEdges string, since it is end-user input. Only afterwards will I build the array to return.
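To illustrate that second step, here is a minimal sketch of what the recursive array builder might look like. It is not the actual implementation: `buildEdges` is a hypothetical name, and the sketch assumes the string has already passed validation and contains no whitespace.

```php
<?php
/**
 * Hypothetical helper (not part of the original code): turns an
 * already-validated edges string into the nested array the tests expect.
 * $pos is the current read offset, shared across recursive calls.
 */
function buildEdges($edges, &$pos = 0)
{
    $letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $result = array();
    $len = strlen($edges);

    // Stop at the end of the string or at the ")" that closes this level.
    while ($pos < $len && $edges[$pos] !== ')') {
        // Read one name made of ASCII letters.
        $n = strspn($edges, $letters, $pos);
        if ($n === 0) {
            // Guard against unvalidated input so the loop cannot spin forever.
            throw new \InvalidArgumentException('Unexpected character at offset ' . $pos);
        }
        $name = substr($edges, $pos, $n);
        $pos += $n;

        if (substr($edges, $pos, 8) === '.edges=(') {
            $pos += 8;                                   // skip ".edges=("
            $result[$name] = buildEdges($edges, $pos);   // parse the nested list
            $pos++;                                      // skip the closing ")"
        } else {
            $result[] = $name;                           // plain edge name
        }

        if ($pos < $len && $edges[$pos] === ',') {
            $pos++;                                      // skip the separator
        }
    }

    return $result;
}
```

Under these assumptions, `buildEdges('q.edges=(a)')` would yield `array('q' => array('a'))`, the shape the provider asks for.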

However, this regex doesn't seem to work the way I want: the last two tests throw an Exception, and they should not.

I have been searching for a solution for a long time, but I just can't see where the regular expression is wrong. Here is a graphical representation of the regex. Could it be that a back reference doesn't work when it's inside the group it refers to? Or did I make a trivial error that my tired eyes can't see?

Gui-Don
  • I think you'd be better helped by a parser or lexer instead of regex. – hjpotter92 Aug 14 '14 at 08:53
  • Syntax checking requires more than regex can handle: [cf the Chomsky Hierarchy](http://en.wikipedia.org/wiki/Chomsky_hierarchy). Regexes deal with type-3 grammars, programming languages are at least type-2 grammars, so regex simply does not suffice, [as you can see here](http://en.wikipedia.org/wiki/Chomsky_hierarchy#mediaviewer/File:Chomsky-hierarchy.svg) – Elias Van Ootegem Aug 14 '14 at 09:03
  • You will need a recursive regex. `\1` will match exactly what was matched in group 1, while `(?1)` will execute the pattern from group 1. There's a lot to fix in your regex, also you forgot `=` in the last test in `et.edges(lang)`. I cleaned this up in [this regex](http://regex101.com/r/kN8vL9/1) and made another [modular regex here](http://regex101.com/r/eY6zO9/1). I'm not sure how you will build the array. Do you already have a function for that? Or are you intending to use regex for this job? – HamZa Aug 14 '14 at 09:55
  • @HamZa Ah yes, I see the problem now. Thanks, you resolved it. As for the spaces, I omitted a previous function that removes all whitespace from the string; that's why it isn't in the regex. To build the array, my idea was to write a recursive function that builds the array step by step. Of course, it would be better to do everything with a single regex, but I've already tried and can't spend several days working on this very problem. – Gui-Don Aug 14 '14 at 10:20
  • @Elias Van Ootegem Interesting. According to [this answer](http://stackoverflow.com/a/3614928/2746634), I understand that I deal with Level 2: Context-free grammars of the Chomsky Hierarchy, because my input grammar can be nested. So I can't use regex alone to do the job. I've got to use a lexer & parser (@hjpotter92). Am I right? – Gui-Don Aug 14 '14 at 12:35
  • @Archaygo: Pretty much: type-2 is beyond the reach of regex. End of. – Elias Van Ootegem Aug 14 '14 at 12:48

1 Answer

@HamZa provided the answer in the comments.

`\1` is a back reference: it matches exactly the text that group 1 matched.

`(?1)` is a recursive subpattern call: it executes the pattern of group 1 again.

The second option is what I needed. So, a suitable regex could be: `#^((?:(?:[a-z]+(?:\.edges\=\((?1)\))?)\,?)+)$#ui` (split up here).
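The difference can be checked with a short script. This is only a sketch: `matchesEdges` is a hypothetical helper introduced for the comparison, and the two patterns are the one from the question and the one above.

```php
<?php
// Hypothetical helper: true when $urlEdges matches $pattern exactly once.
function matchesEdges($pattern, $urlEdges)
{
    return preg_match($pattern, $urlEdges) === 1;
}

// Pattern from the question: \1 refers back to the text captured by group 1.
$withBackReference = '#^((?:(?:[a-z]+(?:\.edges\=\(\1\))?)\,?)+)$#ui';

// Pattern from this answer: (?1) runs the sub-pattern of group 1 again.
$withRecursion = '#^((?:(?:[a-z]+(?:\.edges\=\((?1)\))?)\,?)+)$#ui';

// A few inputs taken from the test provider in the question.
$inputs = array(
    'pl-ouf',
    'test',
    'q.edges=(a)',
    'e.edges=(lang,et.edges=(lang)),ans',
);

foreach ($inputs as $input) {
    printf(
        "%s => back reference: %s, recursion: %s\n",
        $input,
        var_export(matchesEdges($withBackReference, $input), true),
        var_export(matchesEdges($withRecursion, $input), true)
    );
}
```

Note that because the comma is optional in the pattern, an input with a trailing comma such as `test,` is also accepted, so it may be worth adding a test for that case if it should be rejected.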

Gui-Don