
Because my config-file format is really redundant, I invented a custom script format for writing loops, for example:

[Config Object]
{Loop 3
    Setting[i]  = Value[i]
}
OtherSetting=X

Which will result in:

[Config Object]
Setting1     = Value1
Setting2     = Value2
Setting3     = Value3
OtherSetting = X

My first idea was to use regular expressions, like this one:

!{(.*?)}!is
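For reference, here is a minimal sketch of how a lazy pattern like that could drive the expansion of a single, non-nested loop. It assumes the loop header is always `{Loop N` and that `[i]` is the only placeholder. The pattern is the one above with an added capture for the count, and `expandFlatLoops` is a hypothetical name, not the asker's actual code.

```php
<?php
// Sketch: expand flat (non-nested) loops with a lazy pattern.
// The "{Loop N" header and the [i] placeholder follow the example above;
// everything else is an assumption for illustration.
function expandFlatLoops(string $config): string
{
    return preg_replace_callback(
        '!\{Loop\s+(\d+)(.*?)\}!is', // s: "." also matches newlines
        function (array $m): string {
            $body = trim($m[2]); // loop body without surrounding whitespace
            $copies = [];
            for ($i = 1; $i <= (int) $m[1]; $i++) {
                // Substitute the current iteration number for every [i].
                $copies[] = str_replace('[i]', (string) $i, $body);
            }
            return implode("\n", $copies);
        },
        $config
    );
}

$config = "[Config Object]\n{Loop 3\n    Setting[i] = Value[i]\n}\nOtherSetting=X";
echo expandFlatLoops($config), "\n";
```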

That worked really well until I tried to use it with nested loops; you surely know those "oh cr..." moments.

Take the following input:

1: [Config Object]
2: *{*Loop 3
3:    Section[i]
4:    {Loop 3
5:        Setting[i]    = Value[i]
6:     *}*
7: }
8: OtherSetting=X

This makes the regex match the range from line 2 to line 6 (marked with *s).
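A small reproduction of that failure, using the question's lazy pattern with the braces escaped on the same input (indentation shortened): the match ends at the inner loop's closing brace, so it contains two opening braces but only one closing brace.

```php
<?php
// The question's lazy pattern (braces escaped) applied to nested loops.
$nested = "[Config Object]\n{Loop 3\n   Section[i]\n   {Loop 3\n"
        . "       Setting[i] = Value[i]\n   }\n}\nOtherSetting=X";

preg_match('!\{(.*?)\}!is', $nested, $m);

// (.*?) stops at the FIRST "}", which closes the INNER loop (line 6),
// so the match holds two "{" but only one "}".
echo $m[0], "\n";
printf("opening braces: %d, closing braces: %d\n",
    substr_count($m[0], '{'), substr_count($m[0], '}'));
```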

And I really have no idea how to solve this, because the regex is doing exactly what it was written to do.

The lazy `?` operator is needed: without it, the greedy `.*` would run to the last closing brace, so I would have the same problem in the other direction and could not write two consecutive loops.
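To illustrate that trade-off on a hypothetical input with two consecutive loops: the greedy `(.*)` yields a single match spanning both loops, while the lazy `(.*?)` yields one match per loop. Neither variant can track nesting depth, which is the underlying limitation.

```php
<?php
// Two consecutive (non-nested) loops; hypothetical input.
$twoLoops = "{Loop 2\n  A[i] = 1\n}\n{Loop 2\n  B[i] = 2\n}";

// Greedy: (.*) runs to the LAST "}", swallowing both loops in one match.
preg_match_all('!\{(.*)\}!s', $twoLoops, $greedy);

// Lazy: (.*?) stops at the FIRST "}", so each loop is matched separately.
preg_match_all('!\{(.*?)\}!s', $twoLoops, $lazy);

echo count($greedy[0]), " greedy match(es), ", count($lazy[0]), " lazy match(es)\n";
```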

A little research made it clear to me that regex is not the right direction here, but I couldn't find any PHP solutions. So how can I efficiently parse my "loop" script in PHP, for example to get an array of the loops, and replace the commands within the braces with the calculated results?

iceteea
  • You will need a recursive pattern; take a look at [this answer](http://stackoverflow.com/questions/14952113/how-can-i-matches-the-nested-brackets-by-regex/14952740#14952740). – HamZa Oct 30 '13 at 19:00
  • I now remember that I wrote a [small parser](http://stackoverflow.com/a/16420497), it may be interesting ... – HamZa Oct 30 '13 at 19:02
  • Yes, "how to create parsers" is quite a broad topic, hardly answerable in a single answer. A parser is typically a *state machine*, start there. Look at some simple parsers, like for JSON. See my profile for a simple Rison parser. For creating a simple language, you then want an abstract syntax tree. Check out something like Twig for a decent parser with AST. – deceze Oct 30 '13 at 19:08
  • Result should be 3 Sections each containing 3 Settings. EDIT: why have you removed your comment? – iceteea Oct 30 '13 at 19:30
  • With your expression `{(.*?)}` I suppose you're matching everything between `{ }` and then parsing out the data? – hwnd Oct 30 '13 at 19:33
  • @hwnd yeah, well the problem is, if he has nested braces like `{foo {bar}}`, that regex will match `{foo {bar}` while he wants to match `{foo {bar}}`. I've provided him a link to an answer but he doesn't seem to respond ... Maybe he didn't know how to edit that regex to something like this [`\{(?:[^{}]|(?R))*\}`](http://regex101.com/r/nD5nK0)? – HamZa Oct 30 '13 at 20:01
  • Yeah, he could implement that to work for him. What about the same thing with a different concept, allowing comments and stuff? http://regex101.com/r/mP3xP0 – hwnd Oct 30 '13 at 20:07
  • You should take a look at some lexer/parser implementations, e.g. Doctrine or the Symfony2 ExpressionLanguage. – nietonfir Oct 30 '13 at 22:16
  • @iceteea Will the link I posted above showing a demo of your matches not work for your case? – hwnd Oct 30 '13 at 22:33
  • @hwnd sorry, I can't try it out atm. But I've got the impression this regex only respects the first level and ignores the nested ones. That is okay: I could check for child loops and run the regex on them again, but then I would need logic for looping through the text to find loops nested x levels deep, which seems like another non-trivial task to me. – iceteea Oct 31 '13 at 08:38
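A sketch building on HamZa's recursive pattern `\{(?:[^{}]|(?R))*\}`, adapted so that each brace pair must start with `{Loop N`. It addresses the concern in the last comment: instead of hand-written depth tracking, the callback simply calls itself on each loop body. Assumptions: every `{` in the format opens a loop, and `[i]` refers to the innermost enclosing loop, which yields 3 Sections each containing Setting1 to Setting3. `expandLoops` is a hypothetical name.

```php
<?php
// Sketch: expand nested {Loop N ...} blocks with a recursive PCRE pattern.
// Assumes every "{" opens a loop and that [i] binds to the innermost loop.
function expandLoops(string $text): string
{
    return preg_replace_callback(
        // (?R) recurses into the whole pattern, so a nested loop is matched
        // as one unit and the outer "}" pairs with the outer "{Loop".
        '/\{Loop\s+(\d+)((?:[^{}]++|(?R))*)\}/i',
        function (array $m): string {
            // Expand inner loops first so they consume their own [i] placeholders.
            $body = expandLoops($m[2]);
            $out = '';
            for ($i = 1; $i <= (int) $m[1]; $i++) {
                $out .= str_replace('[i]', (string) $i, $body);
            }
            return $out;
        },
        $text
    );
}

$config = <<<'CFG'
[Config Object]
{Loop 3
    Section[i]
    {Loop 3
        Setting[i] = Value[i]
    }
}
OtherSetting=X
CFG;

echo expandLoops($config);
```

Because inner loops are expanded before their parent repeats the body, the outer loop only sees the `[i]` placeholders that remain (here `Section[i]`). The output keeps the original indentation and some whitespace-only lines, which a real implementation would probably tidy up.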

2 Answers


The proper solution is mentioned in the comments. You need to actually write a compiler/parser. My memory is a little fuzzy from my compilers course, but here is how you would approach it.

The basic concept is to convert the input to tokens (this is where regular expressions are okay). This is called lexical analysis.

So:

[Config Object]
{Loop 3
   Section[i]
   {Loop 3
       Setting[i]    = Value[i]
   }
}
OtherSetting=X

becomes (pseudocode tokens, and maybe not exactly what you need):

OPEN_BRACKET STRING(=Config Object) CLOSE_BRACKET
START_LOOP NUMBER(=3)
   STRING(=Section) OPEN_BRACKET STRING(=i) CLOSE_BRACKET
   START_LOOP NUMBER(=3)
       STRING(=Setting) OPEN_BRACKET STRING(=i) CLOSE_BRACKET EQUAL STRING(=Value) OPEN_BRACKET STRING(=i) CLOSE_BRACKET
   END_LOOP
END_LOOP
STRING(=OtherSetting) EQUAL STRING(=X)
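A minimal PHP sketch of such a lexer, using one regex with a named group per token type. The token names follow the list above; the exact patterns are assumptions (for example, allowing spaces inside a STRING so that `Config Object` stays one token), and whitespace is simply discarded.

```php
<?php
// Sketch of a regex-driven lexer; token patterns are assumptions.
function tokenize(string $input): array
{
    $pattern = '/(?<START_LOOP>\{Loop\b)|(?<END_LOOP>\})|(?<OPEN_BRACKET>\[)'
             . '|(?<CLOSE_BRACKET>\])|(?<EQUAL>=)|(?<NUMBER>\d+)'
             . '|(?<STRING>[A-Za-z_]\w*(?: +\w+)*)|(?<SKIP>\s+)|(?<ERROR>.)/';
    $types = ['START_LOOP', 'END_LOOP', 'OPEN_BRACKET', 'CLOSE_BRACKET',
              'EQUAL', 'NUMBER', 'STRING', 'SKIP', 'ERROR'];

    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        foreach ($types as $type) {
            // Unmatched named groups come back as '' (or are absent entirely).
            if (isset($m[$type]) && $m[$type] !== '') {
                if ($type === 'ERROR') {
                    throw new RuntimeException("Unexpected character '{$m[$type]}'");
                }
                if ($type !== 'SKIP') { // drop whitespace
                    $tokens[] = [$type, $m[$type]];
                }
                break;
            }
        }
    }
    return $tokens;
}

foreach (tokenize("{Loop 3\n  Setting[i] = Value[i]\n}") as [$type, $value]) {
    echo "{$type}({$value}) ";
}
echo "\n";
```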

So once your lexer gives you an array of tokens like the above, you just need to parse it according to an actual grammar (this is where you don't want to use regular expressions).

Your grammar (for the loops) is something along these lines (pseudo code syntax kind of like Bison, and I'm probably forgetting parts/leaving things out on purpose):

INDEXED_CONFIG_LINES: INDEXED_CONFIG_LINE | INDEXED_CONFIG_LINES INDEXED_CONFIG_LINE;
INDEXED_CONFIG_LINE: STRING OPEN_BRACKET STRING CLOSE_BRACKET EQUAL STRING OPEN_BRACKET STRING CLOSE_BRACKET;
LOOP: START_LOOP NUMBER LOOP_BODY END_LOOP;
LOOP_BODY: INDEXED_CONFIG_LINES | LOOP;
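Here is a rough sketch of a recursive-descent parser for approximately that grammar, operating on `[type, value]` token pairs. It fills in two gaps the grammar above deliberately leaves open: a `[Section]` header item and an optional `= VALUE` part (the example's `Section[i]` line has none). The node shapes and function names are hypothetical. Because these tokens carry no NEWLINE, an unindexed value immediately followed by a `[Header]` would be misread as an index; a real lexer would probably emit line breaks as tokens.

```php
<?php
// Sketch of a recursive-descent parser; node shapes are hypothetical.
function expectToken(array $tokens, int &$pos, string $type): string
{
    if (($tokens[$pos][0] ?? null) !== $type) {
        throw new RuntimeException("Expected $type at token $pos");
    }
    return $tokens[$pos++][1];
}

// REF: STRING (OPEN_BRACKET STRING CLOSE_BRACKET)?
function parseRef(array $tokens, int &$pos): array
{
    $name = expectToken($tokens, $pos, 'STRING');
    $index = null;
    if (($tokens[$pos][0] ?? null) === 'OPEN_BRACKET') {
        $pos++;
        $index = expectToken($tokens, $pos, 'STRING');
        expectToken($tokens, $pos, 'CLOSE_BRACKET');
    }
    return ['name' => $name, 'index' => $index];
}

// BODY: ITEM*  (stops at END_LOOP or end of input)
function parseBody(array $tokens, int &$pos): array
{
    $nodes = [];
    while ($pos < count($tokens) && $tokens[$pos][0] !== 'END_LOOP') {
        $nodes[] = parseItem($tokens, $pos);
    }
    return $nodes;
}

// ITEM: LOOP | HEADER | REF (EQUAL REF)?
function parseItem(array $tokens, int &$pos): array
{
    switch ($tokens[$pos][0]) {
        case 'START_LOOP': // LOOP: START_LOOP NUMBER BODY END_LOOP
            $pos++;
            $count = (int) expectToken($tokens, $pos, 'NUMBER');
            $body = parseBody($tokens, $pos);
            expectToken($tokens, $pos, 'END_LOOP');
            return ['type' => 'loop', 'count' => $count, 'body' => $body];
        case 'OPEN_BRACKET': // HEADER: OPEN_BRACKET STRING CLOSE_BRACKET
            $pos++;
            $name = expectToken($tokens, $pos, 'STRING');
            expectToken($tokens, $pos, 'CLOSE_BRACKET');
            return ['type' => 'header', 'name' => $name];
        default:
            $key = parseRef($tokens, $pos);
            $value = null;
            if (($tokens[$pos][0] ?? null) === 'EQUAL') {
                $pos++;
                $value = parseRef($tokens, $pos);
            }
            return ['type' => 'setting', 'key' => $key, 'value' => $value];
    }
}

function parseConfig(array $tokens): array
{
    $pos = 0;
    $tree = parseBody($tokens, $pos);
    if ($pos < count($tokens)) { // a stray END_LOOP stopped the top level
        throw new RuntimeException("Unbalanced END_LOOP at token $pos");
    }
    return $tree;
}

print_r(parseConfig([['STRING', 'OtherSetting'], ['EQUAL', '='], ['STRING', 'X']]));
```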

So instead of a regular expression, you need a parser that can use that grammar to build a syntax tree. You would basically just be building a state machine, where you transition on the next token to some state (like in a loop body, etc.).
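Once a syntax tree exists, producing the expanded config is an ordinary tree walk. Below is a sketch assuming a hypothetical node shape (`loop`, `header`, `setting`) and that `[i]` binds to the innermost loop counter; `expandTree` and `renderRef` are illustrative names.

```php
<?php
// Sketch: walk a hypothetical syntax tree and expand loops into flat lines.
function renderRef(array $ref, ?int $i): string
{
    if ($ref['index'] === null) {
        return $ref['name'];
    }
    if ($ref['index'] === 'i' && $i !== null) {
        return $ref['name'] . $i; // Setting[i] -> Setting1, Setting2, ...
    }
    return $ref['name'] . '[' . $ref['index'] . ']'; // leave other indexes as-is
}

function expandTree(array $nodes, ?int $i = null): array
{
    $lines = [];
    foreach ($nodes as $node) {
        switch ($node['type']) {
            case 'loop':
                for ($n = 1; $n <= $node['count']; $n++) {
                    // The inner counter $n shadows any outer $i.
                    $lines = array_merge($lines, expandTree($node['body'], $n));
                }
                break;
            case 'header':
                $lines[] = '[' . $node['name'] . ']';
                break;
            case 'setting':
                $line = renderRef($node['key'], $i);
                if ($node['value'] !== null) {
                    $line .= ' = ' . renderRef($node['value'], $i);
                }
                $lines[] = $line;
                break;
        }
    }
    return $lines;
}

$tree = [
    ['type' => 'header', 'name' => 'Config Object'],
    ['type' => 'loop', 'count' => 3, 'body' => [
        ['type' => 'setting', 'key' => ['name' => 'Setting', 'index' => 'i'],
         'value' => ['name' => 'Value', 'index' => 'i']],
    ]],
    ['type' => 'setting', 'key' => ['name' => 'OtherSetting', 'index' => null],
     'value' => ['name' => 'X', 'index' => null]],
];

echo implode("\n", expandTree($tree)), "\n";
```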

Honestly, YAML would probably meet your needs instead of re-inventing the wheel or resorting to regex gymnastics. But if you really need to have the loop syntax you are proposing, you could take a look at the Symfony Yaml component to see how they do the parsing. https://github.com/symfony/Yaml

Or you can take a look at Twig for another parser that does have loops: https://github.com/fabpot/Twig/tree/master/lib/Twig

Matt
  • For an approach to writing parsers, see http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 – Ira Baxter Aug 03 '14 at 18:37

I find that when I have a whole bunch of variables that are related (like it seems you do), arrays are the way to go. Then you can skip the recursion and the parsing. Ex:

$cars = array("A", "B", "C");
echo $cars[0]; // echoes "A"

Don't knock me for suggesting it, but couldn't you use an array in your config file? It'd be wayyy easier to parse too...
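A sketch of what that could look like for the question's example, assuming the config file is itself PHP (the layout is hypothetical): the repetition becomes an ordinary PHP loop, so no custom parser is needed.

```php
<?php
// Sketch: the config as plain PHP data; the "loop" is a normal foreach.
// Section and key names mirror the question's example.
$config = ['Config Object' => []];

foreach (range(1, 3) as $i) {
    $config['Config Object']["Setting$i"] = "Value$i";
}
$config['Config Object']['OtherSetting'] = 'X';

print_r($config);
```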

Josh T
  • This is surely right, but what I want to create is not only for personal use, which means users who don't know PHP will have to use it. Sorry for not mentioning it. – iceteea Oct 30 '13 at 22:10
  • @iceteea so your solution is to create a custom language that non-PHP users will need to use? That alone doesn't seem like a good reason. – Matt Oct 31 '13 at 00:19