2

Hey guys, I'm working with a custom scripting language and am building a sort of IDE for it in C#. In this language, functions are defined like this:

yourfunctionhere(possibleparamhere)
{
  yourcodehere;
}

I've been trying to figure out the best way to get a list of all the defined functions via regex, but couldn't find an approach that works. Could somebody tell me how to do it with regex, or suggest a better way? Thanks a lot!

EDIT: Would something like this work in C#? %[a-z_0-9^[^]*]++ [a-z_0-9*^[^]]+[ ^t]++[a-z_0-9*^[^]]+[ ^t]++^([*a-z_0-9]+^)[ ^t]++([^p*&, ^t^[^]a-z_0-9./(!]++)[~;]

user556396
  • possible duplicate of [Can regular expressions be used to match nested patterns?](http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns) – jtbandes Jul 05 '11 at 02:13
  • If you are serious about your language, you should use a lexer/parser pair, not regex. – J-16 SDiZ Jul 05 '11 at 02:42
  • I may do this, thank you for the suggestion. – user556396 Jul 05 '11 at 02:54

3 Answers

3

If you just want a list of function names, something like this might work:

// Needs: using System.Linq; using System.Text.RegularExpressions;
Regex.Matches(source, @"([a-zA-Z0-9]+)\s*\([^()]*\)\s*{").Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray()

Basically, that regex looks for any group of alphanumeric characters, followed by optional white space, followed by an open parenthesis, followed by zero or more non-parenthesis characters, followed by a close parenthesis, followed by optional white space, and then an open curly brace.

From there, you extract just the name at the start of each match and build a list. Assuming the language does not otherwise allow a close parenthesis to be followed by an open curly brace, the above should work. Otherwise, more details would be needed.
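For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the approach described above. The names FunctionLister and ListFunctionNames are made up for illustration, and the keyword filter is an assumption: I am guessing the language has C-like control statements such as "if (...) {" that would otherwise be reported as functions. It also allows underscores in names, which the question's EDIT hints at but does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FunctionLister
{
    // Name, optional whitespace, a parameter list without nested parentheses,
    // optional whitespace, then the opening brace of the body.
    private static readonly Regex Definition =
        new Regex(@"\b([A-Za-z_][A-Za-z0-9_]*)\s*\([^()]*\)\s*\{");

    // Assumption: control keywords that can also be followed by "(...) {".
    // Adjust or empty this set to fit the real language.
    private static readonly HashSet<string> Keywords =
        new HashSet<string> { "if", "while", "for", "switch" };

    public static string[] ListFunctionNames(string source)
    {
        return Definition.Matches(source).Cast<Match>()
            .Select(m => m.Groups[1].Value)          // group 1 is the name
            .Where(name => !Keywords.Contains(name)) // drop control statements
            .ToArray();
    }

    public static void Main()
    {
        string source =
            "first(a, b)\n{\n  if (a) { b; }\n}\n\nsecond()\n{\n  first(1, 2);\n}\n";
        // Prints: first, second
        Console.WriteLine(string.Join(", ", ListFunctionNames(source)));
    }
}
```

The call first(1, 2); inside second is not reported because it is followed by a semicolon rather than an open brace.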

Kevin Cathcart
  • I should note that the above may catch commented-out functions, and would catch anything that looks like a function definition in a string. The best way to go is to have a parser. They are not really that hard to make, especially if there is formal documentation of the language in question. If the specification includes a grammar in BNF or something similar, the job becomes a pretty straightforward translation. – Kevin Cathcart Jul 06 '11 at 17:13
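Short of a full parser, the two problems mentioned in the comment above can be reduced with a small hand-written scanner. The sketch below is only an illustration under assumptions that the thread does not confirm: that the language uses // line comments and double-quoted strings with backslash escapes. TopLevelFunctionFinder, Mask, and Find are hypothetical names. It blanks out comments and strings first, then accepts a definition only when the brace depth is zero, so nothing inside a function body is reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TopLevelFunctionFinder
{
    // Replaces comment and string contents with spaces (newlines are kept)
    // so that later matching cannot see them.
    public static string Mask(string source)
    {
        var sb = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (c == '/' && i + 1 < source.Length && source[i + 1] == '/')
            {
                // Assumed "//" line comment: blank out up to the newline.
                while (i < source.Length && source[i] != '\n') { sb.Append(' '); i++; }
            }
            else if (c == '"')
            {
                // Assumed double-quoted string: blank out through the closing quote.
                sb.Append(' '); i++;
                while (i < source.Length && source[i] != '"')
                {
                    if (source[i] == '\\' && i + 1 < source.Length) { sb.Append(' '); i++; }
                    sb.Append(source[i] == '\n' ? '\n' : ' '); i++;
                }
                if (i < source.Length) { sb.Append(' '); i++; }
            }
            else { sb.Append(c); i++; }
        }
        return sb.ToString();
    }

    // \G anchors the match at the start index passed to Regex.Match.
    private static readonly Regex Definition =
        new Regex(@"\G([A-Za-z_][A-Za-z0-9_]*)\s*\([^()]*\)\s*\{");

    public static List<string> Find(string source)
    {
        string text = Mask(source);
        var names = new List<string>();
        int depth = 0, i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c == '{') { depth++; i++; }
            else if (c == '}') { if (depth > 0) depth--; i++; }
            else if (depth == 0 && (char.IsLetterOrDigit(c) || c == '_'))
            {
                Match m = Definition.Match(text, i);
                if (m.Success)
                {
                    names.Add(m.Groups[1].Value);
                    depth++;          // the match consumed the opening brace
                    i += m.Length;
                }
                else
                {
                    // Not a definition: skip the rest of this word.
                    while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_')) i++;
                }
            }
            else i++;
        }
        return names;
    }

    public static void Main()
    {
        string source =
            "// old(x) { }\nalpha(a)\n{\n  print(\"beta(b) {\");\n  if (a) { gamma(); }\n}\ndelta()\n{\n}\n";
        // Prints: alpha, delta
        Console.WriteLine(string.Join(", ", Find(source)));
    }
}
```

Because only depth-zero matches count, control statements inside a body need no keyword filter here. This is still not a parser: block comments, other string syntaxes, or unbalanced braces would need extra handling.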
0

It'd be much easier if you changed your syntax by adding a reserved keyword like 'def', so your declarations become:

def yourfunctionhere(possibleparamhere)
{
    yourcodehere;
}

Then you can use a simple regex like def [a-zA-Z0-9]+.

Petar Ivanov
  • I too was thinking this. That is the reason I was having such a difficult time. Unfortunately I do not have the ability to change the scripting language. – user556396 Jul 05 '11 at 02:16
  • Also that simple regex wouldn't work (think about "def ..." in a string literal). The dragon book (http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools) is the authority on this subject really. – Hut8 Jul 05 '11 at 02:55
-1
var pat = @"\b(public|private|internal|protected)\s*" + @"\b(static|virtual|abstract)?\s*[a-zA-Z_]*(?<method>\s[a-zA-Z_]+\s*)" + @"\((([a-zA-Z_\[\]\<\>]*\s*[a-zA-Z_]*\s*)[,]?\s*)+\)";
4b0