
I am creating a program that enables the user to create some sort of script. I compile the user's script at runtime and then execute it. I am doing something like this: https://stackoverflow.com/a/4181855/637142

Anyway, to make a long story short, I have to replace all the variables that start with $ with something that will make the script compile. If the user has the following line:

var x = ($MyArray[ 4 ].Size) + 3;

what regex will let me select $MyArray[ 4 ].Size?

If the user were to write:

var x = $SomeVar;

In that case it would be easy to find SomeVar. I am having trouble finding variables that start with $ in the general case.

Edit

I think I am close to finding the solution. Right now I am replacing the $ with Foo. (Foo followed by a dot). In other words, I replaced the line:

var x = ($MyArray[ 4 ].Size) + 3;

with

dynamic Foo; // then
var x = (Foo.MyArray[ 4 ].Size) + 3;

Now it compiles, but I still need to find Foo.MyArray[ 4 ].Size
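For reference, the $ to Foo. substitution described above can be done with a single Regex.Replace call. This is a minimal sketch, assuming a script variable is a $ immediately followed by a letter or underscore; DollarReplacer is a hypothetical name. Like any plain regex, it would also rewrite a $ that appears inside a string literal or a comment.

```csharp
using System.Text.RegularExpressions;

public static class DollarReplacer
{
    // Matches a '$' only when it is immediately followed by a letter or
    // underscore, i.e. the start of a script variable name. The lookahead
    // keeps the name itself out of the match.
    private static readonly Regex DollarVariable =
        new Regex(@"\$(?=[A-Za-z_])", RegexOptions.Compiled);

    // Turns each "$Name" into "Foo.Name"; indexers and member access that
    // follow the name are left exactly as written.
    public static string Replace(string script) =>
        DollarVariable.Replace(script, "Foo.");
}
```

For example, `DollarReplacer.Replace("var x = ($MyArray[ 4 ].Size) + 3;")` returns `"var x = (Foo.MyArray[ 4 ].Size) + 3;"`, while a lone `$` followed by a space is left alone.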

Edit 2

I am not trying to create a compiler; I just need to replace some variables (the ones that start with $), nothing more ;)

Tono Nam
  • Why do you need to find `Foo.MyArray[ 4 ].Size` versus `Foo`? They should both accomplish replacing `$`. – Guvante Aug 14 '13 at 19:57
  • RegEx is not the right tool for the job here. Regular expressions can only parse regular languages; C# (like most other programming languages) is a context-free language and requires a different parsing model, one based on the pushdown automaton. If you google these terms you can read about the theory behind it. But basically, I guarantee you there is no RegEx that will be sufficient. – evanmcdonnal Aug 14 '13 at 19:58
  • Yeah, I believe I am OK at regexes, and it is almost impossible to create one. – Tono Nam Aug 14 '13 at 20:01
  • Are you sure that hosting a PowerShell instance in your C# app would not be a better option? Users could then use your C# classes and instances directly. – Mark Toman Aug 14 '13 at 20:18
  • Can't you just use `\$[A-Za-z][A-Za-z0-9]*`? – Icemanind Aug 14 '13 at 20:25
  • And how are you going to handle something like a string that contains `"This is $NotAVariable"`? Or the same thing in a comment? I think you're underestimating the complexity of the problem. – Jim Mischel Aug 14 '13 at 20:26
  • @JimMischel - I think everyone else is overcomplicating it. The OP specifically says `"I am not trying to create a compiler I just need to replace some variables (the ones that start with $) nothing more"` – Icemanind Aug 14 '13 at 20:28
  • @icemanind - But does the OP consider it OK to accidentally alter parts of string literals as well (as pointed out by JimMischel)? Or did the OP not consider all of the pitfalls? We won't know until the OP responds to JimMischel's latest comment. – mbeckish Aug 14 '13 at 20:40
  • I know using a regex is the wrong approach. Next time I will go with the answer that I accepted. – Tono Nam Aug 14 '13 at 20:55
  • @icemanind: My point was that a regular expression approach would do more than just rename variables. It would also rename values in string literals and in comments. – Jim Mischel Aug 14 '13 at 22:01

3 Answers


It sounds like you are attempting to use regular expressions to convert a scripting language into a different language. Doing this correctly will require a lot more than what a regular expression can manage.

I would highly recommend using an existing scripting system to manage your scripting for you, as its authors will have considered many of these edge cases and created a mature language to work in. Lua is a common choice, for instance.

Alternatively, you will likely want to write a proper parser that transforms the original source code into a tree, which can then be walked to generate the resulting code.
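Even short of a full parser, working at the lexical level avoids the problem raised in Jim Mischel's comment. Below is a minimal sketch (not a parser and not a complete C# lexer) of a hand-written scanner that rewrites $Name to Foo.Name while copying string literals, character literals, and comments through unchanged. It assumes the script uses C-style `//` and `/* */` comments and backslash-escaped `"` and `'` literals; verbatim `@"..."` strings are not handled. ScriptRewriter and RewriteDollarVariables are hypothetical names.

```csharp
using System;
using System.Text;

public static class ScriptRewriter
{
    // Rewrites "$Name" to prefix + "Name", but copies string literals,
    // character literals and comments through unchanged.
    public static string RewriteDollarVariables(string src, string prefix = "Foo.")
    {
        var sb = new StringBuilder(src.Length);
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            if (c == '/' && i + 1 < src.Length && src[i + 1] == '/')
            {
                // Line comment: copy up to (not including) the end of line.
                int end = src.IndexOf('\n', i);
                if (end < 0) end = src.Length;
                sb.Append(src, i, end - i);
                i = end;
            }
            else if (c == '/' && i + 1 < src.Length && src[i + 1] == '*')
            {
                // Block comment: copy through the closing "*/" (or to the end).
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                end = end < 0 ? src.Length : end + 2;
                sb.Append(src, i, end - i);
                i = end;
            }
            else if (c == '"' || c == '\'')
            {
                // String or char literal: copy through the matching quote,
                // stepping over backslash escapes such as \" or \'.
                int j = i + 1;
                while (j < src.Length && src[j] != c)
                    j += src[j] == '\\' ? 2 : 1;
                int end = Math.Min(j + 1, src.Length);
                sb.Append(src, i, end - i);
                i = end;
            }
            else if (c == '$' && i + 1 < src.Length &&
                     (char.IsLetter(src[i + 1]) || src[i + 1] == '_'))
            {
                // Script variable outside any literal or comment: swap '$' for the prefix.
                sb.Append(prefix);
                i++;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

With this approach `var s = "cost: $Price"; // uses $Price` keeps both occurrences of `$Price`, and only a `$Price` in ordinary code becomes `Foo.Price`.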

Guvante

I'm not sure I understand the question fully.

You're trying to make a compiler? It's not that easy, my friend. Compilers generally go through five stages:

  • Lexical Analysis
  • Parsing
  • Semantic Analysis
  • Optimization
  • Code Generation

It looks to me like you're trying to accomplish the lexical analysis stage. If so, there are many tools you can use for this task. One of them is called C# LEX. It's a great tool for generating programs that analyze your code and spit out tokens.

Here is an example of a LEX script:

    %%

    ALPHABET = [a-zA-Z]+

    %%

    <YYINITIAL>{ALPHABET} { return ("STRING"); }

C# LEX takes this .lex file and parses it. It then produces a C# program which can analyze any script that fits the language specified by your lexer.
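To give a feel for what a generated lexer does, here is a hand-written stand-in. It is a minimal sketch, not C# LEX output and not its API, that uses one .NET regex with a named group per token kind. TinyLexer, Tokenize, and the token kind names are hypothetical. One simplification to note: lex picks the longest match, while this sketch just takes the first alternative that matches at the current position. Because string literals and `//` comments are their own tokens, a $ inside them is never reported as a DollarVar.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TinyLexer
{
    // \G anchors each match at the current position; the alternatives are
    // tried in order and the first one that matches wins.
    private static readonly Regex TokenRegex = new Regex(
        @"\G(?:(?<Space>\s+)" +
        @"|(?<Comment>//[^\n]*)" +
        @"|(?<String>""(?:[^""\\\n]|\\.)*"")" +
        @"|(?<DollarVar>\$[A-Za-z_]\w*)" +
        @"|(?<Ident>[A-Za-z_]\w*)" +
        @"|(?<Number>\d+)" +
        @"|(?<Symbol>[^\sA-Za-z0-9_]))",
        RegexOptions.Compiled);

    // Token kinds reported to the caller; whitespace is matched but dropped.
    private static readonly string[] Kinds =
        { "Comment", "String", "DollarVar", "Ident", "Number", "Symbol" };

    // Splits the input into (kind, text) pairs.
    public static List<(string Kind, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Kind, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = TokenRegex.Match(input, pos);
            if (!m.Success)
                throw new FormatException($"Unexpected character at {pos}");
            foreach (string kind in Kinds)
            {
                if (m.Groups[kind].Success)
                {
                    tokens.Add((kind, m.Value));
                    break;
                }
            }
            pos += m.Length;
        }
        return tokens;
    }
}
```

Tokenizing `var x = ($MyArray[ 4 ].Size) + 3;` this way yields 14 tokens, the fifth being `("DollarVar", "$MyArray")`.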

Thanks,

Rohan

I managed to write the regex after spending too much time on it.

So if I have the code:

if (0 == $Foo.Arr1[7 + Arr2_5[3-1] ].More[8].Yes-9) {
  // do something
} 

In that case I want to match $Foo.Arr1[7 + Arr2_5[3-1] ].More[8].Yes

The following regex will match that:

(?xi)
(?>
      (?<Q> "      )    # quote
    | (?<C> //     )    # comment
    | (?<N> [^"\$] )    # nothing
    | (?<D> \$     )    # dollar sign
)
(?(Q) .+? (\r|\n|$|") ) # if a quote is matched, continue selecting until the next quote
(?(C) .+? (\r|\n|$)   ) # if a // comment is matched, continue selecting until end of line
(?(N) (?=(?!))        )
(?(D)                   # if dollar sign is matched then:
  (

    (?(B)(?<-B>)){15}  # make sure group has a count of 0

    (?<V> [a-z_]      ) # variable must start with a letter or underscore
    (?>
            (?<B>   \[        )                   # match [
       |    (       [^\s\)\(+\-@\#$%\^&\*=`~,\\\|\[\]] )  # or anything that is not a space, +, @, etc.
    )*
    (?(B)                                         # if you match a bracket then:
        (
           (?>
                    [^\[\]]
                 |    \[ (?<numberP>)                # each [ pushes numberP and each ] pops it, keeping the brackets balanced
                 |    \] (?<-numberP>)
            )*
            \]
            (?(numberP)(?!))
            \.? 
        )
    )  # if you match a bracket, keep selecting until every [ has its closing ]
  )
  +   # repeat this as many times as you can
)
(?(Q)(?!))   # make the regex fail if any of these groups matched, so only $ variables succeed
(?(C)(?!))
(?(N)(?!))
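For comparison, here is a shorter .NET pattern built on the same balancing-group idea, wrapped in a small helper. It is a minimal sketch, assuming a variable path is $identifier followed directly (no spaces) by any number of .member or [...] parts, where brackets may nest; DollarPathFinder and Find are hypothetical names. Unlike the pattern above, it does not skip string literals or comments, so a $ inside "..." would still be matched.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DollarPathFinder
{
    // $identifier followed by any number of ".member" or "[...]" parts.
    // The Depth balancing group tracks nested brackets inside an indexer.
    private static readonly Regex DollarPath = new Regex(@"
        \$[A-Za-z_]\w*            # $ followed by an identifier
        (?:
            \.[A-Za-z_]\w*        # .member
          | \[                    # opening bracket of an indexer
              (?>
                  [^\[\]]+        # anything that is not a bracket
                | \[ (?<Depth>)   # nested opening bracket: push
                | \] (?<-Depth>)  # nested closing bracket: pop, fails if none open
              )*
              (?(Depth)(?!))      # every nested bracket must have been closed
            \]                    # closing bracket of the indexer
        )*",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

    // Returns every $-path in the input, in order of appearance.
    public static List<string> Find(string input) =>
        DollarPath.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
}
```

On the `if` line above, `Find` returns the single match `$Foo.Arr1[7 + Arr2_5[3-1] ].More[8].Yes`, stopping before `-9`.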
Tono Nam