19

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.

I want to replace anything of the form

A->Draw(B1, B2)

with

MyFunc(A, B1, B2).
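For concreteness, one shape `MyFunc` could take is a thin forwarding wrapper that runs the extra behavior and then calls the original `Draw`. This is only a sketch under assumptions: the `Shape` class, its `int Draw(int, int)` signature, and the `g_draw_calls` counter are placeholders for illustration, not part of the library in question.

```cpp
#include <cassert>
#include <utility>

// Placeholder for a library class that has a Draw method (assumed signature).
struct Shape {
    int Draw(int b1, int b2) { return b1 + b2; }
};

// Illustrative hook state: counts calls routed through the wrapper.
int g_draw_calls = 0;

// Runs the added behavior, then forwards all arguments to the original Draw.
template <typename T, typename... Args>
auto MyFunc(T* a, Args&&... args)
    -> decltype(a->Draw(std::forward<Args>(args)...)) {
    assert(a != nullptr);
    ++g_draw_calls;  // the additional functionality would go here
    return a->Draw(std::forward<Args>(args)...);
}
```

With this in place, `MyFunc(&shape, 1, 2)` returns whatever `shape.Draw(1, 2)` returns, after the hook has run.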

My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
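A small illustration of how a naive pattern goes wrong, using `std::regex`. The function names `naive_rewrite` and `check_naive_rewrite` are made up for this sketch; the pattern only recognizes a plain identifier in front of `->Draw(` and knows nothing about string literals.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Naive rewrite: turns IDENT->Draw( into MyFunc(IDENT, and leaves the rest alone.
std::string naive_rewrite(const std::string& cmd) {
    static const std::regex pattern(R"((\w+)->Draw\()");
    return std::regex_replace(cmd, pattern, "MyFunc($1, ");
}

// Documents one case that works and two that do not.
void check_naive_rewrite() {
    // Works: the receiver is a plain identifier.
    assert(naive_rewrite("h->Draw(x, y)") == "MyFunc(h, x, y)");
    // Missed: the receiver is an expression, so nothing is rewritten.
    assert(naive_rewrite("a.get()->Draw(x, y)") == "a.get()->Draw(x, y)");
    // Wrong: text inside a string literal gets rewritten.
    assert(naive_rewrite("puts(\"h->Draw(x, y)\")") == "puts(\"MyFunc(h, x, y)\")");
}
```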

My next thought was to call clang as a subprocess, use "-Xclang -ast-dump" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.

The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the AST in any way that I was able to find.

Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?

Mathieu Rodic
Eldritch Cheese
  • Just to make absolutely sure you aren't doing an x-y problem here, can you tell us what you're trying to accomplish by doing the substitution? – Mark B Aug 04 '15 at 17:11
  • The C++ interpreter also has some functions that are available to the user. I want to add additional behavior to these functions, but they do not have any mechanism for adding hooks into them. Once I pass a string into the interpreter, I cannot regain control directly. Therefore, I want to modify the string before I pass it in, so that it will pass control back to me at the appropriate time. – Eldritch Cheese Aug 04 '15 at 17:48
  • Two questions: will the commands come each in a separate line? And is it possible to see repeated calls to `Draw` in the same line? (`A->Draw(...)->...->Draw(...)`)? – Alexander Feterman Aug 26 '15 at 18:18
  • There may be multiple commands in a single line. For example, `A->Draw(B); C->Draw(D)` or `func(A->Draw(B), C->Draw(D))`. Repeated calls of the pattern you showed will not happen, as `Draw` returns an integer value. – Eldritch Cheese Aug 28 '15 at 13:08
  • Can macro expansion happen in the evaluation of `A->Draw(B1, B2)` ? – serge-sans-paille Aug 31 '15 at 07:46
  • In principle, macro expansion could occur, and would occur after control leaves my function. In practice, I have not seen any macros in use here, and so I can safely ignore macro expansion. – Eldritch Cheese Aug 31 '15 at 13:26
  • You want to replace one *arbitrary* C++ expression (statement/block/...) with another? If not, what are the constraints on what can be replaced? Do you want to replace the entire "input" (delimited how?) or do you want to replace sub-elements of the input? If sub-elements, expressed how? If sub-elements, replace just one, replace all that match some library? After a replacement, can another replacement happen? How do you want to express the replacements (surface syntax? something else)? – Ira Baxter Dec 19 '15 at 17:44

6 Answers

3

What you want is a Program Transformation System. These are tools that generally let you express changes to source code, written in source level patterns that essentially say:

 if you see *this*, replace it by *that*

but operating on Abstract Syntax Trees so the matching and replacement process is far more trustworthy than what you get with string hacking.
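To make the "if you see this, replace it by that" idea concrete, here is a toy sketch of a rewrite applied to a hand-built expression tree. The `Node` type and the helpers `node`, `arrow_call`, `rewrite`, and `print` are invented for this illustration; they are not the data model or API of Clang, GCC, or DMS, and no parsing is done here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy expression tree. For ArrowCall, kids[0] is the receiver and the rest are arguments.
struct Node {
    enum Kind { Name, Call, ArrowCall };
    Kind kind;
    std::string text;  // identifier, function name, or method name
    std::vector<std::unique_ptr<Node>> kids;
};
using NodePtr = std::unique_ptr<Node>;

NodePtr node(Node::Kind kind, const std::string& text) {
    NodePtr n(new Node());
    n->kind = kind;
    n->text = text;
    return n;
}

// Builds recv->method(arg) where both operands are single identifiers.
NodePtr arrow_call(const std::string& recv, const std::string& method,
                   const std::string& arg) {
    NodePtr c = node(Node::ArrowCall, method);
    c->kids.push_back(node(Node::Name, recv));
    c->kids.push_back(node(Node::Name, arg));
    return c;
}

// Rule: recv->Draw(args...) becomes MyFunc(recv, args...), applied bottom-up.
void rewrite(Node& n) {
    for (auto& kid : n.kids) rewrite(*kid);
    if (n.kind == Node::ArrowCall && n.text == "Draw") {
        n.kind = Node::Call;  // the receiver already sits at kids[0],
        n.text = "MyFunc";    // so it simply becomes the first argument
    }
}

// Regenerates source text from the tree.
std::string print(const Node& n) {
    if (n.kind == Node::Name) return n.text;
    std::size_t first = 0;
    std::string out = n.text;
    if (n.kind == Node::ArrowCall) {
        assert(!n.kids.empty());
        out = print(*n.kids[0]) + "->" + n.text;
        first = 1;
    }
    out += "(";
    for (std::size_t i = first; i < n.kids.size(); ++i) {
        if (i > first) out += ", ";
        out += print(*n.kids[i]);
    }
    return out + ")";
}
```

Because the rule matches on tree shape rather than on characters, nesting and multiple occurrences need no special handling; the hard part, which this sketch skips entirely, is getting a correct tree from C++ text in the first place.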

Such tools have to have parsers for the source language of interest. The source language being C++ makes this fairly difficult.

Clang sort of qualifies; after all it can parse C++. OP objects it cannot do so without all the environment context. To the extent that OP is typing (well-formed) program fragments (statements, etc.) into the interpreter, Clang may [I don't have much experience with it myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rules are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.

GCC with Melt sort of qualifies in the same way that Clang does. I'm under the impression that Melt makes GCC at best a bit less intolerable for this kind of work. YMMV.

Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations on large scale C++ code bases.

DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve, see "Why can't C++ be parsed with a LR(1) parser?" for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely a real problem for interactively entered code fragments.

DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text to feed to the interpreter.

Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his example, he can provide DMS with a rewrite rule (untested but pretty close to right):

rule replace_Draw(A:primary,B1:expression,B2:expression):
        primary->primary
    "\A->Draw(\B1, \B2)"     -- pattern
rewrites to
    "MyFunc(\A, \B1, \B2)";  -- replacement

and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.

If OP provides a set of such rules, DMS can be asked to apply the entire set.

So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources so I think it's a question of how badly OP needs to do this.

Ira Baxter
0

Try modifying the headers to suppress the method; when you then compile, the errors will point to every call, and you will be able to replace them throughout the code.

Since you have a C++ interpreter (such as CERN's ROOT), I guess you must use the compiler to intercept all the Draw calls. An easy and clean way to do that is to declare the Draw method as private in the headers, using some defines:

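 class ItemWithDrawMethod
 {
 ....
 // With CATCHTHEMETHOD defined, Draw becomes private, so every outside
 // call to Draw fails to compile and the compiler lists each call site.
 #ifdef CATCHTHEMETHOD
 private:
 #else
 public:
 #endif
     void Draw(A, B);
 public:
 ....
 };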

Then compile as:

 g++ -DCATCHTHEMETHOD=1 -c yourfilein.cpp
  • I'm afraid I'm not sure what benefit this would have. Since my program will be distributed to other users who will be compiling against an unmodified version of the library, modifying calls in the library itself to call `MyFunc` instead of `Draw` does not solve the issue. In addition, users are used to passing `Draw` commands into the interpreter, and it is those commands that I want to alter. – Eldritch Cheese Sep 16 '15 at 13:57
  • There is no real benefit for production; this solution is meant for development. The point is to produce a forced error and let the compiler do the job for you. In the source code that is your responsibility, you can use this trick to catch all the call sites and provide your function to other coders. It is then their responsibility to use the new function. – Kiko Albiol Colomer Sep 18 '15 at 13:33
0

If users need to input complex algorithms into the application, I suggest integrating a scripting language into the app. The user can then write code (a function or algorithm in a defined form), and the app can execute it in the interpreter and get the final results. Examples: Python, Perl, JS, etc.

Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.

Chand Priyankara
  • At the moment, there already is a scripting language, C++. Though C++ is generally not used as a scripting language, the library I am using includes a C++ interpreter. I can modify commands before they are passed into the interpreter, but I cannot modify the library code itself, as other users will be compiling against the unmodified version of the library. – Eldritch Cheese Sep 16 '15 at 13:59
0

What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.

Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
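A minimal sketch of the derive-and-override idea. It assumes `Draw` is virtual and takes two `int` arguments, neither of which is confirmed by the question; the names `HookedA`, `calls`, and `check_hook` are made up here. If `Draw` is not virtual in the real library, the derived version only hides the base one and calls made through an `A*` would bypass the hook.

```cpp
#include <cassert>

// Placeholder for the library class; Draw is ASSUMED to be virtual.
struct A {
    virtual ~A() = default;
    virtual int Draw(int b1, int b2) { return b1 + b2; }
};

// Replacement class that wraps extra behavior around the original Draw.
struct HookedA : A {
    int calls = 0;
    int Draw(int b1, int b2) override {
        ++calls;                 // the added functionality would go here
        return A::Draw(b1, b2);  // then defer to the original implementation
    }
};

// Code that only knows about A still reaches the hook through virtual dispatch.
void check_hook() {
    HookedA h;
    A& base = h;
    assert(base.Draw(1, 2) == 3);
    assert(h.calls == 1);
}
```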

You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.

Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.

hkBst
0

There may be a way to accomplish this mostly with regular expressions.

Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.

Fundamentally, the part that matters is the "SYMBOL->Draw("

SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.

For the first case, use a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]*", along with the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.

The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.

As soon as you have the symbol or expression, the replacement is trivial, going from

SYMBOL->Draw(...

to

YourFunction(SYMBOL, ...

without having to deal with the additional parameters to Draw().

As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as

A->Draw(B...)->Draw(C...)

The first iteration identifies the first A->Draw( and rewrites the whole statement as

YourFunction(A, B...)->Draw(C...)

which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as

YourFunction(YourFunction(A, B...), C...)

where B... and C... are well-formed C++ parameters, including nested calls.

Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
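That said, a rough sketch of the backward walk in plain C++ might look like the following. The names `receiver_start` and `rewrite_draw` are hypothetical, and `MyFunc` stands in for `YourFunction`. This is a text scanner, not a C++ parser: it skips double-quoted strings while searching forward, but it does not understand comments, character literals, raw strings, casts, template arguments, whitespace inside the receiver, or string literals inside the receiver's parentheses.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Walks backward from `end` (the index of "->Draw(") and returns the index
// where the receiver expression starts. Accepts identifiers, '.', "::",
// "->" chains, and balanced (...) or [...] groups.
std::size_t receiver_start(const std::string& s, std::size_t end) {
    assert(end <= s.size());
    std::size_t i = end;
    while (i > 0) {
        char c = s[i - 1];
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.') {
            --i;
        } else if ((c == ':' && i >= 2 && s[i - 2] == ':') ||
                   (c == '>' && i >= 2 && s[i - 2] == '-')) {
            i -= 2;
        } else if (c == ')' || c == ']') {
            int depth = 0;  // step back to the matching opener
            do {
                char d = s[i - 1];
                if (d == ')' || d == ']') ++depth;
                else if (d == '(' || d == '[') --depth;
                --i;
            } while (i > 0 && depth > 0);
        } else {
            break;
        }
    }
    return i;
}

// Rewrites every RECEIVER->Draw( outside string literals into MyFunc(RECEIVER, .
std::string rewrite_draw(std::string s) {
    const std::string needle = "->Draw(";
    const std::string prefix = "MyFunc(";
    bool in_string = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (in_string && s[i] == '\\') { ++i; continue; }  // skip escaped char
        if (s[i] == '"') { in_string = !in_string; continue; }
        if (in_string || s.compare(i, needle.size(), needle) != 0) continue;
        std::size_t start = receiver_start(s, i);
        if (start == i) continue;  // no receiver found; leave the text alone
        std::size_t args = i + needle.size();
        // Draw() with no arguments must not leave a dangling comma.
        std::size_t next = s.find_first_not_of(" \t", args);
        bool no_args = next != std::string::npos && s[next] == ')';
        std::string head = prefix + s.substr(start, i - start) + (no_args ? "" : ", ");
        s.replace(start, args - start, head);
        i = start + head.size() - 1;  // resume scanning in the argument list
    }
    return s;
}
```

For example, `rewrite_draw("func(A->Draw(B), C->Draw(D))")` yields `func(MyFunc(A, B), MyFunc(C, D))`, and the chained form `A->Draw(B)->Draw(C)` becomes `MyFunc(MyFunc(A, B), C)`, since each later match sees the already rewritten text to its left.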

Matt Jordan
  • It seems to me that you are arguing that regexps can parse arbitrary C++. *It isn't true that regexps can parse any context free language*, let alone C++. You might be able to combine custom coding with regexps, but generally this way lies madness. I think your approach would be incredibly brittle. Have you actually implemented this idea somewhere successfully? – Ira Baxter Jan 28 '16 at 09:27
  • @Ira Baxter: Not necessarily. I said "This may be possible with regular expressions, but should be fairly easy in normal code as well." There is a difference between parsing code with the goal to compile it and parsing code with the goal to identify where an expression begins/ends. As for whether I have implemented it, I wrote a C++ Server Page compiler for my own use in under two hundred lines of C++ (converts to a standard C++ source that GCC or Visual Studio can compile with a pre-built project), using this approach, but not with regular expressions. It can be brittle, it depends. – Matt Jordan Jan 28 '16 at 16:12
  • To process C++ expressions, you have to pick up C++ lexemes, agreed? You can't reasonably propose that you can write a C++ lexer in 200 lines, let alone pick out subexpressions, so I don't see how you accomplished this accurately. Second, some things are parsed differently when interpreted as declarations vs. expressions; how does your scheme handle this? Finally, to replace expressions you have to find them at all levels in a statement; this *is* parsing because of nesting. I find your implied assertion that your C++ Server Page handles arbitrary C++ difficult to believe. – Ira Baxter Jan 29 '16 at 02:59
  • Writing a C++ parser that generates parse trees for a full compiler would be very difficult; however, writing a parser that can distinguish and extract C++ code in blocks is actually very easy. Without having to build a full parse tree, something like recursively walking the code is fairly easy, since C++ is very strict in what code can contain (e.g. Can a [ exist in C++ without a ]? Other than an escaped ", can anything else in a string affect skipping it?). The OP does not need code to rewrite declarations, only expressions, so ignoring declarations is enough for that counter-example. – Matt Jordan Feb 02 '16 at 01:32
  • My team has experience writing a full C++ front end; yes, it is hard, then you get to figure out the types of subexpressions. OP appears to be entering arbitrary C++ code elements into his interpreter. Expressions he does enter have types. Even if OP is entering trivial expressions, presumably his "replacements" have to be type compatible. So you need not only to parse, but to resolve names according to C++ rules. Why you insist this is easy is beyond me. – Ira Baxter Feb 02 '16 at 03:01
-1

One way is to load user code as a DLL (something like plugins). This way, you don't need to recompile your actual application; only the user code is compiled, and your application loads it dynamically.

alirakiyan
  • While this *is* a way to execute user code in an application, it is not what is desired in this case (user enters "live" code at runtime). – crashmstr Feb 16 '16 at 13:25