
I have the following regex:

/(?:this\.(\w+)\(([\s\S]*?)\))/g

it is used to take code like this:

this.doSomething(foo, bar)

and replace it with:

this.lookup('doSomething', [foo, bar])

For that use case (which is the most common) it works correctly, but it does not work if another this call is nested within the arguments, like this:

this.doSomething(foo, bar, this.baz())

The result is incorrect:

this.lookup('doSomething', [foo, bar, this.baz(]))

it should be this:

this.lookup('doSomething', [foo, bar, this.baz()])

Well that's the first problem. It should actually be transformed just like this.doSomething, so the final result really should be:

this.lookup('doSomething', [foo, bar, this.lookup('baz', [])]);

Basically my regex is assuming the closing parenthesis from this.baz() is the closing parenthesis of this.doSomething() and also doesn't operate recursively. I need some sort of recursive behavior/control here.
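To make the failure concrete, here is a minimal reproduction. The question does not show the replacement string, so the `template` below is an assumption chosen to produce the outputs quoted above.

```javascript
// The regex from the question. The lazy [\s\S]*? stops at the FIRST ")",
// which is why a nested call breaks it.
const pattern = /(?:this\.(\w+)\(([\s\S]*?)\))/g;

// Assumed replacement template (not shown in the question).
const template = "this.lookup('$1', [$2])";

const simple = 'this.doSomething(foo, bar)'.replace(pattern, template);
const nested = 'this.doSomething(foo, bar, this.baz())'.replace(pattern, template);

console.log(simple); // this.lookup('doSomething', [foo, bar])
console.log(nested); // this.lookup('doSomething', [foo, bar, this.baz(]))
```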

I've heard of xregexp but I'm not sure how that can help me. It also seems like a true language parser may be the only way to go. I don't have much experience there, but I'm not afraid of getting my hands dirty. It seems tools like Esprima could help?

At the end of the day I'm looking to make minor language/syntax changes in the build step of my code, i.e. exactly like Babel does. I'm in fact using Babel. Maybe some sort of Babel plugin is an option?

Anyway, I'm open to either a quick-fix regex trick or more robust language parsing techniques. I'm also just curious how such problems are generally approached. I assume by scanning over the entire input and matching opening/closing braces, parentheses, etc.?
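For illustration, the "scan and match parentheses" idea could look roughly like the sketch below. The function names `rewriteThisCalls` and `findClosingParen` are hypothetical, and this is not a real JavaScript parser: it ignores comments and regex literals, and it would also rewrite text that merely looks like a call inside a string at the top level. It only shows how depth counting plus recursion handles the nested case.

```javascript
// Hypothetical helper: returns the index of the ")" that closes the "("
// located just before `start`, or -1 if the parentheses are unbalanced.
// Parentheses inside quoted strings are skipped.
function findClosingParen(src, start) {
  let depth = 1;
  let quote = null;
  for (let i = start; i < src.length; i++) {
    const ch = src[i];
    if (quote) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // end of string literal
    } else if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      if (--depth === 0) return i;
    }
  }
  return -1;
}

// Hypothetical helper: rewrites this.name(args) to
// this.lookup('name', [args]), recursing into the argument text.
function rewriteThisCalls(src) {
  const head = /this\.(\w+)\(/y; // sticky: only matches at lastIndex
  let out = '';
  let i = 0;
  while (i < src.length) {
    head.lastIndex = i;
    const m = head.exec(src);
    // Skip if there is no match here, or if "this" is really the tail of
    // a longer identifier or property chain (e.g. "a.this.foo(").
    const partOfLongerName = i > 0 && /[\w$.]/.test(src[i - 1]);
    const argsStart = m ? i + m[0].length : -1;
    const close = m && !partOfLongerName ? findClosingParen(src, argsStart) : -1;
    if (close === -1) {
      out += src[i++]; // copy one character through unchanged
      continue;
    }
    const args = rewriteThisCalls(src.slice(argsStart, close));
    out += "this.lookup('" + m[1] + "', [" + args + "])";
    i = close + 1;
  }
  return out;
}

console.log(rewriteThisCalls('this.doSomething(foo, bar, this.baz())'));
// this.lookup('doSomething', [foo, bar, this.lookup('baz', [])])
```

A real parser is still the more robust route, since a scanner like this has to re-implement every lexical rule it wants to respect.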

faceyspacey.com
  • _"It also seems like a true language parser may be the only way to go"_ Yes, most likely. For recursive things, language parsing is just a better fit than regex, which is quite limited. – sidyll Dec 05 '15 at 01:13
  • i dont suppose you have any recommendations of where to start? – faceyspacey.com Dec 05 '15 at 01:23
  • Well, this site is more to the Q&A style instead of broader recommendations. But if you're interested, research about lex & yacc and the GNU versions flex & bison. Also [this book](http://www.amazon.com/flex-bison-Text-Processing-Tools/dp/0596155972) on the GNU tools is very nice and I learned a lot from it. It is really a giant subject, lexical analysis but if you are willing to dig in it, and have the aspiration, go for it. Knowledge does not occupy space and it might be really useful for your future projects and career. – sidyll Dec 05 '15 at 01:27
  • gotchu on that, but there also must be some straightforward things I can do by modifying babel's jsx parser (note: im applying my regex/replace only to jsx code). – faceyspacey.com Dec 05 '15 at 01:43
  • Yes, surely. Sorry I don't really understand about those, but I'm sure someone else here will give good pointers. Best wishes! – sidyll Dec 05 '15 at 01:44
  • Are you looking to do this as part of some build step, or do you want to process these files and write them back onto the disk? – loganfsmyth Dec 05 '15 at 06:18
  • @faceyspacey.com: There is no recursion or balanced group support in JS regex, so you should either adapt some existing solutions ([here is my example](http://stackoverflow.com/a/31996977/3832970) and [another one](http://stackoverflow.com/questions/31989619/get-string-between-2-words-that-contain-this-words-inside-him-too/31989913#31989913)) or write your own parser. – Wiktor Stribiżew Dec 05 '15 at 09:04
  • @loganfsmyth yea as part of a build step. – faceyspacey.com Dec 05 '15 at 09:37

1 Answer


Here's an example of how you might do this with a Babel plugin:

var names = ['doSomething', 'baz'];

module.exports = function(context){
    var t = context.types;

    return {
        visitor: {
            CallExpression: function(path){
                var callee = path.get('callee');
                // Only process "this.*()" calls.
                if (!callee.isMemberExpression() ||
                    !callee.get('object').isThisExpression() ||
                    !callee.get('property').isIdentifier()) return;

                // Make sure the call is to one of your specific functions.
                if (names.indexOf(path.node.callee.property.name) === -1) return;

                // Build "this.lookup('<name>', [<original arguments>])".
                path.replaceWith(t.callExpression(
                    t.memberExpression(t.thisExpression(), t.identifier('lookup')),
                    [
                        t.stringLiteral(path.node.callee.property.name),
                        t.arrayExpression(path.node.arguments),
                    ]
                ));
            }
        }
    };
};

If you drop that into a plugin.js file, for instance, you can create a .babelrc config file and make sure ./plugin.js, or whatever path points to it, is in your plugins array, e.g.

.babelrc

{
  "presets": ["es2015"],
  "plugins": ["./plugin"]
}
loganfsmyth
  • wow. awesome! ..exactly what I was looking for. don't know what else to say. Been looking into Babel plugins, but I couldn't find the right place to start until now. – faceyspacey.com Dec 10 '15 at 03:47