I'm trying to write a regex that finds all variables (and only variables, ignoring functions completely) in a given piece of JavaScript code. The code that runs the regex is written in Java.
For now, I've got something like this:
Matcher matcher = Pattern.compile(".*?([a-z]+\\w*?).*?").matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
So, when the value of "string" is variable*func()*20, the printout is:
variable
func
Which is not what I want. Simply negating ( won't do: it makes the regex either capture unnecessary characters or cut some off, and functions are still captured. For now, I have the following code:
Matcher matcher = Pattern.compile(".*?(([a-z]+\\w*)(\\(?)).*?").matcher(formula);
while (matcher.find()) {
    // group(3) is "(" when the name is followed by a parenthesis, i.e. a function call
    if (matcher.group(3).isEmpty()) {
        System.out.println(matcher.group(2));
    }
}
It works, the printout is correct, but I don't like the additional check. Any ideas? Please?
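One direction that might remove the extra check is a negative lookahead, which rejects a name when an opening parenthesis follows it. The sketch below is only an illustration under the same assumptions as above (names start with a lowercase letter, input is a simple formula with no comments or string literals); the class name VariableFinder and the method findVariables are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableFinder {
    // \b ... \b   : match whole names only, so backtracking cannot yield a partial name
    // (?!\s*\()   : negative lookahead, rejects names followed by "(" (optionally after whitespace)
    private static final Pattern VARIABLE =
            Pattern.compile("\\b[a-z]\\w*\\b(?!\\s*\\()");

    // Hypothetical helper: returns every name in the formula that is not called as a function.
    static List<String> findVariables(String formula) {
        List<String> names = new ArrayList<>();
        Matcher matcher = VARIABLE.matcher(formula);
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findVariables("variable*func()*20")); // prints [variable]
    }
}
```

The word boundaries matter here: without them, the engine could backtrack and report "fun" for func(), because "fun" is followed by "c" rather than by a parenthesis.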
EDIT (2011-04-12):
Thank you for all the answers. Some of you asked why I would need something like this. And you are right: for bigger, more complicated scripts, the only sane solution would be to parse them. In my case, however, that would be excessive. The scraps of JS I'm working with are intended to be simple formulas, something like (a+b)/2. No comments, string literals, arrays, etc. Only variables and (probably) some built-in functions. I need the list of variables to check whether they can be initialized at this point (or initialized at all). I realize that all of this could be done manually with RPN as well (which would be safer), but these formulas are going to be wrapped in a bigger script and evaluated in a web browser, so it's more convenient this way.
This may be a bit dirty, but it's assumed that whoever writes these formulas (probably me, most of the time) knows what they are doing and is able to check that the formulas work correctly.
Anyone who finds this question and wants to do something similar should know the risks and difficulties. I do, or at least I hope so ;)