1

I'm trying to make a regex that would produce the following results :

  • for 7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3 : 7.0, 5, :asc, 8.256, :b, 2, :d, 3
  • for -+*-/^^ )ç@ : nothing

It should first match numbers, which can be floats, so in my regex I have: [0-9]+(\\.[0-9])? but it should also match special cases like :a or :Abc.

To be more precise, it should (if possible) match anything but mathematical operators /*+^- and parentheses.

So here is my final regex : ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+) but it's not working because matcher.groupCount() returns 3 for both of the examples I gave.

Mickäel A.
  • 1
    Are you trying to make a mathematical expression parser? If yes, *do not use regexes for this* – caiosm1005 Jan 04 '13 at 21:46
  • 2
    You have 3 groups in your pattern, so `matcher.groupCount()` will always return 3 (No matter what the input is). Read the [docs of the methods you are using](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#groupCount()). – jlordo Jan 04 '13 at 21:48
  • @caiosm1005 Yes that's right. Can you explain why please ? – Mickäel A. Jan 04 '13 at 22:32
  • @miNde [Please see this question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) - it's essentially for the same reason. Mathematical expressions aren't regular. Parsing a string requires a few steps, including tokenization and token analysis. If you'd like to go deep into it, see *[the overview of a parsing process](http://en.wikipedia.org/wiki/Parser#Overview_of_process)*. – caiosm1005 Jan 04 '13 at 23:25
  • @miNde I also once tried to do what you're doing with regexes, so, speaking from experience, relying solely on regexes and a few exception rules becomes a mess so big that you won't be able to fully solve it in the end. – caiosm1005 Jan 04 '13 at 23:26
  • @caiosm1005 Ok I will look at these links, thx. – Mickäel A. Jan 05 '13 at 12:45

3 Answers

3

Groups are what you specifically group in the regex. Anything surrounded in parentheses is a group. (Hello) World has 1 group, Hello. What you need to be doing is finding all the matches.

In your code ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+), 3 sets of parentheses can be seen. This is why you will always be given 3 groups in every match.

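Your code works fine as it is; here is an example:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String text = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";

    Pattern p = Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");
    Matcher m = p.matcher(text);

    // find() advances to the next match; group() returns the whole matched text
    List<String> matches = new ArrayList<String>();
    while (m.find()) matches.add(m.group());

    for (String match : matches) System.out.println(match);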

The ArrayList matches will contain all of the matches that your regex finds. The only change I made was to add a + after the second [0-9], so that a fraction with several digits, such as 8.256, is matched in full. Here is the output:

7.0
5
:asc
8.256
:b
2
:d
3
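To show what the extra + changes, here is a small sketch that runs both versions of the pattern on the question's input. The class name `FractionDigits` and the helper `findAll` are made up for this illustration; only `Pattern`, `Matcher.find()` and `Matcher.group()` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FractionDigits {
    // Hypothetical helper: collects every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<String>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";

        // Original pattern: at most one digit after the decimal point.
        System.out.println(findAll("([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+)", text));

        // Pattern with the added +: one or more digits after the decimal point.
        System.out.println(findAll("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)", text));
    }
}
```

With the original pattern, 8.256 comes out as two matches, 8.2 and 56; with the + it comes out as the single match 8.256.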

Here is some more information about groups in java.

Does that help?

jackcogdill
  • I am not the downvoter. But why did you change his regex? It works just fine. – jlordo Jan 04 '13 at 22:09
  • Omg that was just that... thank you. So I have no choice but to loop with find() to get the number of groups in the expression that matched the regex ? I wonder why there isn't something like a size() method. – Mickäel A. Jan 04 '13 at 22:36
  • No problem! ^_^ Yeah, in Python you can create a list simply with `re.findall()`. I'm not sure why you have to loop in Java, but it's not that big of a deal, I guess. – jackcogdill Jan 04 '13 at 22:48
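On the question of a `size()`-like shortcut: a minimal sketch, assuming Java 9 or later (newer than the Java 7 docs linked in this thread), where `Matcher.results()` returns a stream of matches. The class name `FindAll` and the method `findAll` are invented here to mirror Python's `re.findall()`; they are not part of the JDK.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FindAll {
    // Hypothetical helper: returns every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        return Pattern.compile(regex)
                .matcher(text)
                .results()                  // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)    // whole matched text of each result
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> matches = findAll("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)",
                "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3");
        // size() of the list is the number of matches.
        System.out.println(matches.size() + " matches: " + matches);
    }
}
```

On older Java versions the `while (m.find())` loop remains the way to do it.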
0
([^\()+\-*\s])+ //put any mathematical operator inside square bracket
slier
  • Have you tried it? Running this regex against OP's input does not produce his desired output. – jlordo Jan 04 '13 at 22:04
  • 1
    it still matches `^`. That's not in his desired output. – jlordo Jan 04 '13 at 22:08
  • @jlordo The OP said to match `anything` except mathematical operators. But hey, I added `\s` to exclude whitespace. – slier Jan 04 '13 at 22:10
  • 1
    He didn't express that clearly enough, but his example shows it. Your regex still matches `^`, which the OP interprets as the mathematical power operator. – jlordo Jan 04 '13 at 22:12
  • funny, it should not match ^. ^ means "not" inside []. And no, I did not test it. –  Jan 09 '13 at 18:31
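To settle the `^` question from the comments: in `java.util.regex`, `^` negates a character class only when it is the first character inside the brackets; anywhere else in the class it is a literal caret. The sketch below runs the class from this answer and a variant that also lists `/` and `^` as excluded characters. The class name `CaretInClass`, the helper `findAll` and the second pattern are illustrations, not something posted in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretInClass {
    // Hypothetical helper: collects every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<String>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "(8.256 - :b)^2 + :d/3";

        // Class from the answer: excludes ( ) + - * and whitespace only,
        // so '^' and '/' are still allowed to match.
        System.out.println(findAll("([^\\()+\\-*\\s])+", input));

        // Variant (an assumption of this sketch): '/' and a non-leading,
        // literal '^' are added to the excluded characters.
        System.out.println(findAll("[^()+\\-*/^\\s]+", input));
    }
}
```

The first pattern yields 8.256, :b, ^2 and :d/3; the second yields 8.256, :b, 2, :d and 3.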
0

Your regex is correct, run the following code:

    String input = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3"; // your input
    String regex = "(\\d+(\\.\\d+)?)|(:[a-zA-Z]+)"; // yours, with \\d for [0-9] and one or more fraction digits
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }

Your problem is a misunderstanding of the method matcher.groupCount(). The JavaDoc clearly says:

Returns the number of capturing groups in this matcher's pattern.
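A small sketch of the difference, using both inputs from the question. The class name `GroupCountDemo` and its two helper methods are invented for this illustration; `groupCount()` and `find()` are the real `Matcher` methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    static final Pattern TOKEN = Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");

    // Number of capturing groups in the pattern; the input plays no role.
    static int groupCount(String input) {
        return TOKEN.matcher(input).groupCount();
    }

    // Number of matches actually found in the input.
    static int matchCount(String input) {
        Matcher m = TOKEN.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String first = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";
        String second = "-+*-/^^ )\u00e7@"; // \u00e7 is the 'ç' from the question

        System.out.println(groupCount(first) + " groups, " + matchCount(first) + " matches");
        System.out.println(groupCount(second) + " groups, " + matchCount(second) + " matches");
    }
}
```

Both inputs report 3 groups, because the pattern has three pairs of capturing parentheses; the match counts are 8 and 0.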

jlordo