5

To clarify, I want to know how to use a regex to match:

ab
aabb
aaabbb 
...

I just found out that this works in Perl:

if ($exp =~ /^(a(?1)?b)$/)

To understand this, look at the string as though it grows from the outside-in, not left-right:

ab
a(ab)b
aa(ab)bb

(?1) is a reference to the outer set of parentheses. We need the ? afterwards for the last case (going from outside in), nothing is left and ? means 0 or 1 of the preceding expression (so it essentially acts as our base case).

My questions is: What is the equivalent of (?1) in Java?

Steve P.
  • 14,489
  • 8
  • 42
  • 72
  • Why not just check whether the length of the string is even and, if so, whether each half of the string consists of a single repeated character? –  Jun 11 '13 at 20:58

4 Answers4

5

In general, regular expressions are limited to regular languages, which means that since they are equivalent to languages acceptable by DFAs (discrete finite automata), they can't count, as generalized counting requires an infinite number of states to represent the infinite number of possible counted values.

Since discrete != infinte, you can't really count, but you can do some limited types of matching like in your (a (something) b) example.

Discusses a few limitations of DFAs (and Regular Languages / Regular Expressions by extension) http://www.cs.washington.edu/education/courses/cse599/99sp/admin/Slides/Week2/sld012.htm

A better, but more verbose slide that expands on the limitations by going into DFAs in (still a bit high level) detail http://www.cs.princeton.edu/courses/archive/spr05/cos126/lectures/18.pdf

By the way, the inside-out expansion is a cool DFA trick to basically side-step the actual need to count by using pattern matching that happens to rebuild the mirror image of the string. It looks like counting, but will fall apart as soon as you do more interesting things, like require matching in order (as opposed to mirrored order matching).

Edwin Buck
  • 69,361
  • 7
  • 100
  • 138
1

You can use the Pattern and Matcher classes to count the number of occurrences of one of your two characters, then enforce this number in your regular expression using the {n} syntax.

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String  s       = "aaaabbbb";
        Pattern pattern = Pattern.compile("a");
        Matcher matcher = pattern.matcher(s);

        // count all 'a's
        int count = 0;
        while (matcher.find())
            count++;

        // now enforce count matches for a and b in your regular expression
        String rExp = String.format("a{%d}b{%d}", count, count);
        Pattern matchSameCount = Pattern.compile(rExp);

        Matcher m2 = matchSameCount.matcher(s);
        System.out.println( m2.matches()); 
        // prints true
    }
}

Overall it is a little more work, but it's the only way I can think of that actually works at the moment.

Hunter McMillen
  • 59,865
  • 24
  • 119
  • 170
0

That language is not regular (see explanation below). So, you can't match that kind of language with regular expressions.

You are going to need an stack to hold repeated words.

I recommend you reading the following links:

NFA: http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton Explanation of why this language is not regular: Why is {a^nb^n | n >= 0} not regular?

Community
  • 1
  • 1
ejoncas
  • 329
  • 2
  • 13
  • 4
    While true that what OP is describing is not a regular language, the question is about whether Java has a feature that works similar to Perl allowing backreferences within the group they refer to. – Reinstate Monica -- notmaynard Jun 11 '13 at 21:05
-5

Not being a Java developer, but plenty of experience in Perl, why not just /^a+b+$/ ?

I'm wondering if I've misunderstood your Q!

;-)

Simon Catlin
  • 2,141
  • 1
  • 13
  • 15