0

This is the regex I use to find block comments, and it works fine:

/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/

I just need to modify it a little: find any semicolon (;) that may exist inside a block comment and replace it with a white space.

Currently I am doing this

while (m.find()) {
    if (m.group().contains(";")) {
        replacement = m.group().replaceAll(";", "");
        m.appendReplacement(sb, replacement);
    }
}
m.appendTail(sb);

But I need to replace this with a str.replaceAll kind of statement, or anything else that is more efficient, because I get an out-of-memory exception. I fixed a few other regexes that used to throw the same exception, and they are working fine now. I hope this regex can also be optimized.

--- Edit ---

These are the strings you can test this regex on:

/* this* is a ;*comment ; */

/* This ; is* 
another
;*block
comment;
;*/

Thanks

Ali
  • I might write you the Regex if you can post an example phrase and then highlight what you want from it. – Mikhail Nov 25 '11 at 10:07
  • @Misha Please view the edited question. I have provided a sample String. Thanks – Ali Nov 25 '11 at 11:07

2 Answers

3

It'll be much simpler to use the regex (?s)/\*.+?\*/. Your expression uses a negative lookahead, which "eats" your memory. And your code can be simpler:

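while (m.find()) {
    // quoteReplacement stops '$' or '\' inside a comment from being read as a group reference
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(";", "")));
}
m.appendTail(sb);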
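
For anyone who wants to run the loop as-is, here is a minimal self-contained sketch. It assumes the regex from the question is kept unchanged; the class name `CommentSemicolons` and the method name `stripSemicolons` are made up for illustration and do not appear in the original posts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentSemicolons {
    // The block-comment regex from the question, unchanged (hypothetical constant name).
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");

    /** Removes every ';' found inside a block comment; text outside comments is untouched. */
    static String stripSemicolons(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuffer sb = new StringBuffer(source.length());
        while (m.find()) {
            String cleaned = m.group().replace(";", "");
            // quoteReplacement keeps '$' and '\' in the comment from being
            // interpreted as group references by appendReplacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "int a = 1; /* this* is a ;*comment ; */ int b = 2;";
        System.out.println(stripSemicolons(input));
    }
}
```

To replace each semicolon with a space instead of deleting it, as the question describes, change `replace(";", "")` to `replace(";", " ")`.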
Alan Moore
temple.t
  • +1 for clearing away the clutter (especially that `m.group().contains(";")` call), but his regex is optimized for efficiency, so I wouldn't change that. It's not how *I* would have written it, but it should perform significantly better than `/\*.+?\*/`. – Alan Moore Nov 25 '11 at 12:04
  • I'm not sure about better efficiency. Lookahead could cause memory problems, especially with large input text. An atomic group can't solve this at all. – temple.t Nov 25 '11 at 12:29
  • It's only looking ahead for one character, and it only does that after it sees an asterisk. Performance-wise, lookaheads don't cause nearly as many problems as do alternations and quantifiers with overlapping effects. That's what the atomic groups are there for. – Alan Moore Nov 25 '11 at 13:03
  • @AlanMoore Can you explain this a little bit "alternations and quantifiers with overlapping effects" .. Examples might be helpful. Thanks – Ali Nov 30 '11 at 06:13
  • @Ali: [This question](http://stackoverflow.com/q/2407870/20938) demonstrates how an alternation can bog down a match when two or more alternatives are capable of matching the same characters. As for the quantifiers, see [this](http://www.regular-expressions.info/catastrophic.html). – Alan Moore Nov 30 '11 at 23:10
0

There are two variants (try both):

1). Why are you using ?>? I don't know what it means and I don't see a need to use something special here like ?>. Change it to ?:.

2). Your loop is infinite. You need this:

    int index = 0;
    while (m.find(index)) {
        if (m.group().contains(";")) {
            replacement = m.group().replaceAll(";", "");
            m.appendReplacement(sb, replacement);
        }
        index = m.end();
    }
itun
  • `(?>...)` is an [atomic group](http://www.regular-expressions.info/atomic.html), and it's necessary for maximum efficiency ([possessive quantifiers](http://www.regular-expressions.info/possessive.html) would work, too). And it's not an infinite loop; the `find()` method keeps track of the match-start position by itself. `find(int)` is for special cases. – Alan Moore Nov 25 '11 at 11:51
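
The comment above mentions possessive quantifiers as an alternative to atomic groups. As a rough sketch of what that might look like, the pattern `POSSESSIVE` below is my own rewrite of the question's regex, not something from the original posts, and the class and method names are hypothetical. It puts `++` and `*+` where the atomic groups were and checks that both patterns find the same comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PossessiveComment {
    // The regex from the question, using atomic groups.
    static final Pattern ATOMIC =
            Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");

    // Assumed equivalent using possessive quantifiers (++ and *+) instead of (?>...).
    static final Pattern POSSESSIVE =
            Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");

    /** Collects every substring of text that the given pattern matches, in order. */
    static List<String> findComments(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {        // find() resumes after the previous match by itself
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The last comment is deliberately left unterminated.
        String text = "a /* x*y */ b /* ;*/ c /* open";
        System.out.println(findComments(ATOMIC, text));
        System.out.println(findComments(POSSESSIVE, text));
    }
}
```

In both forms the engine cannot backtrack into the comment body once it has consumed it, so an unterminated comment fails quickly instead of being retried character by character.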