
I saw an SO question that used Java's Matcher and Pattern in an attempt to highlight text, similar to how Regex101 highlights matches. The specification was to highlight, in a JTextArea, any literal string that is not preceded by the literal character '#'. I was going to suggest creating your own Matcher, and then the OP deleted the question :(

That was the background; now here is my question. How can I use a regular expression to grab a literal string unless it comes after (but not necessarily adjacent to) a specific string/character on the same line?

For example, if I wanted to select the string "tester" from the following:

tester, #tester

test tester # test tester

tester

I would hope my regex would select the occurrences marked with asterisks:

**tester**, #tester

test **tester** # test tester

**tester**

but not any "tester" that comes after a "#" on the same line.

Using Regex101, the closest I got was /(?=tester)(?<!#)tester/g, but this still selects the last "tester" on the second line since, as far as I can tell, I cannot do a "dynamic" (variable-length) lookbehind.

EDIT:

My question was not Java-specific; otherwise I would have added the Java tag. Unless Regex101 is wrong, I cannot use a limiting repetition because "Lookbehinds need to be zero-width, thus quantifiers are not allowed".

I tested Wiktor Stribiżew's regex in Java, and it works fine. Since it was a comment and not an answer, all I can do is +1 it. The Java String is (?<!#.{0,1000})\\btester\\b. I tested it against the following Java String: tester, #tester\ntest tester # test tester\ntester

Side question: is there no fully defined way to handle regex across all languages? Or is Regex101 just a poor testing tool (I was using its default PHP engine)?

I'll consider using RegexStorm or RegexHero in the future.

SGM1
  • https://regex101.com/r/aT3qN3/6 – Shafizadeh Apr 15 '16 at 19:57
  • Then it has to be only one character which is unique in that string. –  Apr 15 '16 at 20:01
  • Why can't you simply test up to your desired character? /(?=tester)#?/g – Alden Be Apr 15 '16 at 20:02
  • I think you can use [`\btester\b(?=.*#)`](https://regex101.com/r/wV7fX9/1). – Wiktor Stribiżew Apr 15 '16 at 20:04
  • @WiktorStribiżew: What if the string is `test tester # test tester # tester` ? ;-) –  Apr 15 '16 at 20:06
  • @noob: That is a question to OP. I'd also like to see an example with a *specific string*. – Wiktor Stribiżew Apr 15 '16 at 20:08
  • I believe SO should add a new option to the flag window: *"OP is unseen and doesn't answer the comments"* – Shafizadeh Apr 15 '16 at 20:22
  • @Shafizadeh Sorry, I forgot a caveat: also lines without the string "#" – SGM1 Apr 15 '16 at 20:26
  • @Shafizadeh You know that would be terrible on SO. Not everyone has time to stay on SO until a reply comes through; that's the whole point of a forum, to post and come back later.... – SGM1 Apr 15 '16 at 20:28
  • @WiktorStribiżew Sorry, I updated the post due to a small caveat: '#' may not be present at all – SGM1 Apr 15 '16 at 20:29
  • You can use [`String rx = "(?<!#.{0,1000})\\btester\\b";`](http://ideone.com/kgDsB4) – Wiktor Stribiżew Apr 15 '16 at 20:44
  • @WiktorStribiżew I confirmed that does work in the Java world, for some reason [https://regex101.com/](https://regex101.com/) does not like the {0,1000}, any idea why? – SGM1 Apr 21 '16 at 14:25
  • Because [regex101](http://regex101.com) does not support regex flavors (like Java or ICU) that feature a constrained-width lookbehind. Use .NET-based online testers, like [RegexStorm](http://regexstorm.net/tester) or [RegexHero](http://regexhero.net/tester). Or just use ones that support Java: [RegexPlanet](http://www.regexplanet.com/advanced/java/index.html) or [ocpsoft](http://ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/). – Wiktor Stribiżew Apr 21 '16 at 14:30
  • Hey, want me to post an answer - I will. – Wiktor Stribiżew Apr 21 '16 at 14:52
  • @WiktorStribiżew You'll get a +1 from me, might accept in a few days if no one can supply a more universally accepted regex, meaning one even Regex101 will accept – SGM1 Apr 21 '16 at 14:55
  • @WiktorStribiżew I purposely left out Java as a tag – SGM1 Apr 21 '16 at 14:56
  • If you leave off the Java tag, it is just unclear. Each regex engine has its own perks, and what you can do with PCRE (recurse subpatterns), you cannot repeat with .NET/Python `re`. If you want a Java tailored solution, you have to add Java tag. A generic solution is to match what you do not need, and match and capture what you need. – Wiktor Stribiżew Apr 21 '16 at 15:00

3 Answers


In Java, you can leverage a constrained-width lookbehind, which is handy if the number of characters before the expected substring is bounded. That means you can use a limiting quantifier inside the lookbehind. (There is a bug that allows using * in Java 8, but it is not a good idea to exploit it, since the bug may be fixed in later versions.) Just note that performance may drop with bigger values inside the limiting quantifier.

So, you can use

String rx = "(?<!#.{0,1000})\\btester\\b";

See the IDEONE demo

The pattern matches any whole word tester (as \b is a word boundary) that is not preceded by a # followed by 0 to 1000 characters other than a newline (with DOTALL, the . will match newlines, too).
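As a rough illustration of the bounded lookbehind described above, here is a minimal, self-contained sketch. The class name LookbehindDemo and the helper findOffsets are made up for this example (they are not from the linked demo); the pattern and the sample text are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class LookbehindDemo {
    // Negative lookbehind with a bounded quantifier: reject "tester" when a '#'
    // sits 0 to 1000 non-newline characters before it (i.e. earlier on the same line).
    private static final Pattern TESTER =
            Pattern.compile("(?<!#.{0,1000})\\btester\\b");

    // Returns the start offset of every "tester" that has no '#' before it on its line.
    static List<Integer> findOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = TESTER.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "tester, #tester\ntest tester # test tester\ntester";
        System.out.println(findOffsets(text)); // prints [0, 21, 42]
    }
}
```

Because . does not match a newline by default, a # on an earlier line does not block matches on later lines. The upper bound of 1000 is an arbitrary cap taken from the answer; lines longer than that would need a larger value.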

NOTE ON THE ONLINE TESTERS: regex101 rejects this pattern because it does not support regex flavors (like Java or ICU) that feature a constrained-width lookbehind. Use .NET-based online testers, like RegexStorm or RegexHero. Or just use the best Java regex online testers: RegexPlanet or ocpsoft.


Now, speaking about a generic solution: match what you do not need, and match and capture what you need to keep.

This is the pattern:

#.*\btester\b|\b(tester)\b

In the regex101 demo, the green-highlighted testers are those that reside in capture group #1, and those that are only in Group 0 (the whole match) are in blue. You can check which group these subvalues belong to, and take appropriate action in your code.

In Java, to check if a group matched, just use

if (match.group(1) != null) { 
    /* Group 1 matched, the tester we need is here */
}
else {  
    /* No action, this tester is preceded with # */ 
}
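To make the "match what you do not need, capture what you need" idea concrete, here is a small sketch that wraps the group check above in a full loop. The class name CaptureGroupDemo and the helper wantedOffsets are hypothetical names chosen for this example; the pattern is the one given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class CaptureGroupDemo {
    // Left branch: consume a '#' and everything up to the last "tester" on that line (no capture).
    // Right branch: capture a "tester" that the left branch did not consume.
    private static final Pattern TESTER =
            Pattern.compile("#.*\\btester\\b|\\b(tester)\\b");

    // Returns the start offsets of the matches where group 1 participated.
    static List<Integer> wantedOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = TESTER.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // Group 1 matched: this tester is not preceded by '#'
                offsets.add(m.start(1));
            }
            // Otherwise the match came from the '#' branch and is skipped
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "tester, #tester\ntest tester # test tester\ntester";
        System.out.println(wantedOffsets(text)); // prints [0, 21, 42]
    }
}
```

This variant uses no lookbehind at all, so it should carry over to engines without constrained-width lookbehind, as long as the calling code can inspect capture groups.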
Wiktor Stribiżew

You can use an optional group before tester that starts with #, then check for the presence of the first group and replace accordingly.

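import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "tester, #tester\ntester foo\ntest tester # test tester\ntester";
// Group 1 (optional) consumes a '#' and the text between it and "tester"
Pattern p = Pattern.compile( "(#[^#\n]*)?(\\btester\\b)" );
Matcher m = p.matcher( text );

StringBuffer sb = new StringBuffer();
while(m.find()) {
    if (m.group(1) == null)
        // No '#' before this tester on the line: wrap it
        m.appendReplacement(sb, "<em>" + m.group(2) + "</em>");
    else
        // Preceded by '#': re-emit the match unchanged (quoted so '$' and '\' stay literal)
        m.appendReplacement(sb, Matcher.quoteReplacement(m.group()));
}
m.appendTail(sb);
System.err.println(sb);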

Output:

<em>tester</em>, #tester
<em>tester</em> foo
test <em>tester</em> # test tester
<em>tester</em>
anubhava

While I originally thought that this was more about highlighting matches in Java, this code I found here may solve all your problems. Changed slightly to match your example:

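  import java.awt.Color;
  import javax.swing.JOptionPane;
  import javax.swing.JScrollPane;
  import javax.swing.JTextArea;
  import javax.swing.text.DefaultHighlighter;
  import javax.swing.text.Highlighter;
  import javax.swing.text.Highlighter.HighlightPainter;

  JTextArea textArea = new JTextArea(10, 30);

  String text = "test tester # test tester";

  textArea.setText(text);

  Highlighter highlighter = textArea.getHighlighter();
  HighlightPainter painter = 
         new DefaultHighlighter.DefaultHighlightPainter(Color.pink);
  // Offsets of the first occurrence of "tester"
  int p0 = text.indexOf("tester");
  int p1 = p0 + "tester".length();
  // addHighlight throws the checked BadLocationException, so the caller must handle it
  highlighter.addHighlight(p0, p1, painter );

  JOptionPane.showMessageDialog(null, new JScrollPane(textArea));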

If you only apply the highlighting when p0 == 0 or text.charAt(p0 - 1) != '#', you wouldn't need a regex. (Or when p0 < text.indexOf("#"); I'm not sure what you want exactly.)
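A minimal sketch of the second, regex-free variant (only occurrences before the first "#" on each line), assuming that is the intended rule. The class IndexOfDemo and the method offsetsBeforeHash are hypothetical names for this example. Note that indexOf returns -1 when there is no "#", so that case is handled explicitly, and that indexOf matches substrings rather than whole words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the code quoted above.
class IndexOfDemo {
    // Returns the start offset of each occurrence of word that lies
    // before the first '#' on its line (or anywhere on a line without '#').
    static List<Integer> offsetsBeforeHash(String text, String word) {
        List<Integer> offsets = new ArrayList<>();
        if (word.isEmpty()) {
            return offsets; // nothing sensible to search for
        }
        int lineStart = 0;
        while (lineStart <= text.length()) {
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length(); // last line has no trailing newline
            }
            int hash = text.indexOf('#', lineStart);
            // Search limit: the '#' if it is on this line, otherwise the end of the line
            int limit = (hash < 0 || hash > lineEnd) ? lineEnd : hash;
            int p0 = text.indexOf(word, lineStart);
            while (p0 >= 0 && p0 + word.length() <= limit) {
                offsets.add(p0);
                p0 = text.indexOf(word, p0 + word.length());
            }
            lineStart = lineEnd + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "tester, #tester\ntest tester # test tester\ntester";
        System.out.println(offsetsBeforeHash(text, "tester")); // prints [0, 21, 42]
    }
}
```

Each returned offset p0 could then be passed to highlighter.addHighlight(p0, p0 + word.length(), painter) as in the snippet above.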

Laurel