
I want to find a string within a string. Here's my scenario:

String toMatch = "ABC";
String matchIn = "ABC*FED";

Other variations of matchIn:

matchIn = "ABC";
matchIn = "ASD*ABC";
matchIn = "JULY*ABC*RTEW";

I have come up with this regex, but it obviously doesn't work:

matchIn.matches(".*(\\*)?" + toMatch + "(\\*)?.*");

The problem is that I don't know how to look for the "*" only when it's followed by another word. As it stands, the regex matches far too much: e.g., toMatch="ABCDEF" returns true when it shouldn't!

TylerH
nullpointer
  • Why isn't ABC sufficient? I don't understand what exactly you want to find with the *. Are you looking for ABC or more? – user43968 Aug 13 '18 at 19:14
  • If you are just searching for a String inside a String, then [String.indexOf(String)](https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#indexOf(java.lang.String)) is enough. – Arnaud Denoyelle Aug 13 '18 at 19:17
  • Because ABC is different than ABCDEF, '*' is a delimiter. – nullpointer Aug 13 '18 at 19:22
  • The answers at https://stackoverflow.com/questions/2631010/a-regex-to-match-a-substring-that-isnt-followed-by-a-certain-other-substring but without the `?!` will probably be useful. – TylerH Aug 13 '18 at 19:24
  • Six downvotes for a valid SO question with MCVE and clear attempt at solving the issue is a shame. [*The point Jamie was trying to make: not that regular expressions are evil, per se, but that *overuse* of regular expressions is evil*](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems). – Wiktor Stribiżew Aug 14 '18 at 07:07
  • @Wiktor Yes, this is ridiculous and a shame! – nullpointer Aug 15 '18 at 15:42

1 Answer

You may use a regex like

(?<=^|\*)ABC(?=$|\*)

Or

(?<![^*])ABC(?![^*])

See the regex demo.

Details

  • (?<=^|\*) - a positive lookbehind that requires either the start of the string (^) or a * symbol immediately to the left of the current location. Note that (?<![^*]) is an equivalent negative lookbehind: it matches any location that is not immediately preceded by a char other than *, so it means just the same as (?<=^|\*)
  • ABC - a literal string pattern (ABC)
  • (?=$|\*) - a positive lookahead that, immediately to the right of the current location (that is, right after ABC), requires the end of string ($) or a * char (it is equivalent to the negative lookahead (?![^*])).

Note that the variation with the negative lookarounds should be more efficient, since there is no alternation inside these lookarounds (an alternation costs more than a single negated character class).
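To make the equivalence described above concrete, here is a minimal sketch that runs both variants against the sample strings from the question plus two negative cases. The class name `LookaroundVariants` and the constant names are made up for illustration; only `java.util.regex.Pattern` from the standard library is used.

```java
import java.util.regex.Pattern;

class LookaroundVariants {
    // Positive lookarounds with an alternation inside each one.
    static final Pattern POSITIVE = Pattern.compile("(?<=^|\\*)ABC(?=$|\\*)");
    // Negative lookarounds with a negated character class instead.
    static final Pattern NEGATIVE = Pattern.compile("(?<![^*])ABC(?![^*])");

    public static void main(String[] args) {
        String[] samples = {"ABC", "ABC*FED", "ASD*ABC", "JULY*ABC*RTEW", "ASDABC", "ABCDEF"};
        for (String s : samples) {
            boolean positive = POSITIVE.matcher(s).find();
            boolean negative = NEGATIVE.matcher(s).find();
            // Prints the result of each variant side by side.
            System.out.println("\"" + s + "\" => " + positive + " / " + negative);
        }
    }
}
```

On these inputs the two columns agree: the first four strings give true and the last two ("ASDABC", "ABCDEF") give false. That only shows agreement on these samples, not a proof for every possible input.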

Use the pattern with Matcher#find() to check for partial matches (padding the regex with .* so that matches() succeeds is needlessly inefficient):

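import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "ASDABC" is the negative case: ABC is not delimited by * there
List<String> strs = Arrays.asList("ABC", "ASD*ABC", "JULY*ABC*RTEW", "ASDABC");
Pattern p = Pattern.compile("(?<=^|\\*)ABC(?=$|\\*)");
for (String str : strs) {
    Matcher m = p.matcher(str);
    System.out.println("\"" + str + "\" => " + m.find());
}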

Output:

"ABC" => true
"ASD*ABC" => true
"JULY*ABC*RTEW" => true
"ASDABC" => false

See the Java demo.
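Since the question builds the pattern from a `toMatch` variable rather than a hard-coded ABC, the following sketch shows one way to do that safely. The helper name `containsToken` is hypothetical; `Pattern.quote` is the standard way to make regex metacharacters in the search term literal. It assumes `toMatch` is non-empty and does not itself contain a `*`.

```java
import java.util.regex.Pattern;

class DelimitedTokenSearch {
    // Hypothetical helper: true if toMatch occurs in matchIn as a whole '*'-delimited token.
    static boolean containsToken(String matchIn, String toMatch) {
        // Pattern.quote treats every character of toMatch literally (e.g. '.' or '+').
        Pattern p = Pattern.compile("(?<![^*])" + Pattern.quote(toMatch) + "(?![^*])");
        // find() looks for the pattern anywhere in the input.
        return p.matcher(matchIn).find();
    }

    public static void main(String[] args) {
        System.out.println(containsToken("JULY*ABC*RTEW", "ABC")); // true
        System.out.println(containsToken("ABC*FED", "ABCDEF"));    // false
        System.out.println(containsToken("A.C*FED", "A.C"));       // true
        System.out.println(containsToken("ABC*FED", "A.C"));       // false: '.' is literal here
        // matches() must consume the whole input, so the same pattern fails on a partial match.
        System.out.println("ASD*ABC".matches("(?<![^*])ABC(?![^*])")); // false
    }
}
```

If the same `toMatch` is checked against many strings, compiling the pattern once and reusing it would avoid recompiling on every call.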

Wiktor Stribiżew
  • Thank you, do you know why it doesn't work with String.matches? – nullpointer Aug 13 '18 at 19:22
  • @nullpointer Because `String#matches()` requires a full string match. Do not use it if you seek partial matches. – Wiktor Stribiżew Aug 13 '18 at 19:23
  • The ^ means start of string and $ end of string, is that correct? I don't understand what ?, < and = accomplish, can you please explain? – nullpointer Aug 13 '18 at 19:27
  • @nullpointer sorry, I was asked by another user some questions, I have added the details now. – Wiktor Stribiżew Aug 13 '18 at 19:29
  • @nullpointer Note: if I understand correctly what you want, `str.startsWith("ABC") || str.indexOf("*ABC") > 0` also does the job and is more readable. It depends on whether you ask the question in order to learn about regular expressions or intend to use the code in production. – Arnaud Denoyelle Aug 13 '18 at 19:31
  • @ArnaudDenoyelle That is true that there might be other ways to solve the task. However, when it comes to regex, users usually "simplify" the sample texts, and probably, instead of `ABC`, there may be a much more complex pattern. – Wiktor Stribiżew Aug 13 '18 at 19:35
  • @ArnaudDenoyelle then I would also have to add an "ABC*" condition, so instead of having three conditions wouldn't a regex be better? – nullpointer Aug 13 '18 at 19:37
  • @nullpointer depends on your context. If you work with senior colleagues only or if performance *really* matters, opt for a regex. If your colleagues have an average standard, opt for multiple conditions, as it will be easier to maintain. 6 months later, it will be hard to remember how it works. Whatever you choose, you should cover this piece of code with unit tests (always do it with pieces of code which involve regex). – Arnaud Denoyelle Aug 13 '18 at 19:43
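For the non-regex route discussed in the comments above, one possible sketch avoids the three separate conditions by splitting on the delimiter and comparing whole tokens. This is not what any commenter posted; the class and method names are hypothetical, and it assumes `toMatch` contains no `*`.

```java
import java.util.Arrays;

class SplitTokenSearch {
    // Hypothetical helper: true if toMatch equals one of the '*'-separated tokens of matchIn.
    static boolean containsToken(String matchIn, String toMatch) {
        // String.split takes a regex, so the '*' delimiter has to be escaped.
        // The limit of -1 keeps trailing empty tokens instead of discarding them.
        String[] tokens = matchIn.split("\\*", -1);
        return Arrays.asList(tokens).contains(toMatch);
    }

    public static void main(String[] args) {
        System.out.println(containsToken("ABC", "ABC"));           // true
        System.out.println(containsToken("ABC*FED", "ABC"));       // true
        System.out.println(containsToken("ASD*ABC", "ABC"));       // true
        System.out.println(containsToken("JULY*ABC*RTEW", "ABC")); // true
        System.out.println(containsToken("ASDABC", "ABC"));        // false
        System.out.println(containsToken("ABC*FED", "ABCDEF"));    // false
    }
}
```

This only fits when the thing being searched for is a plain string; if it is really a more complex pattern, as suggested in the comments, the lookaround regex remains the more natural choice.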