It can be done, but it is quite complicated. Your regexp just scans for 'anything, then any number of stars, then anything', which matches everything. Your code returns true for literally every string imaginable.
Your approach is problematic. Trying to positively match is quite complicated. The question is asking the negative: "This construct is invalid; any string that doesn't have an invalid construct is valid" is what it boils down to, with the invalid construct being "A*B", where A and B are not identical. After all, *a is valid (example 3). Presumably *** is also valid (the first and last star aren't a problem because they don't have characters on both sides, and the middle one is fine because the character on each side is identical).
Thus, what you want is to write a regexp that finds the invalid construct, and return the inverse.
To find the invalid construct you need something called the backreference: You want to search for a thing and then refer to it.
".\*." - we start here: a character, a star, and a character. But now we need the second character (the second .) to actually be something OTHER than the first character. After all, ".\*." would also match "A*A", and that is a valid construct, so we don't want it to match.
Enter backrefs. () in regexpese makes a 'group' - a thing you can refer to later. \1 is a backref - it's "whatever matched for the first set of parentheses".
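To see a backref on its own before adding the negation, here is a minimal sketch. The class and method names (BackrefDemo, hasSameCharAroundStar) are made up for illustration; the pattern only looks for the *valid* shape "X*X", which is the opposite of what we ultimately need.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // (.) captures one character as group 1; \1 must then match that same character again.
    // Hypothetical helper pattern: finds "X*X" where both X are identical.
    static final Pattern SAME_AROUND_STAR = Pattern.compile("(.)\\*\\1");

    static boolean hasSameCharAroundStar(String s) {
        return SAME_AROUND_STAR.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSameCharAroundStar("a*a")); // true
        System.out.println(hasSameCharAroundStar("a*b")); // false
    }
}
```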
But we need more - we need to negate: match if NOT this. Within chargroups there's ^ - [^fo] means: "Any character that is NOT an 'f' or an 'o'". But a backref isn't a chargroup.
As per this SO question backing me up on this, the only way is negative lookahead. Lookahead is a thing where you don't actually consume characters; you merely check whether they WOULD match. With negative lookahead, the match fails if they would. It's.. complicated. Search the web for tutorials that explain 'positive lookahead' and 'negative lookahead'.
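If lookahead is new to you, a small example unrelated to the star problem may help. This is only a sketch; the class and method names (LookaheadDemo, hasQWithoutU, firstMatch) are invented for illustration. The pattern finds a 'q' that is NOT followed by a 'u', and the lookahead itself does not become part of the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // q(?!u): a 'q', then a check that the next character is not 'u'.
    // The check consumes nothing, so the matched text is just "q".
    static final Pattern Q_WITHOUT_U = Pattern.compile("q(?!u)");

    static boolean hasQWithoutU(String s) {
        return Q_WITHOUT_U.matcher(s).find();
    }

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = Q_WITHOUT_U.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(hasQWithoutU("quit"));  // false: the only q is followed by u
        System.out.println(hasQWithoutU("Iraq"));  // true: q is followed by end of string
        System.out.println(firstMatch("qatar"));   // q
    }
}
```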
Thus:
Pattern p = Pattern.compile("(.)\\*(?!\\1|$)");
return !p.matcher(str).find();
All sorts of things going on here:
- (?!X) is negative lookahead.
- \1|$ means: either 'group 1' or 'end of string'. Given that our input contains X*, the next thing after that star must either be an X or the end of the string - if it is anything else, we should return false.
- We don't want to match the entire string. We just want to ask: is the 'invalid construct' anywhere in this string? Hence find(), not matches().
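Put together as something you can run, it might look like the sketch below. The class and method names (StarCheckRegex, isValid) are placeholders, since the question's actual method signature isn't shown here; the sample inputs are the ones discussed above plus a couple of extras.

```java
import java.util.regex.Pattern;

public class StarCheckRegex {
    // Finds the invalid construct: a character, a star, and then something
    // that is neither that same character nor the end of the string.
    static final Pattern INVALID = Pattern.compile("(.)\\*(?!\\1|$)");

    // Hypothetical method name; valid means "no invalid construct anywhere".
    static boolean isValid(String str) {
        return !INVALID.matcher(str).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("a*a")); // true
        System.out.println(isValid("a*b")); // false
        System.out.println(isValid("*a"));  // true
        System.out.println(isValid("***")); // true
        System.out.println(isValid("a*"));  // true

        // find() looks for the construct anywhere; matches() would demand that
        // the whole string be the construct, which is not what we are asking.
        System.out.println(INVALID.matcher("xa*by").find());    // true
        System.out.println(INVALID.matcher("xa*by").matches()); // false
    }
}
```

One caveat: by default . does not match line terminators, so a string with a newline right before a star would slip past this pattern.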
To be clear, using regexp for this is probably a bad idea. Sure, the code will be extremely short, but it's not exactly readable, is it?
Without regexps, it becomes much easier to follow:
for (int i = 1; i < str.length() - 1; i++) {
    if (str.charAt(i) != '*') continue;
    if (str.charAt(i - 1) != str.charAt(i + 1)) return false;
}
return true;
I'd strongly prefer the above over a regexp that doesn't readily show what it is actually accomplishing, and this regexp certainly doesn't make it remotely feasible to understand what it does just by looking at it.
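If you want some reassurance that the two versions really agree, a brute-force comparison is cheap. The sketch below wraps both in methods (the names StarCheckCompare, isValidRegex, isValidLoop and countMismatches are invented for illustration) and tries every string of up to 5 characters over the small alphabet a, b and *. That alphabet is an assumption chosen to keep the search tiny; it says nothing about inputs containing line terminators, where . behaves differently.

```java
import java.util.regex.Pattern;

public class StarCheckCompare {
    private static final Pattern INVALID = Pattern.compile("(.)\\*(?!\\1|$)");

    // The regexp version from above.
    static boolean isValidRegex(String str) {
        return !INVALID.matcher(str).find();
    }

    // The plain loop version from above.
    static boolean isValidLoop(String str) {
        for (int i = 1; i < str.length() - 1; i++) {
            if (str.charAt(i) != '*') continue;
            if (str.charAt(i - 1) != str.charAt(i + 1)) return false;
        }
        return true;
    }

    // Counts the strings that start with prefix, use only 'a', 'b' and '*',
    // and are at most maxLen long, on which the two versions disagree.
    static int countMismatches(String prefix, int maxLen) {
        int mismatches = isValidRegex(prefix) == isValidLoop(prefix) ? 0 : 1;
        if (prefix.length() < maxLen) {
            for (char c : "ab*".toCharArray()) {
                mismatches += countMismatches(prefix + c, maxLen);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches("", 5)); // 0
    }
}
```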