1

This task is formulated on the Codingbat. (https://codingbat.com/prob/p194491):

Returns true if for every '*' (star) in the string, if there are chars both immediately before and after the star, they are the same.

sameStarChar("xy*yzz") → true
sameStarChar("xy*zzz") → false
sameStarChar("*xa*az") → true

How can I fix my solution in order to solve this task using regular expressions?

My attempt:

public boolean sameStarChar(String str) {
  return str.matches(".*([*])*.*");
}

My solution is correct for more than half the tests, but not for all.

Evgeniy
  • 140
  • 1
  • 8
  • the question is unclear. What about the String `"a***b"`? Should the method return `true` or `false`? – Turing85 May 08 '22 at 13:15
  • 3
    I don't understand why `*xa*az` is true? There is an "x" after the first star, but no "x" before it, so shouldn't it return false? – Sweeper May 08 '22 at 13:16
  • 3
    *Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.* -- Jamie Zawinski – Elliott Frisch May 08 '22 at 13:19
  • 1
    @Sweeper - the wording says "if there are chars both immediately before and after the star". There is no mention of "if there are **not** chars before and after" but I suppose from the 3rd example we can guess that is equivalent to a match. – dangling else May 08 '22 at 13:20
  • Why was this closed? It's a silly thing to use regexp for this but that's not a reason to close. The details are fine - the question is clear, e.g. `*xa` is true because: "_IF_ there are chars both immediately before and after...". There aren't, so, that * is irrelevant and is no reason for the string to be invalid. Similarly, `***`? That's valid: There is only 1 star for which the 'both characters immediately before and after' clause holds, and the chars immediately before and after the middle star.. are identical, hence, valid. – rzwitserloot May 08 '22 at 13:27
  • @Turing85 `a***b` would of course be false. For every star in the string, we check: 1. Does it have characters before and after. (all 3 stars pass this test), and 2. If so, if those 2 characters are NOT identical, the string fails. Both the first and third star indicate `a***b` is therefore false. – rzwitserloot May 08 '22 at 13:28
  • Interesting challenge. Only I don’t understand which fun you would be having from us solving it for you. To me it would be like depriving you of both the fun and the learning. – Ole V.V. May 08 '22 at 14:29

2 Answers2

3

It can be done but it is quite complicated. Your regexp just scans for 'anything, then any number of stars, then anything - which matches everything. Your code returns true for literally every string imaginable.

Your approach is problematic. Trying to positively match is quite complicated. The question is asking the negative: "This construct is invalid; any string that doesn't have an invalid construct is valid" is what it boils down to, with the invalid construct being: "A*B", where A and B are not identical. After all, *a is valid (example 3). Presumably *** is also valid (the first and last star aren't a problem because they don't have characters on both sides, the middle one is fine because the character on each side is identical).

Thus, what you want is to write a regexp that finds the invalid construct, and return the inverse.

To find the invalid construct you need something called the backreference: You want to search for a thing and then refer to it.

".\*." - we start here: A character, a star, and a character. But now we need for the second character (the second . to actually be: Something OTHER than the first character). After all, ".\*." would also match on "A*A" and that is a valid construct so we don't want it to match.

Enter backrefs. () in regexpese makes a 'group' - a thing you can refer to later. \1 is a backref - it's "whatever matched for the first set of parentheses".

But we need more - we need to negate: Match if NOT this. Within chargroups there's ^ - [^fo] means: "Any character that is NOT an 'f' or an 'o'". But a backref isn't a chargroup.

As per this SO question backing me up on this, the only way is negative lookahead. Lookahead is a thing where you don't actually match characters, you merely check if they WOULD match, and if it would, fail the match. It's.. complicated. Search the web for tutorials that explain 'positive lookahead' and 'negative lookahead'.

Thus:

Pattern p = Pattern.compile("(.)\\*(?!\\1|$)");

return !p.matcher(str).find();

All sorts of things going on here:

  • (?!X) is negative lookahead.
  • \1|$ means: Either 'group 1' or 'end of string'. Given that our input contains X*, the next thing after that star must either be an X or the end of the string - if it is anything else, we should return false.
  • We don't want to match the entire string. We just want to ask: Is the 'invalid construct' anywhere in this string? - hence, find(), not matches().

To be clear, using regexp for this is probably a bad idea. Sure, the code will be extremely short, but it's not exactly readable, is it.

Without regexps, it becomes much easier to follow:

for (int i = 1; i < str.length() -1; i++) {
  if (str.charAt(i) != '*') continue;
  if (str.charAt(i - 1) != str.charAt(i + 1)) return false;
}
return true;

I'd strongly prefer the above over a regexp that doesn't readily show what it is actually accomplishing, and this regexp certainly doesn't make it remotely feasible to understand what it does just by looking at it.

rzwitserloot
  • 85,357
  • 5
  • 51
  • 72
2

You can use

public boolean sameStarChar(String str) {
     return str.matches("^(?!.*(.)\\*(?!\\1|$)).*");
}

Details:

  • ^ - start of string
  • (?!.*(.)\*(?!\1|$)) - a negative lookahead that fails the match if there are
    • .* - any zero or more chars other than line break chars as many as possible
    • (.) - Group 1 (\1): any one char other than line break chars
    • \* - an asterisk
    • (?!\1|$) - not immediately followed with the same value as in Group 2 or end of string
  • .* - the rest of the string is consumed (as .matches requires a full string match).

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • That works for my test examples too. Impressive. (is *`^` - start of string* superfluous since, as you say, `matches()` requires the entire string?) – Ole V.V. May 08 '22 at 16:34
  • 1
    @OleV.V. In many cases where the match is anchored to the start of string by default, I still use `^`, just because it will be easier to debug later in online testers. You can omit it here, of course. – Wiktor Stribiżew May 08 '22 at 18:28