3

I have a bunch of characters like this: A B B C D

And I have a few spaces like this: _ _ _

Is there a way to use regular expression to match any string that can be formed by "dragging" the available characters into the empty spaces?

So in the example, these are some valid matches:

A B C
A B B
B C B
D A B

But these are invalid:

A A B    // Only one 'A' is available in the set
B B B    // Only two 'B's are available in the set

Sorry if it has already been asked before.

CluelessNoob
  • 688
  • 1
  • 9
  • 20
  • 5
    I don't think regex is a good tool for that. You should probably investigate `Collections`, `Set`s in particular. I think your question is too broad though. I suggest you try your hand at some code and reformulate if a concrete issue occurs. – Mena Sep 05 '14 at 10:30
  • Well I already have a code-only solution. But I am just curious to know if this can be done with RegExp, which would give me some more flexibility. But of course, I understand if RegExp cannot do that. – CluelessNoob Sep 05 '14 at 10:33

3 Answers3

8

vks's solution would work properly, and here's it optimised with additions to fulfill the "_ _ _" rule:

^(?!(?:[^A]*A){2})(?!(?:[^B]*B){3})(?!(?:[^C]*C){2})(?!(?:[^D]*D){2})(?:[ABCD](?:\s|$)){3}

Here is a regex demo.

Changes from original regex:

  • Capturing groups are removed since we're in Java - Java regex implementation dedicates time to write captured groups during matching).
  • The anchor ^ is moved in front for readability of the regex.

Regex explanation:

  • ^ Asserts position at the start of the match.
  • (?! Negative lookahead - Asserts that our position does not match the following, without moving the pointer:
  •     (?:[^A]*A){2} Two "A"s (literal character), with non-"A"s rolled over in an optimal way.
  • ) Closes the group.
  • (?!(?:[^B]*B){3}) Same as the above group - Asserts that there are not three "B"s in the match.
  • (?!(?:[^C]*C){2}) Asserts that there are not two "C"s in the match.
  • (?!(?:[^D]*D){2}) Asserts that there are not two "D"s in the match.
  • (?: Non-capturing group: Matches the following without capturing.
  •     [ABCD] Any character from the list "A", "B", "C", or "D".
  •     (?:\s|$) A whitespace, or the end of string.
  • ){3} Three times - Must match the sequence exactly three times to fulfill the "_ _ _" rule.

To use the regex:

boolean fulfillsRule(String str) {
    Pattern tripleRule = Pattern.compile("^(?!(?:[^A]*A){2})(?!(?:[^B]*B){3})(?!(?:[^C]*C){2})(?!(?:[^D]*D){2})(?:[ABCD](?:\s|$)){3}");
    return tripleRule.matcher(str).find();
}
Community
  • 1
  • 1
Unihedron
  • 10,902
  • 13
  • 62
  • 72
  • Can end with a space. Had similar, mine could start with a space (edited). But not sure, if that's a requirement. – Jonny 5 Sep 06 '14 at 09:12
  • 1
    @Jonny5 That is very true! You can transform `(?:[ABCD](?:\s|$)){3}` into `[ABCD] [ABCD] [ABCD]`, but that's ugly. – Unihedron Sep 06 '14 at 09:14
5
 (?!(.*?A){2,})(?!(.*?B){3,})(?!((.*?C){2,}))(?!((.*?D){2,}))^[ABCD]*$

You can use something like this.See demo.

http://regex101.com/r/uH3fV3/1

vks
  • 67,027
  • 10
  • 91
  • 124
  • +1 for idea. Minor note: this regex will also accept something like `ABCfoofoofoofoofoo` since `.*` is accepting all characters, so this may cause some bugs. Instead of `.*` you can use `[ABBCD]*` which can also be optimized to `[ABCD]*` (there is no need to repeat characters here). – Pshemo Sep 05 '14 at 11:02
  • @Pshemo thanx a lot for the edit.That makes this more secure :) – vks Sep 05 '14 at 11:03
  • If you want to improve your answer you can add explanation of how this regex works. Add what is `^[ABCD]*$` or `(?!(.*?B){3,})` and what purpose does it have here. – Pshemo Sep 05 '14 at 11:11
  • @Pshemo If OP asks will provide.a lot can be learnt by try to decode regex :) – vks Sep 05 '14 at 11:13
  • Remember that main purpose of Stack Overflow is to create compendium of programming knowledge so while creating answer you shouldn't be focusing only on OP but on every future reader (and not many of regex beginners knows about [look-around mechanisms](http://www.regular-expressions.info/lookaround.html)). You don't have to describe each part of your regex, but giving information about what is the purpose/idea of `(?!(.*?B){3,})` and why you placed each of them before `^` would be very helpful to understand your answer. – Pshemo Sep 05 '14 at 11:20
  • Thanks a lot! But yeah, an explanation would be great as I am not too experienced in RegExp (and so may be many future readers). – CluelessNoob Sep 05 '14 at 13:00
2

Interesting problem, this is my idea:

(?m)^(?!.*([ACD]).*\1)(?!(?>.*?B){3})(?>[A-D] ){2}[A-D]$

Used (?m) MULTILINE modifier where ^ matches line start and $ line end.

Test at regexplanet (click on Java); regex101 (non Java)


If I understood it right, the available character-pot is A,B,B,C,D. A string should be valid, if it contains 0 or 1 of each [ACD] or 0-2 of B in your example. My pattern consists of 3 parts:

  • (?!.*([ACD]).*\1) Used at line-start ^ a negative lookahead to assure, that [ACD] occurs at most one time, by capturing [ACD] to \1 and checking, it does not occur twice anywhere.

  • (?!(?>.*?B){3}) Using a negative lookahead, to assure, B occurs at most 2x.

  • finally (?>[A-D] ){2}[A-D]$ determines the total usable character set, assures the formatting, where each letter must be prededed by space or start and checks the length.

This can be easily modified to other needs. Also see SO Regex FAQ

Community
  • 1
  • 1
Jonny 5
  • 12,171
  • 2
  • 25
  • 42
  • 1
    Great approach! This solves the problem in a different mindset from mine and vks'. While it asserts a completion and fails faster than succeeding, that's a minor downside to the matching. – Unihedron Sep 05 '14 at 13:44
  • +1 for linking to the regex reference. I would like to point out that `(?> )` groups can be modified to `(?: )` as well. – Unihedron Sep 05 '14 at 13:45
  • Thanks @Unihedron. Also I like your and @vks approach and enjoy those "regex-riddles" .) The `B` part in my pattern could possibly be done with a negative lookahead in a more compact way, similar to first part. – Jonny 5 Sep 05 '14 at 13:48
  • Also changed the `B` part to a more compact negative `(?!(?>.*?B){3})` – Jonny 5 Sep 05 '14 at 14:26
  • Propose change `.*?B` to `[^B]*B` for efficient roll-over. – Unihedron Sep 05 '14 at 14:27
  • @Unihedron `.*?` should stay in line (without `s` modifier) or `[^B\n]*`. If it's a single-line input `[^B]*B` is fine! – Jonny 5 Sep 05 '14 at 14:43