1

I intends to use REGEX in Java to match a String. For e.g.I have letters 'A' 'A' 'A' 'P' 'P' 'L' 'E' 'S'.

I want to check whether a string contains 0-3 'A', 0-2 'P', 0-1 'L', 0-1 'E' and 0-1 'S'.

Examples of string which shall pass the check: APPLE, APPLES, ALE, PALE... etc

This is what I've tried:

if (str.matches("[A]{0,3}[P]{0,3}[L]{0,1}[E]{0,1}[S]{0,1}"))

APPLES successfully passed through the check but not SAPPLE.

user3437460
  • 17,253
  • 15
  • 58
  • 106

3 Answers3

3
(?!(.*A){4,})(?!(.*P){3,})(?!(.*L){2,})(?!(.*E){2,})(?!(.*S){2,})^[APLES]*$

Try this.This will work for you.See demo.

http://regex101.com/r/sH6aF3/2

It uses a negative lookahead to ensure

A should not come 4 or more times.Similarly rest of the lookahead's ensure other conditions.In the end when its time to consume the string only selected characters inside [] are allowed.

vks
  • 67,027
  • 10
  • 91
  • 124
  • What the I don't even... but you're right, it is correct. The OP would probably be helped if you added an explanation of how it works. +1 for answering the question, but a regexp is probably still the wrong way to go about this. – Thomas Sep 20 '14 at 06:45
2

Optimised version of vks's regex:

^(?!(?>[^A]*A){4})(?!(?>[^P]*P){3})(?!(?>[^L]*L){2})(?!(?>[^E]*E){2})(?!(?>[^S]*S){2})[APLES]*$

Changes:

  • Moved ^ anchor to the front.
  • Changed capturing groups to atomic groups to prevent backtracking - Possible drastic performance improvement when the string fails to match.
  • Use [^A]*A et cetera in groups in place of .* backtracking to roll over the match efficiently.

Here is a regex demo.

Community
  • 1
  • 1
Unihedron
  • 10,902
  • 13
  • 62
  • 72
  • there's still a lot of learning left 4 me :) – vks Sep 20 '14 at 07:24
  • that is so cool :) thanx. will go through them – vks Sep 20 '14 at 07:29
  • 1
    How could I've overseen this riddle which looks so familiar :] Yours indeed seems to be more efficient when it comes to counting steps, whereas @vks' is a bit more compact and [mine](http://regex101.com/r/iR2hI2/1) would again be something in between regarding efficiency. Well didn't test that much. – Jonny 5 Sep 21 '14 at 05:30
1

You are imposing a specific order to your letters (A before P before L before E before Q, that's why APPLES passes and not SAPPLE). Actually, I'm not sure what you want to do is even possible with a reasonably short regex. I would to it manually with a single loop and a Map to count the occurrences.

Dici
  • 25,226
  • 7
  • 41
  • 82
  • Do you know how to remove the specific order in my codes? I can use the conventional way of using a loop, but I think it might not be very efficient since I have possibly over 50,000 words from a dictionary to check. – user3437460 Sep 20 '14 at 01:55
  • Unless your code is running on a washing machine, 50k words is peanuts. And how do you think regexes are matched, if not with a loop? – Thomas Sep 20 '14 at 06:45