
I have the following RegEx: (([a-zA-Z0-9?]{4,8})(-[a-zA-Z0-9?]{4,8})+-([a-zA-Z0-9?]{4,8}))

How can I avoid matching sequences that do not contain at least one digit AND at least one letter (a-zA-Z)?

For example:

This text: Hello World 123 abc 1AB2C-D3FGH-456I7-JK8LM-NOP9Q Hello World 123 abc should return 1AB2C-D3FGH-456I7-JK8LM-NOP9Q

and this: Hello World 123 abc 11111-1111-1111 Hello World 123 abc

or

Hello World 123 abc aaaa-aaaa-aaaa-aaa Hello World 123 abc

should return nothing.

I develop in Java and get the group like this:

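// keys, text and KEY_REGEX (a compiled java.util.regex.Pattern) are fields of the enclosing class
public List<String> getKeys() {
    keys = new ArrayList<>();
    Matcher matcher = KEY_REGEX.matcher(text);
    // find() advances to the next match; group() returns the whole matched sequence
    while (matcher.find()) {
        keys.add(matcher.group());
    }
    return keys;
}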

Thanks!

Chr3is
  • FYI `[A-z]` matches more than just letters. Have a look at an [ASCII table](https://www.ascii-code.com/). – Toto Feb 23 '20 at 16:28
  • Should every part of the sequence contain a digit and a char A-Z? Or the whole sequence at least one time? Would `A1111-1111-1111` be valid? In the last case try `\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]+(?:-[A-Z0-9]+)+\b` https://regex101.com/r/T8Cy4C/1 – The fourth bird Feb 23 '20 at 16:30
  • The whole sequence should contain a digit and a letter at least once, so your example should be valid. @Toto thanks for the hint, [a-zA-Z] should be used. – Chr3is Feb 23 '20 at 16:32

1 Answer

One way is to use two positive lookaheads (?=...) to assert that the sequence contains at least one letter A-Z and at least one digit 0-9.

The letter or digit may sit in any segment, so the character class inside each lookahead also contains -. That lets the lookahead scan across the hyphens to find both.

The match itself starts with one or more chars A-Z0-9 and then repeats a group that prepends a -, so there are no consecutive occurrences of - and no - at the start or at the end.

\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]+(?:-[A-Z0-9]+)+\b
  • \b Word boundary
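  • (?=[A-Z0-9-]*[A-Z]) Assert that a char A-Z occurs ahead, optionally preceded by chars A-Z0-9-
  • (?=[A-Z0-9-]*[0-9]) Assert that a digit 0-9 occurs ahead, optionally preceded by chars A-Z0-9-
  • [A-Z0-9]+ Match 1+ occurrences of A-Z0-9
  • (?:-[A-Z0-9]+)+ Repeat 1+ times: a - followed by 1+ occurrences of A-Z0-9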
  • \b Word boundary
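
As a rough sketch of how the pattern above could be tried from Java: the class name LookaheadDemo and the helper findKeys are made up for illustration, and the CASE_INSENSITIVE flag is an assumption added here so that a-z is accepted too, as the question asks. Without the flag the pattern only accepts uppercase letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Pattern from the answer. CASE_INSENSITIVE is an assumption added for this
    // sketch so that lowercase letters count as letters as well.
    static final Pattern KEY_REGEX = Pattern.compile(
            "\\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]+(?:-[A-Z0-9]+)+\\b",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects every match of KEY_REGEX in the given text.
    static List<String> findKeys(String text) {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY_REGEX.matcher(text);
        while (matcher.find()) {
            keys.add(matcher.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        // One letter plus digits anywhere in the sequence is enough.
        System.out.println(findKeys("abc A1111-1111-1111 abc")); // prints [A1111-1111-1111]
        // Digits only: the letter lookahead fails.
        System.out.println(findKeys("abc 11111-1111-1111 abc")); // prints []
        // Letters only: the digit lookahead fails.
        System.out.println(findKeys("abc aaaa-aaaa-aaaa-aaa abc")); // prints []
    }
}
```

Both lookaheads are evaluated at the start of a candidate and consume nothing, so the actual match still begins at the first char of the sequence.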

Regex demo

Note that [A-z] matches more than [A-Za-z]: it also includes the ASCII characters between Z and a, which are [ \ ] ^ _ and `


Limiting each part between the hyphens to 4-8 occurrences of the character class:

\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]{4,8}(?:-[A-Z0-9]{4,8})+\b
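
A minimal sketch of plugging this 4-8 variant into a getKeys() method like the one in the question, run against the three sample texts. The class name KeyExtractor and its constructor are hypothetical, and CASE_INSENSITIVE is again an assumption added so that a-z is accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyExtractor {
    // 4-8 variant from the answer. The CASE_INSENSITIVE flag is an assumption.
    private static final Pattern KEY_REGEX = Pattern.compile(
            "\\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]{4,8}(?:-[A-Z0-9]{4,8})+\\b",
            Pattern.CASE_INSENSITIVE);

    private final String text;

    // Hypothetical constructor: the question does not show where text comes from.
    KeyExtractor(String text) {
        this.text = text;
    }

    // Same loop as in the question, but with a local list instead of a field.
    List<String> getKeys() {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY_REGEX.matcher(text);
        while (matcher.find()) {
            keys.add(matcher.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello World 123 abc 1AB2C-D3FGH-456I7-JK8LM-NOP9Q Hello World 123 abc",
            "Hello World 123 abc 11111-1111-1111 Hello World 123 abc",
            "Hello World 123 abc aaaa-aaaa-aaaa-aaa Hello World 123 abc",
            "A1-22-33" // every part is shorter than 4 chars
        };
        for (String sample : samples) {
            System.out.println(new KeyExtractor(sample).getKeys());
        }
        // prints [1AB2C-D3FGH-456I7-JK8LM-NOP9Q], then [] three times
    }
}
```

The last sample has a letter and digits but is rejected only because of the {4,8} quantifier. The version with + would accept it.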
The fourth bird
  • This is working perfectly! I did not know about the word boundary. Just two questions: Why do I need *[A-Z] (same for the digits) in the positive lookahead, and why do I need the non-capturing group? – Chr3is Feb 23 '20 at 17:03
  • @Chr3is In the lookahead the `[A-Z0-9-]*` is used to be able to match all the specified characters in the character class until you get to a `[A-Z]` or `[0-9]`. The `*` means 0 or more times. The non capturing group is to repeat the part as a whole `(?:-[A-Z0-9]{4,8})+` – The fourth bird Feb 23 '20 at 17:10
  • Awesome I got it thanks :)! In my tests the RegEx works without the non capturing group too. – Chr3is Feb 23 '20 at 17:17
  • @Chr3is You are welcome. The grouping is used to not get matches like `A1111---1111---1111` – The fourth bird Feb 23 '20 at 17:22