1

I want a regex to select product references which fulfill these conditions:

  • 8 characters long
  • starts with Q
  • contains a mix of capital letters and numbers
  • contains at least 1 number and 1 letter other than the initial Q
  • ends with a letter or a number

For instance:

  • QC1589ZH is a valid ref
  • Q1234567 is not a valid ref
  • QUANTITY is not a valid ref

The regex will be used in a translation tool to select strings of text and block them. It will not be part of any code and thus cannot be tested or split. The software uses .NET regexes. I can use look-aheads and look-behinds if it helps. The ref is always surrounded by spaces or line breaks, or sits at the beginning or the end of a line.

Currently, I'm using the regex below. It works fine for valid refs but it also selects invalid refs like "Q1234567" and "QUANTITY".

\bQ[A-Z0-9]{7}\b
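
To make the over-matching concrete, here is a minimal C# sketch that runs the current pattern against the three sample refs. It assumes C# 9 or later (top-level statements); the variable names are illustrative only and not part of the translation tool.

```csharp
using System;
using System.Text.RegularExpressions;

// The current pattern: Q followed by any 7 uppercase letters or digits.
var current = new Regex(@"\bQ[A-Z0-9]{7}\b");

foreach (var candidate in new[] { "QC1589ZH", "Q1234567", "QUANTITY" })
{
    // All three candidates match, because nothing in the pattern
    // forces a mix of letters and digits after the initial Q.
    Console.WriteLine($"{candidate}: {current.IsMatch(candidate)}");
}
```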

I have tried and modified several regexes suggested by others (notably here: Regex pattern to match at least 1 number and 1 character in a string) but they are all too greedy.

Emma
elisar
  • 1
    Thank you all for your help! Barmar's regex is just what I need here but I'm sure that I'll be using The fourth bird's method in other cases, too. – elisar Jul 04 '19 at 09:23

4 Answers

1
\bQ(?=.*[A-Z])(?=.*\d)[A-Z\d]{7}\b
  • (?=.*[A-Z]) ensures that it contains at least one letter after the initial Q.
  • (?=.*\d) ensures that it contains at least one digit.
  • [A-Z\d]{7} requires that it contains exactly 7 letters or digits after the initial Q.
  • \b matches word boundaries.

https://regex101.com/r/zEgjYk/1
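
As a rough check of the pattern above, the following C# sketch tests each sample ref as a standalone string. It assumes C# 9 or later; `IsValidRef` is a hypothetical helper name introduced here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from this answer: two lookaheads after the initial Q,
// then exactly 7 uppercase letters or digits.
var pattern = new Regex(@"\bQ(?=.*[A-Z])(?=.*\d)[A-Z\d]{7}\b");

// Hypothetical helper: true when the string contains a matching ref.
bool IsValidRef(string candidate) => pattern.IsMatch(candidate);

foreach (var candidate in new[] { "QC1589ZH", "Q1234567", "QUANTITY" })
{
    // When each string is tested on its own, only QC1589ZH passes both lookaheads.
    Console.WriteLine($"{candidate}: {IsValidRef(candidate)}");
}
```

Each candidate is tested in isolation here. Because `.*` can run past the end of the ref, results may differ when several candidates share one line.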

Barmar
  • Thank you for your quick answer and for the detailed explanation! This works for me. I didn't realise I could use look-aheads like this. – elisar Jul 04 '19 at 09:07
  • Isn't that essentially what the answer in the question you linked to demonstrates? The main difference is that you put the initial `Q` before it. – Barmar Jul 05 '19 at 00:05
  • I guess I just didn't know how to use them correctly... Thank you! – elisar Jul 12 '19 at 14:49
1

Your current pattern \bQ[A-Z0-9]{7}\b does not require both an uppercase char and a digit, because the character class matches any one of the listed characters at each position.

For your example data, you might use:

\bQ(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*[0-9])[A-Z0-9]{7}\b
  • \bQ Word boundary and match Q
  • (?=[A-Z0-9]*[A-Z]) Assert an uppercase char
  • (?=[A-Z0-9]*[0-9]) Assert a digit
  • [A-Z0-9]{7} Match 7 characters, each from the character class
  • \b Word boundary

.NET regex demo
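
Restricting the lookaheads to `[A-Z0-9]*` instead of `.*` keeps each assertion inside the current word. The sketch below compares the two styles on a made-up line holding three candidates. It assumes C# 9 or later; `Find` and the sample line are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: three candidates on one line, only the last is a valid ref.
var line = "Q1234567 QUANTITY QC1589ZH";

// Lookaheads built on .* may look past the current word, up to the end of the line.
var dotStar = new Regex(@"\bQ(?=.*[A-Z])(?=.*\d)[A-Z\d]{7}\b");
// Lookaheads built on [A-Z0-9]* stop at the first character outside the class.
var bounded = new Regex(@"\bQ(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*[0-9])[A-Z0-9]{7}\b");

// Collects the text of every match (Cast keeps this working on older frameworks).
string[] Find(Regex regex, string text) =>
    regex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

// The .* version reports all three words; the bounded version reports only QC1589ZH.
Console.WriteLine(string.Join(", ", Find(dotStar, line)));
Console.WriteLine(string.Join(", ", Find(bounded, line)));
```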

If there has to be an uppercase char other than Q following, you might subtract Q from the character class:

\bQ(?=[A-Z0-9-[Q]]*[A-Z-[Q]])(?=[A-Z0-9-[Q]]*[0-9])[A-Z0-9-[Q]]{7}\b

.NET Regex demo
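
The `-[Q]` part is .NET character class subtraction. Here is a small C# sketch of how it behaves, assuming C# 9 or later and assuming both lookaheads carry a `*` quantifier, as in the first pattern. The extra test strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// [A-Z0-9-[Q]] is .NET character class subtraction: A-Z and 0-9, minus Q.
// Assumption: both lookaheads use a * quantifier, mirroring the first pattern.
var noExtraQ = new Regex(
    @"\bQ(?=[A-Z0-9-[Q]]*[A-Z-[Q]])(?=[A-Z0-9-[Q]]*[0-9])[A-Z0-9-[Q]]{7}\b");

foreach (var candidate in new[] { "QC1589ZH", "Q123456Q", "Q12FQ457", "Q1234567" })
{
    // Only QC1589ZH matches: any further Q after the initial one is rejected,
    // and Q1234567 still lacks a letter.
    Console.WriteLine($"{candidate}: {noExtraQ.IsMatch(candidate)}");
}
```

Note that this variant rejects every additional Q, not only a trailing one, so it is stricter than the first pattern.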

The fourth bird
  • breaks on Q123456Q – jhnc Jul 03 '19 at 16:34
  • hmm. question is ambiguous. "letter other than initial Q" could mean "a letter that is not Q" – jhnc Jul 03 '19 at 16:36
  • @jhnc I initially misread it that way, but I'm pretty sure he means "after the initial Q". – Barmar Jul 03 '19 at 16:41
  • @jhnc I have added another pattern to exclude a following `Q` – The fourth bird Jul 03 '19 at 16:41
  • 1
    This one works well, thank you! I don't think I need the regex to be any more specific than what Barmar suggested insofar there aren't any words that mix letters and numbers (and could include other non-space characters) apart from those I want to select, but it's good to be safer, too. – elisar Jul 04 '19 at 09:13
  • @elisar It also has to do with crossing boundaries. In this example the accepted answer matches both. See [demo](http://regexstorm.net/tester?p=%5cbQ%28%3f%3d.*%5bA-Z%5d%29%28%3f%3d.*%5cd%29%5bA-Z%5cd%5d%7b7%7d%5cb&i=QC1589ZH%0d%0aQ1234567%0d%0aQUANTITY+QC1589ZH) – The fourth bird Jul 04 '19 at 09:38
0

My guess is that this expression, for instance, might fulfill the desired rules:

\bQ(?=.*[0-9])[A-Z0-9]{7}\b 

Demo

Emma
0
(?i)^Q(?=.*[0-9])(?=.*[a-z-[q]]+[^Q]$)[a-z0-9]{7}$

(?i) Case-insensitive search

^Q String starts with Q

(?=.*[0-9]) Asserts that the string contains at least one digit

(?=.*[a-z-[q]]+[^Q]$) Asserts that a letter other than Q immediately precedes the final character and that the string doesn't end with Q.

[a-z0-9]{7} The remaining 7 alphanumeric characters

$ End of the string

C# code:

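using System.Collections.Generic;
using System.Text.RegularExpressions;
using static System.Console;

// Sample inputs; each one is matched as a whole string (^...$).
var texts = new List<string>
{
    "QC1589ZH",
    "Q1234567",
    "Q12FQ457",
    "Q123F56Q",
    "QUANTITY"
};

// The pattern does not change, so define it once outside the loop.
const string pattern = @"(?i)^Q(?=.*[0-9])(?=.*[a-z-[q]]+[^Q]$)[a-z0-9]{7}$";

foreach (string text in texts)
{
    WriteLine($"Text: {text}, Is match: {Regex.IsMatch(text, pattern)}");
}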

/*
    Output:
    Text: QC1589ZH, Is match: True
    Text: Q1234567, Is match: False
    Text: Q12FQ457, Is match: False
    Text: Q123F56Q, Is match: False
    Text: QUANTITY, Is match: False
*/
JohnyL
  • Thank you for your input, but this doesn't exactly answer what I need: the ref is not alone on a line, and I never said it can't end with a Q or contain another Q. Could help someone else, though! – elisar Jul 04 '19 at 09:19