Try this:
^\d{3}P[A-Z]\d{7}[0-9X]$
The character group [0-9X] will match a single numeric character or X (unless an explicit quantifier other than {1}, e.g. {2}, follows it).
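As a rough illustration of how the pattern behaves, here is a minimal sketch. The class and method names (PatternDemo, IsValidCode) are made up for this example, and the sample inputs are only guesses at what real values look like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Same pattern as above. The names used here are hypothetical.
    private static readonly Regex CodePattern =
        new Regex(@"^\d{3}P[A-Z]\d{7}[0-9X]$");

    public static bool IsValidCode(string input)
    {
        return input != null && CodePattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidCode("111PH1234567X"));  // last character is X
        Console.WriteLine(IsValidCode("111PH12345678"));  // last character is a digit
        Console.WriteLine(IsValidCode("111PH1234567XX")); // one character too many
        Console.WriteLine(IsValidCode("345BA7654321Z"));  // 4th character is not P
    }
}
```

The first two inputs should match and the last two should not, since [0-9X] without a quantifier consumes exactly one character.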
Addendum:
As @sln pointed out, it would be best to settle on either 0-9 or \d (not mix the two) in a given regexp for consistency. In other words, use...
^\d{3}P[A-Z]\d{7}[\dX]$
...or...
^[0-9]{3}P[A-Z]\d{7}[0-9X]$
...in this case.
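Consistency is not purely cosmetic in .NET: by default \d matches any Unicode decimal digit, while [0-9] matches ASCII digits only (RegexOptions.ECMAScript restricts \d to ASCII). A small sketch of the difference, with a made-up class name and an arbitrarily chosen non-ASCII digit:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    public static void Main()
    {
        // U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit, but not ASCII.
        string arabicIndicThree = "\u0663";

        // Default options: \d accepts any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"^\d$"));

        // [0-9] is limited to the ten ASCII digits.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"^[0-9]$"));

        // With ECMAScript-compatible behavior, \d is equivalent to [0-9].
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"^\d$", RegexOptions.ECMAScript));
    }
}
```

If the input is expected to contain only ASCII digits, the [0-9] variant is the stricter of the two.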
Performance
Regarding the comments about abysmal regexp performance: the concerns are greatly overstated.
Here is a quick sanity check...
// LINQPad-style program. Elsewhere, wrap it in a class and add usings for
// System, System.Diagnostics, System.Linq and System.Text.RegularExpressions.
void Main()
{
    // Quick sanity check.
    string str = "111PH1234567X";
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000000; i++)
    {
        // && binds tighter than ||, so the last condition needs its own parentheses.
        if (str.Substring(0, 3).All(char.IsDigit)          // first 3 are digits
            && str[3] == 'P'                               // 4th is P
            && char.IsLetter(str[4])                       // 5th is a letter (broader than [A-Z])
            && str.Substring(5, 7).All(char.IsDigit)       // 6-12 are digits
            && (char.IsDigit(str[12]) || str[12] == 'X'))  // 13th is a digit or X
        {
            //Console.WriteLine("good");
        }
    }
    Console.WriteLine(stopwatch.Elapsed);

    stopwatch = Stopwatch.StartNew();
    Regex regex = new Regex(@"^\d{3}P[A-Z]\d{7}[0-9X]$", RegexOptions.Compiled);
    for (int j = 0; j < 1000000; j++)
    {
        regex.IsMatch(str);
    }
    Console.WriteLine(stopwatch.Elapsed + " (regexp)");

    // A bit more rigorous sanity check.
    // Every sample is at least 13 characters long, so the indexing below cannot throw.
    string[] strs = { "111PH1234567X", "grokfoobarbaz", "really, really, really, really long string that does not match", "345BA7654321Z" };
    Stopwatch stopwatch2 = Stopwatch.StartNew();
    for (int i = 0; i < strs.Length; i++)
    {
        for (int j = 0; j < 1000000; j++)
        {
            if (strs[i].Substring(0, 3).All(char.IsDigit)              // first 3 are digits
                && strs[i][3] == 'P'                                   // 4th is P
                && char.IsLetter(strs[i][4])                           // 5th is a letter (broader than [A-Z])
                && strs[i].Substring(5, 7).All(char.IsDigit)           // 6-12 are digits
                && (char.IsDigit(strs[i][12]) || strs[i][12] == 'X'))  // 13th is a digit or X
            {
                //Console.WriteLine("good");
            }
        }
    }
    Console.WriteLine(stopwatch2.Elapsed);

    stopwatch2 = Stopwatch.StartNew();
    Regex regex2 = new Regex(@"^\d{3}P[A-Z]\d{7}[0-9X]$", RegexOptions.Compiled);
    for (int i = 0; i < strs.Length; i++)
    {
        for (int j = 0; j < 1000000; j++)
        {
            regex2.IsMatch(strs[i]);
        }
    }
    Console.WriteLine(stopwatch2.Elapsed + " (regexp)");
}
...that yields the following on my humble machine:
00:00:00.2134404
00:00:00.4527271 (regexp)
00:00:00.4872452
00:00:00.9534147 (regexp)
The regexp approach appears to be ~2x slower. As with anything, one needs to consider what makes sense for the use case, scale, etc. Personally, I side with Donald Knuth: I start from "premature optimization is the root of all evil" and would make a performance-driven choice only as needed.
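Should profiling ever show the regexp to be a real bottleneck, a hand-rolled check could look roughly like the sketch below. The names (ManualCheck, IsValidCode, IsAsciiDigit) are hypothetical, and it assumes ASCII-only input, so it is slightly stricter than \d. Unlike the inline check in the benchmark, it guards against null and wrong-length input before indexing.

```csharp
using System;

public static class ManualCheck
{
    // Hypothetical helper: an ASCII-only approximation of ^\d{3}P[A-Z]\d{7}[0-9X]$.
    public static bool IsValidCode(string s)
    {
        // Reject null and wrong-length input first so the indexing below is safe.
        if (s == null || s.Length != 13) return false;

        for (int i = 0; i < 3; i++)               // characters 1-3: digits
            if (!IsAsciiDigit(s[i])) return false;

        if (s[3] != 'P') return false;            // character 4: literal P
        if (s[4] < 'A' || s[4] > 'Z') return false; // character 5: uppercase letter

        for (int i = 5; i < 12; i++)              // characters 6-12: digits
            if (!IsAsciiDigit(s[i])) return false;

        return IsAsciiDigit(s[12]) || s[12] == 'X'; // character 13: digit or X
    }

    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        Console.WriteLine(IsValidCode("111PH1234567X"));
        Console.WriteLine(IsValidCode("grokfoobarbaz"));
        Console.WriteLine(IsValidCode("short"));
    }
}
```

Whether the extra code is worth it over a one-line pattern depends on how hot the call site really is.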