1

Given an input string with an arbitrary number of 'x' characters (x, xx, xxxxx, xxxxxxxxxxxxx and so on), how can one write a regex that matches the input string only if it has a prime number of 'x' characters? A string of length 1 should not be matched.

For example:

Match these: xx xxx xxxxx xxxxxxx

But not these: x xxxx xxxxxxxxx

This is one solution I found - ^(?!(xx+)\1+$) (here, as an answer to this problem). However, I would like to know why it works. Please share any alternate solutions as well.

I'm using the PCRE engine.

I realize that one would typically not use regexes for this sort of thing. I am simply curious about how it could be done.

Inglorion
  • 171
  • 2
  • 2
  • 10

2 Answers2

8
^(?!(xx+)\1+$)

works by performing a negative lookahead at the start of the line. It will reject any line that consists of

  • a number of 2 or more xes
  • followed by the same number of xes, possibly multiple times

In other words, the regex works by matching any number of xes that can not be divided in smaller, equally sized groups of size >= 2.

To exclude the case of just a single x, you could use ^(?!(xx+)\1+$|x$).

Sam
  • 811
  • 5
  • 9
0

I don't think regex is the right tool for this. Why do you need to do this?

If you can't make any assumptions about the length of the strings you need to check if the number is a prime number somehow (which is computationally expensive).

If you know the max length you could precalculate the prime-numbers, and then check the length against them, but it would still be needlessly complex to do this using regex.

So the only way I know of to do this is to use \b(\d{2}|\d{3}|\d{5})\b which as you can tell will quickly become cumbersome.

Astrogat
  • 1,617
  • 12
  • 24
  • I realize that one would typically not use regexes for this sort of thing. I am simply curious about whether it's possible and how it could be done. – Inglorion May 07 '19 at 10:08