3

Hey there !

I am always looking for reflexion subjects about regex. I would like a regex that matches every words that contain 2^n characters in a list of words (where n is a natural number).

To make it simple, let's say a word is just a sequence of o
Let's also say the list is composed by words followed by the number of characters they contain and separated by spaces
Of course you can't use these numbers, it is for reading purpose !

For exemple in the list :
o (1) ooo (3) oooooo (6) oooo (4) ooooooooo (9) oo (2) oooooooooooo (12) oooooooo (8)

We should have the following matches :

matches : 'o', 'oo', 'oooo', 'oooooooo'


Your regex must however respect some rules :

  • You cannot use recursion
  • You cannot use any feature specific to a language (or a few languages)


If you manage to find one (or a trick) that works in javascript, it would be awesome (I don't think this is possible, though) !
Of course, it doesn't need to work with javascript.
Solving the problem is not the point here, I am only interested in how to solve it !

Edit :

Sadly, nobody found anything I was looking for. The question is still opened to answers, there must be good ones !

By the way, here is what I came up with, even if there should be better than that :

\b(?:o|(?:(?(1)\1|o)(?=((?(1)\1\1|o))))+\1)\b

Demo here

Gawil
  • 1,171
  • 6
  • 13
  • Just to clarify: *"You cannot use any feature specific to a language (or a few languages)"*, but *"a trick that works in javascript"* is ok? – Imanuel May 09 '17 at 17:14
  • Yeah, it's just that javascript's regex is a bit poor, so I don't think it is possible ^^ – Gawil May 09 '17 at 17:19
  • However, that's a specific case, if someone finds a javascript regex without any trick, it would be even more awesome ! ;) Edit : maybe the end is a bit unclear, I will clarify this ^^ – Gawil May 09 '17 at 17:20
  • https://regex101.com/r/tzAg8D/1 though this obviously is a limited scope ;) - so NAA. – Sebastian Proske May 09 '17 at 20:12
  • Well... That is... violent ! xD Even if it won't work for any n, you might have some ^^ – Gawil May 09 '17 at 21:51
  • 1
    I'm voting to close this question as off-topic because it belongs to https://codegolf.stackexchange.com/ – Toto May 10 '17 at 10:33
  • Oh nice, tg=hat was exactly what I was looking for, thanks ! I'll try over there ) – Gawil May 11 '17 at 14:06
  • I revise my comment : This is exactly the kind of site I was looking for to learn regex, but this is absolutely not the place to post this current question, as it is clearly not a puzzle nor a codegolf challenge, but a real regex question. – Gawil May 12 '17 at 14:38

3 Answers3

2

I know, you said no recursion, but just for the record:

\b(?:o|(o(?1)?o))\b

Test it on regex101.com

Let's break it down (so I can finally understand why it works as intended)! Ignore whitespace.

\b (?: o | ( o (?1)? o ) ) \b
\b                         \b # Word boundaries. Boring.
   (?: o |               )    # Just so it matches a single o, too.
           ( o (?1)? o )      # Now that's the interesting part.
           (           )      # Capture group 1
             o       o        # Matches an o each at the start and the end of the group
                              # -> the pattern matches from the outside to the inside.
               (?1)?          # Again the pattern of group 1, or nothing.
                              # -> Again one 'o' at the start and one at the end. Or nothing.

To be honest, I don't know why it does not match oooooo (6) with three two recursions.

Edit: I asked a new question about it

Community
  • 1
  • 1
Imanuel
  • 3,596
  • 4
  • 24
  • 46
  • Nice one ! But it does make it too easy, that's why I do not want recursion ^^ – Gawil May 09 '17 at 17:52
  • I'm still trying to figure out how exactly that works... :) – Imanuel May 09 '17 at 17:56
  • You mean, the recursive one ? – Gawil May 09 '17 at 17:59
  • Yes. I have updated the answer. I don't know why it doesn't match 6 characters. – Imanuel May 09 '17 at 18:11
  • Here is how I see it (ignore the "[]", it's just to show what has been inserted) : 1st "loop" -> `[o(?1)?o]` is captured, 2nd "loop" `o[o(?1)?o]o` is captured, 3rd "loop" -> `oo[oo(?1)?oo]oo` – Gawil May 09 '17 at 18:16
  • Each time you replace (?1) by the content of the precedent loop (yeah I know there is no loop, but I don't know how to call it ^^) – Gawil May 09 '17 at 18:17
  • Must be so, apparently. But: 1. I read [this](http://www.regular-expressions.info/subroutine.html) as "*pattern* is repeated", not *contents*. 2: The contents of group 1 is not yet completed when recursion takes place. But the result is clear. – Imanuel May 09 '17 at 18:22
  • It's the expression inside the capturing group that is repeated. And as the capturing group updates itself at each loop, the expression repeated does as well ! – Gawil May 09 '17 at 18:26
2

This regex should work on most regex engines that support backreferences for capture groups 1 to 9. But it can only capture up to 2^11=2048 o's

\bo{1,2}\b|\b(((((((((o{4})\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?\b

Test here

Or... we can just hardcode the 2^n numbers ;)

\b(?:o|oo|o{4}|o{8}|o{16}|o{32}|o{64}|o{128}|o{256}|o{512}|o{1024}|o{2048})\b
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • I think you misunderstood the question. What you get here are `o` and sequences of 2n `o`, which is not 2^n `o` ;) – Gawil May 09 '17 at 22:27
  • Haha no problem ! Your regex almost works indeed, but I want one that works for any power of 2 ! – Gawil May 10 '17 at 08:41
  • Yes, but without recursion it does seem impossible to create a generic regex that would work on almost any regex engine. But you probably already realised that. – LukStorms May 10 '17 at 08:51
  • I know it is possible, because I did it ;) But I'm using conditionals, so it's not generic at all... I don't post it yet, because I'm waiting for some smart mind to find an awesome solution I couldn't even think about ^^ – Gawil May 10 '17 at 08:55
0

There's a proof that you can't do that with regular languages or even a context free grammar here: https://cs.stackexchange.com/questions/32338/show-that-0i-where-i-is-a-power-of-2-is-not-context-free

So I believe that a regex without back references or any complex extensions is impossible to create.

Regular expressions with backreferences are NP-complete according to this http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html, so back references should be enough to make the regex work.

Alistra
  • 5,177
  • 2
  • 30
  • 42
  • Of course that is impossible with a *regular* regular expression. Anything that needs to use any kind of memory can obviously not be regular. However, when I talk about regular expression, I am using the abuse of language, like almost *everyone* does today. – Gawil May 12 '17 at 15:11
  • The second article is very interesting though, thanks for sharing it ! – Gawil May 12 '17 at 15:15
  • You said "You cannot use recursion". If you meant back references, than the task is impossible – Alistra May 12 '17 at 15:24
  • By *recursion* I meant recursive regex (like ones containing (?R), or (?1) for instance) – Gawil May 12 '17 at 15:25