0

Let's say I have a string of 2 characters. Using regex (as a thought exercise), I want to accept it only if the first character has an ascii value bigger than that of the second character.

ae should not match because a is before e in the ascii table.

ea, za and aA should match for the opposite reason

f$ should match because $ is before letters in the ascii table.

It doesn't matter if aa or a matches or not, I'm only interested in the base case. Any flavor of regex is allowed.

Can it be done? What if we restrict the problem to lowercase letters only? What if we restrict it to [abc] only? What if we invert the condition (accept when the characters are ordered from smallest to biggest)? What if I want it to work for N characters instead of 2?

LogicalKip
  • What have you tried? What didn't work? What did you get? What did you expect? What doesn't work with your code and where is it? – Toto Nov 13 '19 at 15:20
  • I haven't tried anything because I don't have the slightest clue what could make this work :D. I have thought about a lot of options but I already know why each of them wouldn't work. – LogicalKip Nov 13 '19 at 15:28
  • Regex is not the right tool for such job. The only solution, I think, is to code **all** the alternatives: `zy|zx|zw|...|ca|ba` good luck ;) – Toto Nov 13 '19 at 15:31
  • That's what I thought, but I want to see if I've missed something, possibly some obscure modifier (?WHATEVER), or a recursive check that removes letters one at a time reusing a previous regex, or something wizardy like that... – LogicalKip Nov 13 '19 at 15:35
  • pretty sure that would be considered a META pattern you're trying to match, not a "_pattern_". Regular expressions are for pattern matching, and the flavors that go beyond that (like the ones with constructs to enable matching braces or whatever) are generally considered to be "irregular", however useful those features might be – Code Jockey Nov 13 '19 at 15:42
  • Also there is already a feature on the subject (`[a-z]`), so maybe there are more – LogicalKip Nov 13 '19 at 15:57
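
For the lowercase-only case, the "code all the alternatives" idea from the comments can be generated rather than typed by hand. This is only a sketch: the names `pairs`, `pattern` and `first_is_bigger` are made up for illustration, and `fullmatch` is used so the alternation needs no anchors.

```python
import re
import string

letters = string.ascii_lowercase

# Every two-letter string whose first letter comes later in the alphabet
# than its second letter: ba, ca, cb, da, ... zy (26 * 25 / 2 alternatives).
pairs = [hi + lo for i, hi in enumerate(letters) for lo in letters[:i]]
pattern = re.compile("|".join(pairs))


def first_is_bigger(s):
    """True if s is exactly one of the enumerated two-letter pairs."""
    return pattern.fullmatch(s) is not None


for s in ["ea", "za", "ae", "aa"]:
    print(s, first_is_bigger(s))
```

The pattern grows quadratically with the alphabet size, which is why the answers below look for something shorter.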

2 Answers

2

I guess it would be almost impossible for me to do, but bobble bubble impressively solved the problem with:

^~*\}*\|*\{*z*y*x*w*v*u*t*s*r*q*p*o*n*m*l*k*j*i*h*g*f*e*d*c*b*a*`*_*\^*\]*\\*\[*Z*Y*X*W*V*U*T*S*R*Q*P*O*N*M*L*K*J*I*H*G*F*E*D*C*B*A*@*\?*\>*\=*\<*;*\:*9*8*7*6*5*4*3*2*1*0*\/*\.*\-*,*\+*\**\)*\(*'*&*%*\$*\#*"*\!*$(?!^)

bobble bubble RegEx Demo
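
To try that pattern against the examples from the question, here is a small sketch assuming Python's `re` module (the question allows any flavor; `is_descending` is a made-up helper name). The pattern is copied as-is: it lists the characters from `~` down to `!`, each with `*`, and the trailing `(?!^)` rejects the empty string.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r'''^~*\}*\|*\{*z*y*x*w*v*u*t*s*r*q*p*o*n*m*l*k*j*i*h*g*f*e*d*c*b*a*`*_*\^*\]*\\*\[*Z*Y*X*W*V*U*T*S*R*Q*P*O*N*M*L*K*J*I*H*G*F*E*D*C*B*A*@*\?*\>*\=*\<*;*\:*9*8*7*6*5*4*3*2*1*0*\/*\.*\-*,*\+*\**\)*\(*'*&*%*\$*\#*"*\!*$(?!^)'''
)


def is_descending(s):
    """True if the characters of s never increase in ASCII value."""
    return PATTERN.search(s) is not None


for s in ["ea", "za", "aA", "f$", "ae"]:
    print(s, is_descending(s))
```

Because every character uses `*`, repeated characters such as `aa` are accepted too, which the question explicitly leaves open.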


For abc only, or for other short sequences, we could approach the problem with an expression similar to the following (it accepts the characters in ascending order, i.e. the inverted condition from the question):

^(abc|ab|ac|bc|a|b|c)$
^(?:abc|ab|ac|bc|a|b|c)$

that might help you to see how you would go about it.

RegEx Demo 1


You can simplify that to:

^(a?b?c?)$
^(?:a?b?c?)$

RegEx Demo 2

but I'm not so sure about it.
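
One way to check the simplification is a brute-force comparison of both patterns over every short string built from a, b and c. This is a sketch in Python; the variable names are made up for illustration.

```python
import re
from itertools import product

enumerated = re.compile(r"^(?:abc|ab|ac|bc|a|b|c)$")
simplified = re.compile(r"^(?:a?b?c?)$")

# All strings over {a, b, c} of length 0 to 4.
candidates = ["".join(p) for n in range(5) for p in product("abc", repeat=n)]

# Strings on which the two patterns disagree.
differences = [
    s for s in candidates
    if bool(enumerated.match(s)) != bool(simplified.match(s))
]
print(differences)  # prints ['']
```

The only disagreement is the empty string: `a?b?c?` accepts it, while the enumerated version does not.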


The number of chars you're trying to allow is irrelevant to the problem you are trying to solve, because you can simply add an independent statement for that, such as:

(?!.{n})

where n-1 would be the maximum number of chars allowed, which for at most 2 chars would be

(?!.{3})^(?:a?b?c?)$
(?!.{3})^(a?b?c?)$

RegEx Demo 3
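
The length limit can be parameterised. Below is a rough sketch in Python, where `ordered_with_max_length` is a hypothetical helper and not part of any library. The negative lookahead `(?!.{n})` fails as soon as n characters are available, so a maximum of `max_len` characters needs n = `max_len` + 1.

```python
import re


def ordered_with_max_length(max_len):
    """Build the a?b?c? pattern with a cap on the total length."""
    # (?!.{n}) rejects any string with at least n characters.
    return re.compile(r"(?!.{%d})^(?:a?b?c?)$" % (max_len + 1))


two_at_most = ordered_with_max_length(2)

for s in ["ab", "bc", "abc", "ba"]:
    print(s, two_at_most.match(s) is not None)
```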

Emma
  • @toto are you referring to the fact that they're asking to "accept it only if the _first character_ has an ascii value bigger than that of the _first character_"? – Code Jockey Nov 13 '19 at 15:36
  • or the fact that the examples seem to show opposite - probably that --- sorry, my pedantry got ahead of my reading... – Code Jockey Nov 13 '19 at 15:38
  • @CodeJockey: And the string is 2 character long only. – Toto Nov 13 '19 at 15:39
  • Emma, I like your answer. [Here I did a little modification of your idea](https://regex101.com/r/zYT1r8/2) for longer words if @LogicalKip is interested. Used [this php](https://tio.run/##K8go@P/fxj7AI4CLKy2/SEMl09bQyMxaQSXTztgIROnqanIpKKQmZ@QrFBSlpscXluaXpGokZ4CUauqo66tr6qlrqXPZ2/3/DwA) to generate the pattern. – bobble bubble Nov 13 '19 at 16:22
  • Thanks for your update Emma, but I was just playing with your idea! :) OP asked for 2 characters only, which better matches your solution with `(?!.{n})`. Thank you, however, for including my comment :) – bobble bubble Nov 13 '19 at 16:41
1

A regex is not the best tool for the job.
But it's doable. A naive approach is to enumerate all the printable ascii characters and their corresponding lower range:

\x21[ -\x20]|\x22[ -\x21]|\x23[ -\x22]|\x24[ -\x23]|\x25[ -\x24]|\x26[ -\x25]|\x27[ -\x26]|\x28[ -\x27]|\x29[ -\x28]|\x2a[ -\x29]|\x2b[ -\x2a]|\x2c[ -\x2b]|\x2d[ -\x2c]|\x2e[ -\x2d]|\x2f[ -\x2e]|\x30[ -\x2f]|\x31[ -\x30]|\x32[ -\x31]|\x33[ -\x32]|\x34[ -\x33]|\x35[ -\x34]|\x36[ -\x35]|\x37[ -\x36]|\x38[ -\x37]|\x39[ -\x38]|\x3a[ -\x39]|\x3b[ -\x3a]|\x3c[ -\x3b]|\x3d[ -\x3c]|\x3e[ -\x3d]|\x3f[ -\x3e]|\x40[ -\x3f]|\x41[ -\x40]|\x42[ -\x41]|\x43[ -\x42]|\x44[ -\x43]|\x45[ -\x44]|\x46[ -\x45]|\x47[ -\x46]|\x48[ -\x47]|\x49[ -\x48]|\x4a[ -\x49]|\x4b[ -\x4a]|\x4c[ -\x4b]|\x4d[ -\x4c]|\x4e[ -\x4d]|\x4f[ -\x4e]|\x50[ -\x4f]|\x51[ -\x50]|\x52[ -\x51]|\x53[ -\x52]|\x54[ -\x53]|\x55[ -\x54]|\x56[ -\x55]|\x57[ -\x56]|\x58[ -\x57]|\x59[ -\x58]|\x5a[ -\x59]|\x5b[ -\x5a]|\x5c[ -\x5b]|\x5d[ -\x5c]|\x5e[ -\x5d]|\x5f[ -\x5e]|\x60[ -\x5f]|\x61[ -\x60]|\x62[ -\x61]|\x63[ -\x62]|\x64[ -\x63]|\x65[ -\x64]|\x66[ -\x65]|\x67[ -\x66]|\x68[ -\x67]|\x69[ -\x68]|\x6a[ -\x69]|\x6b[ -\x6a]|\x6c[ -\x6b]|\x6d[ -\x6c]|\x6e[ -\x6d]|\x6f[ -\x6e]|\x70[ -\x6f]|\x71[ -\x70]|\x72[ -\x71]|\x73[ -\x72]|\x74[ -\x73]|\x75[ -\x74]|\x76[ -\x75]|\x77[ -\x76]|\x78[ -\x77]|\x79[ -\x78]|\x7a[ -\x79]|\x7b[ -\x7a]|\x7c[ -\x7b]|\x7d[ -\x7c]|\x7e[ -\x7d]|\x7f[ -\x7e]

Try it online!
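
A pattern of this shape is easier to generate than to write out. The following is a sketch in Python that rebuilds the same alternation and compares it with a plain `ord` comparison; the names are made up for illustration, and `fullmatch` is used because the pattern itself has no anchors.

```python
import re

# One alternative per first character c from "!" (0x21) up to DEL (0x7f):
# c itself, followed by any character in the range space..(c - 1).
alternatives = [r"\x%02x[ -\x%02x]" % (c, c - 1) for c in range(0x21, 0x80)]
pattern = re.compile("|".join(alternatives))


def first_is_bigger(s):
    """True if s has 2 characters and the first has the larger ASCII value."""
    return pattern.fullmatch(s) is not None


for s in ["ea", "za", "aA", "f$", "ae"]:
    print(s, first_is_bigger(s))
```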


A (better) alternative is to enumerate the ascii characters in reverse order and use the ^ and $ anchors to assert there is nothing else unmatched. This should work for any string length:

^\x7f?\x7e?\x7d?\x7c?\x7b?z?y?x?w?v?u?t?s?r?q?p?o?n?m?l?k?j?i?h?g?f?e?d?c?b?a?`?\x5f?\x5e?\x5d?\x5c?\x5b?Z?Y?X?W?V?U?T?S?R?Q?P?O?N?M?L?K?J?I?H?G?F?E?D?C?B?A?@?\x3f?\x3e?\x3d?\x3c?\x3b?\x3a?9?8?7?6?5?4?3?2?1?0?\x2f?\x2e?\x2d?\x2c?\x2b?\x2a?\x29?\x28?\x27?\x26?\x25?\x24?\x23?\x22?\x21?\x20?$

Try it online!

You may replace ? with * if you want to allow duplicate characters.
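
Both variants can be generated with a few lines of Python. This is a sketch only: `descending_pattern` is a hypothetical helper, and `re.escape` is used instead of the `\x..` escapes shown above, which should be equivalent.

```python
import re


def descending_pattern(quantifier="?"):
    """List the characters from DEL (0x7f) down to space (0x20), each
    followed by the given quantifier, between ^ and $ anchors."""
    body = "".join(
        re.escape(chr(c)) + quantifier for c in range(0x7F, 0x1F, -1)
    )
    return re.compile("^" + body + "$")


strict = descending_pattern("?")  # each character at most once
loose = descending_pattern("*")   # repeated characters allowed

for s in ["zyx", "zxy", "ea", "ae", "aa"]:
    print(s, strict.fullmatch(s) is not None, loose.fullmatch(s) is not None)
```

Note that both variants also accept the empty string and single characters, which the question leaves open.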


PS: some people come up with absurdly long regexes even when a regex isn't the right tool for the job, for example to parse email addresses, HTML, or the present question.

Cœur