Regular expression to match 12345

Question

Is there a regex to match a string of increasing contiguous numbers, e.g. 123, 56789, etc? I don't think it can be done with a regex, but it is worth checking with folks here.

Zero would come before 1, right? – BoltClock Nov 18 '10 at 19:09

score 6 · Accepted Answer · answered Nov 18 '10 at 19:19

^(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?(?:9$)?$

Result: http://rubular.com/r/JfJJ6ntEQG
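
For readers asking how the pattern is put together, here is a minimal Python sketch that is not part of the original answer. It rebuilds the same regex from the digits 0 to 9: each digit is optional, and a digit may appear only if it is followed by its successor or by the end of the string. The helper name `build_ascending_regex` is hypothetical.

```python
import re

def build_ascending_regex():
    """Rebuild the pattern above: every digit is optional, and a digit
    may only be followed by its successor or by the end of the string."""
    parts = []
    for d in range(9):
        # e.g. (?:3(?=4|$))?  -> an optional 3 that must precede 4 or the end
        parts.append(f"(?:{d}(?={d + 1}|$))?")
    parts.append("(?:9$)?")  # 9 has no successor, so it must come last
    return "^" + "".join(parts) + "$"

ASCENDING = re.compile(build_ascending_regex())

for s in ["123", "56789", "0123456789", "135", "321", "11"]:
    print(s, bool(ASCENDING.match(s)))
```

Because every group is optional, the pattern also accepts the empty string; add a length check if at least one digit is required.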

answered Nov 18 '10 at 19:19 by kennytm

It appears my upvote has caused your rep to round up to 80k :) – BoltClock Nov 18 '10 at 19:20
KennyTM: Pretty dupduplicatcatcative. Try recursion instead. And don’t get stuck on ASCII. – tchrist Nov 18 '10 at 21:24
@tchrist What is this "recursion" nonsense that I keep hearing *in context of regular expressions*? With the exception of the irregular "regular" expressions of Perl, I cannot fit it into my head anywhere *near* an NFA. – pst Nov 18 '10 at 21:46
@pst: Wrong. All it takes is backrefs to disqualify a pattern language from the pathologically prissy, stupidly useless, and wholly irregular definition of a regular language. Nobody cares about automata theory in modern patterns. Elaborate backreferences, conditionals, and yes [recursion](http://stackoverflow.com/questions/4031112/regular-expression-matching/4034386#4034386) are all part of the practical pattern landscape in today’s age. *NOBODY* gets stuck with impractical “irregularly regular” regexes. You can’t get the job done with such a stark and impoverished language. No thanks! – tchrist Nov 18 '10 at 21:50
@tchrist You like Perl. It has powerful regular expressions. Now, given that neither this question nor many others you have replied to are *tagged with or imply Perl*, can you please stop making the (incorrect) assumption that these (Perl) features are available? Acknowledging that some constructs are, well, Perl regex constructs, and not criticizing every other approach, would be nice (esp. when no "awesome Perl only" note is added). I am still curious how to accept the language "S -> S | SS, S -> (S), S -> ()" without using Perl/PCRE. – pst Nov 19 '10 at 00:07
@pst: Every time there is a regex question that doesn’t specify a language, then *of course* they will get a Perl answer from me: that’s who I am. What you do is your own business. I utterly reject your thesis that "regex" somehow specifies some sort of specific dialect. If they don’t say, that’s fine: they get a Perl answer. If you don’t like that, I challenge you to give a better answer than me. Pointing out the shortcomings of other subdialects’ partial regex implementations is doing language-agnostic and open-minded people a real service, and not telling them of these is a disservice. – tchrist Nov 19 '10 at 00:10
@tchrist my problem with your answers is that you always make it sound like these advanced features are "available everywhere anyone is using a modern language, library, or tool." By any sane definition of "modern," this simply isn't true. – Mike Clark Nov 19 '10 at 00:41
@Mike: Modern certainly means being able to deal with Unicode. The contrapositive is also true: not being able to deal with Unicode is anything but modern. In fact, it’s completely unacceptable. Anyone and everyone is perfectly welcome to provide a better answer than me, *if you can.* It’s a meritocracy, you know. Good answers prevail over lame ones. Good luck! – tchrist Nov 19 '10 at 00:47
@tchrist I'm not talking about Unicode, I'm talking about "elaborate backreferences, conditionals, and recursion." – Mike Clark Nov 19 '10 at 00:48
@Mike: Works in PCRE. Without Perl, you’d all be stuck with `egrep`. Joy! Everything anybody uses these days started out pretending they were Perl-compatible. Nothing is, but the rest of you are slowly catching up little by little. This is a good thing. – tchrist Nov 19 '10 at 00:49
@tchrist: Recursion can't change a `0` to a `1`. – kennytm Nov 19 '10 at 08:08
Can anyone explain the regex? – javaguy Dec 04 '10 at 21:17

John Kugelman · Answer 2 · 2010-11-18T19:32:15.300

^(1|^)((2|^)((3|^)((4|^)((5|^)((6|^)((7|^)((8|^)((9|^))?)?)?)?)?)?)?)?$

Python demonstration:

>>> import re
>>> good  = ['1', '12', '123', '23', '3', '4', '45', '456', '56', '6', '7', '78', '789', '89', '9', '123456789']
>>> bad   = ['a', '11', '13', '1345', '2459', '321', '641', '1 2', '222233334444']
>>> tests = good + bad
>>>
>>> regex = '^(1|^)((2|^)((3|^)((4|^)((5|^)((6|^)((7|^)((8|^)((9|^))?)?)?)?)?)?)?)?$'
>>> for test in tests:
...   print('%s: %s' % ('Passed' if re.match(regex, test) else 'Failed', test))
... 
Passed: 1
Passed: 12
Passed: 123
Passed: 23
Passed: 3
Passed: 4
Passed: 45
Passed: 456
Passed: 56
Passed: 6
Passed: 7
Passed: 78
Passed: 789
Passed: 89
Passed: 9
Passed: 123456789
Failed: a
Failed: 11
Failed: 13
Failed: 1345
Failed: 2459
Failed: 321
Failed: 641
Failed: 1 2
Failed: 222233334444
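
One way to gain confidence in a pattern like this is to compare it against a plain definition. The sketch below is an addition, not part of the original answer. It assumes that "increasing contiguous digits" means a non-empty substring of `123456789` (this pattern does not handle 0) and checks the regex against that definition for every digit string of length 1 to 4. The helper name `is_run` is hypothetical.

```python
import re
from itertools import product

REGEX = re.compile(r'^(1|^)((2|^)((3|^)((4|^)((5|^)((6|^)((7|^)((8|^)((9|^))?)?)?)?)?)?)?)?$')

def is_run(s):
    """Reference definition (an assumption): a non-empty substring of 123456789."""
    return s != "" and s in "123456789"

# Collect every short digit string on which the regex and the
# reference definition disagree.
mismatches = []
for length in range(1, 5):
    for digits in product("0123456789", repeat=length):
        s = "".join(digits)
        if bool(REGEX.match(s)) != is_run(s):
            mismatches.append(s)

print(mismatches)
```

The empty string is outside this loop. The pattern as written accepts it, so callers who need at least one digit may want a separate length check.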

Very nice repetetetition. Now make it *also* work for Bengali digits like ৭৮৯৮, Tamil digits like ௮௯, Tibetan digits like ༣༤༢༥༧༦༨, fullwidth digits like "０１１３２", and the mathematical sans-serif bold digits up there around U+1D7ED, where Python fears to tread. :) – tchrist Nov 18 '10 at 21:21

tchrist · Answer 3 · 2010-11-19T00:32:38.850

Is there a regex to match a string of increasing contiguous numbers, e.g. 123, 56789, etc?

But of course there is, since the answer to all questions beginning, “Is there a (Perl) regex to match…?” is always “Why, certainly there is!” The operative question is always, “What is the Perl regex to match…?” ☺

Short Answer

That Perl regular expression is this one:

m{
  ^ (
      ( \d )
      (?(?= ( \d )) | $)
      (?(?{ ord $3 == 1 + ord $2 }) (?1) | $)
    ) $
}x

It works by having two different (?(COND)THEN|ELSE) conditional groups, with recursion on group 1 as the THEN clause of the second of those. That’s what (?1) does.

Nifty, eh?

Recursive patterns like these are awesomely cool and incredibly powerful; it’s up to you to use this power in the service of good, not evil. ☺

I use a slightly less clever form of it in the program given below. I’ll leave the other one there where it started just so you can see that in Perl There’s More Than One Way To Do It.
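
For readers without Perl, the control flow of the recursive pattern described above can be mirrored in an ordinary function. The following Python sketch is an illustration only, not a translation of Perl's regex engine. The function name `ascending` is hypothetical, and `str.isdecimal` stands in for `\d`.

```python
def ascending(s, i=0):
    """Mirror the pattern's logic: take one digit, then either stop at the
    end of the string or require the next character to be the successor
    code point and recurse on it."""
    if i >= len(s) or not s[i].isdecimal():
        return False                  # ( \d ) must match a digit
    if i + 1 == len(s):
        return True                   # the $ branch: nothing follows
    if ord(s[i + 1]) != ord(s[i]) + 1:
        return False                  # ord $3 == 1 + ord $2 does not hold
    return ascending(s, i + 1)        # (?1): recurse on the next digit

for s in ["123", "56789", "1245", "987"]:
    print(s, ascending(s))
```

Each recursive call consumes exactly one character, so the recursion depth equals the length of the input.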

Full Demo Program

Notice that this works for any string of Unicode digits, including non-ASCII ones (welcome to the Brave New Millennium) and even those way up in the astral planes, which languages stuck on UCS-2, or sometimes even UTF-16, cannot even think about.

This output:

Yes:       3456
 No:         43
Yes:        567
 No:       1245
 No:        568
 No:        987
Yes:         12
Yes:      12345
 No:         24
 No:      13456
 No:   12354678
Yes:   12345678
 No:       ١٣٠٢
Yes:       ٤٥٦٧
 No:        २१३
Yes:       ४५६७
Yes:         ८९
 No:        ১১২
Yes:       ৩৪৫৬
 No:       ৭৮৯৮
Yes:         ௮௯
 No:         ௮௮
 No:       ๖๗๗๘
Yes:        ๖๗๘
 No:    ༣༤༢༥༧༦༨
 No:      ０１１３２
Yes:        ２３４
Yes:         ８９
Yes:       𝟐𝟑𝟒𝟓
 No:       𝟓𝟒𝟑𝟐
 No:       𝟙𝟚𝟚𝟛
Yes:       𝟛𝟜𝟝𝟞
Yes:  𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵
Yes:         𝟶𝟷
 No:      𝟸𝟹𝟺𝟼𝟻
Yes:    𝟹𝟺𝟻𝟼𝟽𝟾𝟿

Is produced by this program:

#!/usr/bin/env perl   
use 5.10.0;
use utf8;
use strict;
use autodie;
use warnings  qw<  FATAL all     >;
use open      qw< :std  :utf8    >;
use charnames qw< :full          >;

# to iterate is human…
my @numbers = (
        3456,
          43,
         567,
        1245,
         568,
         987,
          12,
       12345,
          24,
       13456,
    12354678,
    12345678,
    hard_stuff(),
);   
my $ascending_rx = qr{
   ^ (  # works for *ANY* script!
        ( \p{Decimal_Number} ) 
        (?= $ | (??{ chr(1+ord($2)) }) )
        (?: (?1) | $ ) # …to recurse, divine!
    ) $
}x;

for my $n (@numbers) {
    printf "%s: %10s\n",
        ($n =~ $ascending_rx) ? "Yes" : " No",
        $n;
}

sub hard_stuff {   
    ( "\N{ARABIC-INDIC DIGIT ONE}"
    . "\N{ARABIC-INDIC DIGIT THREE}"
    . "\N{ARABIC-INDIC DIGIT ZERO}"
    . "\N{ARABIC-INDIC DIGIT TWO}"
    ),
    ( "\N{ARABIC-INDIC DIGIT FOUR}"
    . "\N{ARABIC-INDIC DIGIT FIVE}"
    . "\N{ARABIC-INDIC DIGIT SIX}"
    . "\N{ARABIC-INDIC DIGIT SEVEN}"
    ),
    ( "\N{DEVANAGARI DIGIT TWO}"
    . "\N{DEVANAGARI DIGIT ONE}"
    . "\N{DEVANAGARI DIGIT THREE}"
    ),
    ( "\N{DEVANAGARI DIGIT FOUR}"
    . "\N{DEVANAGARI DIGIT FIVE}"
    . "\N{DEVANAGARI DIGIT SIX}"
    . "\N{DEVANAGARI DIGIT SEVEN}"
    ),
    ( "\N{DEVANAGARI DIGIT EIGHT}"
    . "\N{DEVANAGARI DIGIT NINE}"
    ),
    ( "\N{BENGALI DIGIT ONE}"
    . "\N{BENGALI DIGIT ONE}"
    . "\N{BENGALI DIGIT TWO}"
    ),
    ( "\N{BENGALI DIGIT THREE}"
    . "\N{BENGALI DIGIT FOUR}"
    . "\N{BENGALI DIGIT FIVE}"
    . "\N{BENGALI DIGIT SIX}"
    ),
    ( "\N{BENGALI DIGIT SEVEN}"
    . "\N{BENGALI DIGIT EIGHT}"
    . "\N{BENGALI DIGIT NINE}"
    . "\N{BENGALI DIGIT EIGHT}"
    ),
    ( "\N{TAMIL DIGIT EIGHT}"
    . "\N{TAMIL DIGIT NINE}"
    ),
    ( "\N{TAMIL DIGIT EIGHT}"
    . "\N{TAMIL DIGIT EIGHT}"
    ),
    ( "\N{THAI DIGIT SIX}"
    . "\N{THAI DIGIT SEVEN}"
    . "\N{THAI DIGIT SEVEN}"
    . "\N{THAI DIGIT EIGHT}"
    ),
    ( "\N{THAI DIGIT SIX}"
    . "\N{THAI DIGIT SEVEN}"
    . "\N{THAI DIGIT EIGHT}"
    ),
    ( "\N{TIBETAN DIGIT THREE}"
    . "\N{TIBETAN DIGIT FOUR}"
    . "\N{TIBETAN DIGIT TWO}"
    . "\N{TIBETAN DIGIT FIVE}"
    . "\N{TIBETAN DIGIT SEVEN}"
    . "\N{TIBETAN DIGIT SIX}"
    . "\N{TIBETAN DIGIT EIGHT}"
    ),
    ( "\N{FULLWIDTH DIGIT ZERO}"
    . "\N{FULLWIDTH DIGIT ONE}"
    . "\N{FULLWIDTH DIGIT ONE}"
    . "\N{FULLWIDTH DIGIT THREE}"
    . "\N{FULLWIDTH DIGIT TWO}"
    ),
    ( "\N{FULLWIDTH DIGIT TWO}"
    . "\N{FULLWIDTH DIGIT THREE}"
    . "\N{FULLWIDTH DIGIT FOUR}"
    ),
    ( "\N{FULLWIDTH DIGIT EIGHT}"
    . "\N{FULLWIDTH DIGIT NINE}"
    ),
    #############################################
    #   Who's afraid of the astral planes?
    #   Try THIS, all you prisoners of UTF-16!
    #############################################
    ( "\N{MATHEMATICAL BOLD DIGIT TWO}"
    . "\N{MATHEMATICAL BOLD DIGIT THREE}"
    . "\N{MATHEMATICAL BOLD DIGIT FOUR}"
    . "\N{MATHEMATICAL BOLD DIGIT FIVE}"
    ),
    ( "\N{MATHEMATICAL BOLD DIGIT FIVE}"
    . "\N{MATHEMATICAL BOLD DIGIT FOUR}"
    . "\N{MATHEMATICAL BOLD DIGIT THREE}"
    . "\N{MATHEMATICAL BOLD DIGIT TWO}"
    ),
    ( "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT ONE}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT TWO}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT TWO}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT THREE}"
    ),
    ( "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT THREE}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT FOUR}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT FIVE}"
    . "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT SIX}"
    ),
    ( "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT ONE}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT TWO}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT THREE}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT FOUR}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT FIVE}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT SIX}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT SEVEN}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT EIGHT}"
    . "\N{MATHEMATICAL SANS-SERIF BOLD DIGIT NINE}"
    ),
    ( "\N{MATHEMATICAL MONOSPACE DIGIT ZERO}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT ONE}"
    ),
    ( "\N{MATHEMATICAL MONOSPACE DIGIT TWO}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT THREE}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT FOUR}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT SIX}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT FIVE}"
    ),
    ( "\N{MATHEMATICAL MONOSPACE DIGIT THREE}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT FOUR}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT FIVE}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT SIX}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT SEVEN}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT EIGHT}"
    . "\N{MATHEMATICAL MONOSPACE DIGIT NINE}"
    ),
}

PS: Some say that the reason that There’s More Than One Way To Do It in Perl is to make up for all those other languages in which there are no ways to do it — which is often most of them. ☻

This only works in Perl because of the `?{...}`. While your solution supports any part of Unicode, it is useless if OP doesn't use Perl. – kennytm Nov 19 '10 at 08:11
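
Outside Perl, the same check can be written without embedding code in a regex. The Python 3 sketch below is an addition under stated assumptions, not anything from the answers here. Python 3 strings are sequences of code points, so astral digits need no special handling. The sketch uses `unicodedata.decimal` to read each character's numeric value and requires both the code point and the value to rise by one, which keeps a run inside a single set of digits. The name `ascending_any_script` is hypothetical.

```python
import unicodedata

def ascending_any_script(s):
    """True if s is a non-empty run of decimal digits from one digit set,
    each exactly one greater than the previous."""
    # unicodedata.decimal returns the default (None here) for non-digits
    values = [unicodedata.decimal(ch, None) for ch in s]
    if not values or None in values:
        return False
    return all(
        ord(b) == ord(a) + 1 and vb == va + 1
        for a, b, va, vb in zip(s, s[1:], values, values[1:])
    )

samples = [
    "3456",
    "\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT EIGHT}"
    "\N{BENGALI DIGIT NINE}\N{BENGALI DIGIT EIGHT}",
    "\N{TAMIL DIGIT EIGHT}\N{TAMIL DIGIT NINE}",
    "\N{MATHEMATICAL BOLD DIGIT TWO}\N{MATHEMATICAL BOLD DIGIT THREE}",
]
for s in samples:
    print("Yes" if ascending_any_script(s) else " No", s)
```

The value check guards against adjacent code points that belong to different digit sets, such as U+1D7D7 (MATHEMATICAL BOLD DIGIT NINE) followed by U+1D7D8 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO).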

score 2 · Answer 4 · edited Nov 18 '10 at 19:59

In Perl:

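my $prev = 0;

# Peel one comma-terminated number off the front of $string per iteration.
while ($string =~ s/^\s*(\d+)\s*,\s*(.*)/$2/is) {
    my $new = $1;

    if ($new > $prev) {
        # numbers are getting larger
    }
    else {
        # number got smaller (or stayed the same)
    }

    $prev = $new;    # remember this number for the next comparison
}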

edited Nov 18 '10 at 19:59 by Bart Kiers

answered Nov 18 '10 at 19:14 by user387049

Perl can do better than that; **much** better, in fact. :) – tchrist Nov 18 '10 at 21:23
