22

How can I match irreducible fractions with regex?

For example, 23/25, 3/4, 5/2, 100/101, etc.

First of all, I have no idea about the gcd-algorithm realization in regex.

Update for all of you who's answering like "You are using the wrong tool":

Yeah, guys, I'm realizing what regex is normally used for. It's okay. But that this question is weird is kind of its whole point.

Updated 2: The idea is to find a regex that could be helpful in a situation like:

$> echo "1/2" | grep -P regex
1/2
$> echo "2/4" | grep -P regex

So, the regex should be only a string, without using any scripts and variables. Only regex.

Actually, I already know some regex which match reducible fractions written in the unary number system.

$> echo "11/1111" | grep -P '^1/1+$|(11+)+\1+/\1+$'
11/1111

So the thing is to convert from decimal to unary number system in regex, but I don't know how.

  • 16
    You are using the _wrong_ tool for the job. – SLaks Apr 15 '11 at 14:45
  • 1
    You can't do it in one shot(using regex to identify data based on an algorithm), best way is to match any fraction extract those and then programatically verify if the matched fraction is irreducible or no based on the gcd algorithm. – Chandu Apr 15 '11 at 14:45
  • [What SLaks said. Period.](http://stackoverflow.com/questions/4098086/to-use-or-not-to-use-regular-expressions/4098123#4098123) –  Apr 15 '11 at 14:46
  • 1
    Wow, you mean regex didn't evolve into a complete programming language like HTML did? – phkahler Apr 15 '11 at 15:12
  • 2
    The questioner probably has in mind something of the flavor with matching prime numbers using a regex: http://montreal.pm.org/tech/neil_kandalgaonkar.shtml, http://stackoverflow.com/questions/20448/what-is-the-most-brilliant-regex-youve-ever-used – sawa Apr 15 '11 at 16:04
  • 3
    This might be a bad idea for real-world programming, but I love it as a programming puzzle. – Justin Morgan - On strike Apr 15 '11 at 16:51

4 Answers4

32

UPDATE

Since the poster requested a single regex that matches against strings like "36/270", but says it doesn’t matter how legible it is, that regex is:

my $reducible_rx = qr{^(\d+)/(\d+)$(?(?{(1x$1."/".1x$2)=~m{^(?|1+/(1)|(11+)\1*/\1+)$}})|^)};

But, if like me, you believe that an illegible regex is absolutely unacceptable, you will write that more legibly as:

my $reducible_rx = qr{
  # first match a fraction:
    ^ ( \d+ ) / ( \d+ ) $
  # now for the hard part:
    (?(?{ ( 1 x $1 . "/" . 1 x $2 ) =~ m{
                ^
                (?|    1+      / (1)  # trivial case: GCD=1
                  |  (11+) \1* / \1+  # find the GCD
                )
                 $
            }x
        })
          # more portable version of (*PASS)
     | ^  # more portable version of (*FAIL)
     )
}x;

You can improve maintainability by splitting out the version that matches the unary version from the one that matches the decimal version like this:

# this one assumes unary notation
my $unary_rx = qr{
    ^ 
    (?|   1+       / (1)
      | (11+)  \1* / \1+ 
    ) 
    $
}x;

# this one assumes decimal notation and converts internally
my $decimal_rx = qr{
  # first match a fraction:
    ^ ( \d+ ) / ( \d+ ) $ 
  # now for the hard part:
    (?(?{( 1 x $1 . "/" . 1 x $2 ) =~ $unary_rx})
          # more portable version of (*PASS)
     | ^  # more portable version of (*FAIL) 
     )
}x;

Isn’t that much easier by separating it into two named regexes? That would now make $reducible_rx the same as $decimal_rx, but the unary version is its own thing. That’s how I would do it, but the original poster wanted a single regex, so you’d have to interpolate the nested one for that as I first present above.

Either way, you can plug into the test harness below using:

    if ($frac =~ $reducible_rx) {
        cmp_ok($frac, "ne", reduce($i, $j), "$i/$j is $test");
    } else {
        cmp_ok($frac, "eq", reduce($i, $j), "$i/$j is $test");
    }

And you will see that it is a correct regex that passes all tests, and does so moreover using a single regex, wherefore having now passed all requirements of the original question, I declare Qᴜᴏᴅ ᴇʀᴀᴛ ᴅᴇᴍᴏɴsᴛʀᴀɴᴅᴜᴍ: “Quit, enough done.”

And you’re welcome.


The answer is to match the regex ^(?|1+/(1)|(11+)\1*/\1+)$ against the fraction once it has been converted from decimal to unary notation, at which point the greatest common factor will be found in $1 on a match; otherwise they are coprimes. If you are using Perl 5.14 or better, you can even do this in one step:

use 5.014;
my $reg  = qr{^(?|1+/(1)|(11+)\1*/\1+)$};
my $frac = "36/270";  # for example
if ($frac =~ s/(\d+)/1 x $1/reg =~ /$reg/) { 
    say "$frac can be reduced by ", length $1;
} else {
    say "$frac is irreducible";
}

Which will correctly report that:

36/270 can be reduced by 18

(And of course, reducing by 1 means there is no longer a denominator.)

If you wanted to have a bit of punning fun with your readers, you could even do it this way:

use 5.014;
my $regex = qr{^(?|1+/(1)|(11+)\1*/\1+)$};
my $frac  = "36/270";  # for example
if ($frac =~ s/(\d+)/"1 x $1"/regex =~ /$regex/) {
    say "$frac can be reduced by ", length $1;
} else {
    say "$frac is irreducible";
}

Here is the code that demonstrates how to do this. Furthermore, it constructs a test suite that tests its algorithm using all (positive) numerators and denominators up to its argument, or 30 by default. To run it under a test harness, put it in a file named coprimes and do this:

$ perl -MTest::Harness -e 'runtests("coprimes")'
coprimes .. ok       
All tests successful.
Files=1, Tests=900,  1 wallclock secs ( 0.13 usr  0.02 sys +  0.33 cusr  0.02 csys =  0.50 CPU)
Result: PASS

Here is an example of its output when run without the test harness:

$ perl coprimes 10
1..100
ok 1 - 1/1 is 1
ok 2 - 1/2 is 1/2
ok 3 - 1/3 is 1/3
ok 4 - 1/4 is 1/4
ok 5 - 1/5 is 1/5
ok 6 - 1/6 is 1/6
ok 7 - 1/7 is 1/7
ok 8 - 1/8 is 1/8
ok 9 - 1/9 is 1/9
ok 10 - 1/10 is 1/10
ok 11 - 2/1 is 2
ok 12 - 2/2 is 1
ok 13 - 2/3 is 2/3
ok 14 - 2/4 is 1/2
ok 15 - 2/5 is 2/5
ok 16 - 2/6 is 1/3
ok 17 - 2/7 is 2/7
ok 18 - 2/8 is 1/4
ok 19 - 2/9 is 2/9
ok 20 - 2/10 is 1/5
ok 21 - 3/1 is 3
ok 22 - 3/2 is 3/2
ok 23 - 3/3 is 1
ok 24 - 3/4 is 3/4
ok 25 - 3/5 is 3/5
ok 26 - 3/6 is 1/2
ok 27 - 3/7 is 3/7
ok 28 - 3/8 is 3/8
ok 29 - 3/9 is 1/3
ok 30 - 3/10 is 3/10
ok 31 - 4/1 is 4
ok 32 - 4/2 is 2
ok 33 - 4/3 is 4/3
ok 34 - 4/4 is 1
ok 35 - 4/5 is 4/5
ok 36 - 4/6 is 2/3
ok 37 - 4/7 is 4/7
ok 38 - 4/8 is 1/2
ok 39 - 4/9 is 4/9
ok 40 - 4/10 is 2/5
ok 41 - 5/1 is 5
ok 42 - 5/2 is 5/2
ok 43 - 5/3 is 5/3
ok 44 - 5/4 is 5/4
ok 45 - 5/5 is 1
ok 46 - 5/6 is 5/6
ok 47 - 5/7 is 5/7
ok 48 - 5/8 is 5/8
ok 49 - 5/9 is 5/9
ok 50 - 5/10 is 1/2
ok 51 - 6/1 is 6
ok 52 - 6/2 is 3
ok 53 - 6/3 is 2
ok 54 - 6/4 is 3/2
ok 55 - 6/5 is 6/5
ok 56 - 6/6 is 1
ok 57 - 6/7 is 6/7
ok 58 - 6/8 is 3/4
ok 59 - 6/9 is 2/3
ok 60 - 6/10 is 3/5
ok 61 - 7/1 is 7
ok 62 - 7/2 is 7/2
ok 63 - 7/3 is 7/3
ok 64 - 7/4 is 7/4
ok 65 - 7/5 is 7/5
ok 66 - 7/6 is 7/6
ok 67 - 7/7 is 1
ok 68 - 7/8 is 7/8
ok 69 - 7/9 is 7/9
ok 70 - 7/10 is 7/10
ok 71 - 8/1 is 8
ok 72 - 8/2 is 4
ok 73 - 8/3 is 8/3
ok 74 - 8/4 is 2
ok 75 - 8/5 is 8/5
ok 76 - 8/6 is 4/3
ok 77 - 8/7 is 8/7
ok 78 - 8/8 is 1
ok 79 - 8/9 is 8/9
ok 80 - 8/10 is 4/5
ok 81 - 9/1 is 9
ok 82 - 9/2 is 9/2
ok 83 - 9/3 is 3
ok 84 - 9/4 is 9/4
ok 85 - 9/5 is 9/5
ok 86 - 9/6 is 3/2
ok 87 - 9/7 is 9/7
ok 88 - 9/8 is 9/8
ok 89 - 9/9 is 1
ok 90 - 9/10 is 9/10
ok 91 - 10/1 is 10
ok 92 - 10/2 is 5
ok 93 - 10/3 is 10/3
ok 94 - 10/4 is 5/2
ok 95 - 10/5 is 2
ok 96 - 10/6 is 5/3
ok 97 - 10/7 is 10/7
ok 98 - 10/8 is 5/4
ok 99 - 10/9 is 10/9
ok 100 - 10/10 is 1

And here is the program:

#!/usr/bin/env perl
#
# coprimes - test suite to use unary coprimality algorithm
# 
# Tom Christiansen <tchrist@perl.com>
# Sun Apr 17 12:18:19 MDT 2011

use strict;
use warnings;

my $DEFAULT = 2*3*5;
my $max = @ARGV ? shift : $DEFAULT;

use Test::More;
plan tests => $max ** 2;

my $rx = qr{
    ^
    (?|   1+       / (1)
      | (11+)  \1* / \1+
    )
    $
}x;

for my $i ( 1 .. $max ) {
    for my $j ( 1 .. $max ) {
        my $test;
        if (((1 x $i) . "/" . (1 x $j)) =~ /$rx/) {
            my $cf = length($1);
            $test = $i / $cf;
            $test .= "/" . $j/$cf unless $j/$cf == 1;
        } else {
            $test = "$i/$j";
        }
        cmp_ok($test, "eq", reduce($i, $j), "$i/$j is $test");
    }
}

sub reduce {
    my ($a, $b) = @_;
    use Math::BigRat;
    my $f = new Math::BigRat "$a/$b";
    return "$f";
}
tchrist
  • 78,834
  • 30
  • 123
  • 180
  • 2
    +1, but you're doing part of the work outside the regex (conversion to unary). – SLaks Apr 17 '11 at 18:27
  • @SLaks: Better now? :) You know I certainly **could** build the `x`-factor conversion to unary inside the pattern with a dynamic `(??{⋯})` code escape (which is kinda like a `s///ee`), but that just makes it harder to read. – tchrist Apr 17 '11 at 18:40
  • 1
    But the main point of the question is to do that stuff _inside_ the regex, even it will be totally unreadable – ДМИТРИЙ МАЛИКОВ Apr 17 '11 at 18:47
  • 7
    @tchrist: That's also cheating. He's asking for regex, not Perl. – SLaks Apr 17 '11 at 18:49
  • 5
    @Slaks, @garm0nboz1a: I have now fully completed the problem per the original specification, without any “cheating” as you have so inelegantly denigrated the effort. A mere `if ($frac =~ $reducible_rx) { ⋯ } else { ⋯ }` is all it takes. If you prefer inverse logic to make an irreducible regex, merely invert the `(*PASS)/(*FAIL)` branches in the pattern. Done. – tchrist Apr 17 '11 at 20:11
  • 5
    But you're still using Perl inside the regex. – SLaks Apr 17 '11 at 20:20
  • 1
    @SLaks: What’s your point? This was not posted in the NON-PERL REGEX category, you know. And the original problem made no such stipulation. Anything that I can place on the right-hand side of an `=~` match operator or **within the delimiters of a `qr{⋯}` quoted regex** is *ipso facto* a regex: I have therefore accomplished everything that was requested and required. If you do not like Perl, that is hardly my problem. If the original poster wanted a non-Perl solution, then he should have specified it. I have met the specifications. Q.E.D. – tchrist Apr 17 '11 at 20:24
  • 1
    @sidyll - When you find an interesting problem you want to solve it - it's like an itch. It isn't really about a few extra points, and this is hardly a good way of getting them, considering the amount of time it probably took. – Kobi Apr 21 '11 at 22:07
  • If I got it right, there is no answer for my question. But that solution can help in very similar problem. So, should I mark it as right answer? – ДМИТРИЙ МАЛИКОВ Apr 23 '11 at 18:49
  • @garm0nboz1a: Gee, I think you should, but then, I am clearly an interested party. :) However, if you look at my reputation, it is not as though I am scrounging for points, so I am not as biased as might otherwise apply. I am glad to help, and I like interesting problems — which this certainly was. The Perl grep program will take patterns like this, too, you know. Of course, in a "real" program, I would separate out the logic better implemented in normal code, but your problem was interesting enough to attract my attentions anyway. – tchrist Apr 23 '11 at 20:08
  • 1
    Shouldn't Tim's answer be marked as the right answer since it is actually the correct answer? – John McDonnell May 06 '11 at 02:00
  • I +1'd for splitting up the horrid regex over multiple lines with comments. There's a first time for everything...and I've never seen anyone do that before. – yurisich Mar 09 '12 at 14:53
13

Nope it cannot be done. Like a good computer scientist I will ignore the specifics of the tool regex and assume you are asking if there is a regular expression. I do not have enough knowledge about regex's features to ensure it is restricted to regular expressions. That caveat aside, on with the show.

Rewording this we get:

Let L be the language {"a/b"| where a and b are natural numbers encoded in a radix r and a and b are coprime}. Is L regular?

Assume such a language is regular. Then there exists a DFA that can decide membership in L. Let N be the number of states of such a DFA. There are an infinite number of primes. As the number of primes is infinite, there are arbitrarily many primes greater than the largest number encodable in N digits in the radix r. (Note: The largest number is clearly r raised to the power of N. I am using this weird wording to show how to accommodate unary.) Select N+1 primes that are greater than this number. All of these numbers are encoded using at least N+1 digits (in the radix r). Enumerate these primes p₀ to pₙ. Let sᵢ be the state of the pᵢ is in immediately after reading the /. By the pigeon hole principle, there are N states and N+1 sᵢ states so there exists at least one pair of indexes (j,k) such that sⱼ = sₖ. So starting from the initial state of the DFA, inputs pₖ/ and pⱼ/ lead to the same state sⱼ (or sₖ) and pⱼ and pₖ are distinct primes.

L must accept all pairs of distinct primes p/q as they are coprime and reject all primes divided by themselves p/p as p is not coprime to p. Now the language accepts pⱼ = pₖ so there is a sequence of states from sⱼ using the string pₖ to an accepting state, call this sequence β. Let α be the sequence of states reading pₖ starting from the initial state. The sequence of states for the DFA starting at the initial state for the string pₖ/pₖ must be the same as α followed by β. This sequence starts in an initial state, goes to sₖ (by reading the input pₖ), and reaches an accepting state by reading pₖ. The DFA accepts pₖ/pₖ and pₖ/pₖ is in L. pₖ is not coprime to pₖ, and therefore pₖ/pₖ is not in L. Contradiction. Therefore the language L is irregular, or no regular expression exists.

ulidtko
  • 14,740
  • 10
  • 56
  • 88
Tim
  • 1,008
  • 5
  • 13
  • To be honest, I'm not convinced by this proof. But the intuition also confirms the conclusion: testing for coprimality seems to require more powerful computational model than DFA's (and thus, classic (academic) regexes) can provide. At the same time, the practical solution above demonstrates that it's not entirely impossible. This wakes **huge** interest for where is the difference between theory and practice, in this case. – ulidtko Jul 23 '13 at 14:52
  • 2
    For those who are interested -- the tool "regex" is strictly more powerful than a classical regular expression, due (at the very least) to the inclusion of backreferences. – Arandur Aug 05 '14 at 21:09
3

If you write the numbers in unary, and use ":" as the division sign, I think this matches reducible fractions:

/^1+:1$|^(11+):\1$|^(11+?)\2+:\2\2+$/

You can then use !~ to find strings that don't match.

Based on this: http://montreal.pm.org/tech/neil_kandalgaonkar.shtml

0

You can know, that a number, ending in (0,5) is divisible by (5), or ending in (2,4,6,8,0) is divisible by 2.

For 3,4,6,7,8,9 as divisors, I wouldn't expect a possibility, and not for arbitrary divisors too.

I guess you know the method, to decide divisibility by 3 - to build the rekursive crosssum, which has to be divisible by 3, to make the number divisible. So there you could eliminate all 3s, 6s and 9s from the number, as well as the 0. For an arbitrary number, you would proceed this way:

  • delete every 0369
  • change 47 to 1, (because 4%3 and 7%3 = 1)
  • change 58 to 2, reason see above
  • change every 2 to 11
  • change every group of 111 to nothing.

If the result is empty, the number was divisible by 3:

echo ${RANDOM}${RANDOM}${RANDOM} | sed 's/[0369]//g;s/[47]/1/g;s/[58]/2/g;s/2/11/g;s/1\{3\}//g'

A similar approach could work for 9, where you have a similar rule. But a general approach for arbitrary divisors?

user unknown
  • 35,537
  • 11
  • 75
  • 121
  • 1
    See my answer for a construction of a DFA for arbitrary divisors. Each divisor *on its own* is regular, but an expression for *all* divisors is impossible. – Welbog Apr 15 '11 at 19:46
  • Yeah, i upvoted the fraction of your post, which I understood. :) – user unknown Apr 15 '11 at 22:20