43

Hey there, I'm trying to perform a backwards regular expression search on a string to divide it into groups of 3 digits. As far as I can see from the AS3 documentation, searching backwards is not possible in the regex engine.

The point of this exercise is to insert triplet commas into a number like so:

10000000 => 10,000,000

I'm thinking of doing it like so:

string.replace(/(\d{3})/g, ",$1")

But this is not correct, because the search does not start from the back of the string, and the $1 replacement will only work for the first match.

I'm getting the feeling I would be better off performing this task using a loop.

UPDATE:

Due to AS3 not supporting lookahead this is how I have solved it.

public static function formatNumber(number:Number):String
{
    var numString:String = number.toString()
    var result:String = ''

    while (numString.length > 3)
    {
        var chunk:String = numString.substr(-3)
        numString = numString.substr(0, numString.length - 3)
        result = ',' + chunk + result
    }

    if (numString.length > 0)
    {
        result = numString + result
    }

    return result
}
BefittingTheorem

12 Answers

59

If your language supports positive lookahead assertions, then I think the following regex will work:

(\d)(?=(\d{3})+$)

Demonstrated in Java:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CommifyTest {

    @Test
    public void testCommify() {
        String num0 = "1";
        String num1 = "123456";
        String num2 = "1234567";
        String num3 = "12345678";
        String num4 = "123456789";

        String regex = "(\\d)(?=(\\d{3})+$)";

        assertEquals("1", num0.replaceAll(regex, "$1,"));
        assertEquals("123,456", num1.replaceAll(regex, "$1,"));
        assertEquals("1,234,567", num2.replaceAll(regex, "$1,"));
        assertEquals("12,345,678", num3.replaceAll(regex, "$1,"));
        assertEquals("123,456,789", num4.replaceAll(regex, "$1,"));    
    }    
}
isherwood
toolkit
  • 3
    I prefer this, assuming that you can use lookbehinds: (?<=\d)(?=(\d{3})+$) That way, you can simply replace with "," instead of replacing with "\1,". – Bravery Onions Jun 23 '09 at 19:56
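The lookbehind variant from the comment above can be sketched in Java, whose regex engine supports both lookbehind and lookahead. The class and method names (`LookaroundCommas`, `addCommas`) are illustrative assumptions, and the sketch assumes the input is a plain run of digits with nothing after it.

```java
final class LookaroundCommas {
    // Zero-width position: a digit behind it, and a multiple of three digits
    // between it and the end of the string ahead of it.
    private static final String BOUNDARY = "(?<=\\d)(?=(\\d{3})+$)";

    static String addCommas(String digits) {
        // Nothing is consumed by the match, so the replacement is only the separator.
        return digits.replaceAll(BOUNDARY, ",");
    }

    public static void main(String[] args) {
        System.out.println(addCommas("10000000")); // prints 10,000,000
    }
}
```

Because the pattern matches a position rather than a character, no backreference is needed in the replacement string.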
24

Found on http://gskinner.com/RegExr/

Community > Thousands separator

Pattern: /\d{1,3}(?=(\d{3})+(?!\d))/g

Replace: $&,

trace ( String("1000000000").replace( /\d{1,3}(?=(\d{3})+(?!\d))/g , "$&,") );

It did the job!

Alan Moore
Thomas
  • Just as a helpful tip for anyone looking in the future, a slight variation of the regex just above that I had to figure out is: `/\d{1,3}(?=(\d{3})+(?=\.))/g` This will format high-precision numbers, such as 4517534.24658 without adding commas _after_ the decimal. This does, of course, require the number have a decimal in it to work properly (which just so happened to be true in my case). :-) – BIG DOG Feb 03 '16 at 21:11
  • 1
    You can prepend the original with a negative lookbehind, `(?<!\.)`, to stop it comma-ing without requiring a decimal, too – John Neuhaus Nov 22 '19 at 20:19
9

If your regex engine has positive lookaheads, you could do something like this:

string.replace(/(\d)(?=(\d\d\d)+$)/g, "$1,")

Where the positive lookahead (?=...) means that the regex only matches when the lookahead expression ... matches.

(Note that lookaround-expressions are not always very efficient.)

Niki
4

While many of these answers work fine with positive integers, many of them declare their input as a Number, which implies that they can handle negative or decimal values, and here all of the solutions fail. Though the currently selected answer does not assume a Number, I was curious to find a solution that could handle one and that was also more performant than RegExp (which AS3 does not do well).

I put together many of the answers here in a testing class (and included a solution from this blog and an answer of my own called commaify) and formatted them in a consistent way for easy comparison:

package
{
    public class CommaNumberSolutions
    {   
        public static function commaify( input:Number ):String
        {
            var split:Array = input.toString().split( '.' ),
                front:String = split[0],
                back:String = ( split.length > 1 ) ? "." + split[1] : null,
                n:int = input < 0 ? 2 : 1,
                commas:int = Math.floor( (front.length - n) / 3 ),
                i:int = 1;

            for ( ; i <= commas; i++ )
            {
                n = front.length - (3 * i + i - 1);
                front = front.slice( 0, n ) + "," + front.slice( n );
            }

            if ( back )
                return front + back;
            else
                return front;
        }

        public static function getCommaString( input:Number ):String
        {
            var s:String = input.toString();

            if ( s.length <= 3 )
                return s;

            var i:int = s.length % 3;

            if ( i == 0 )
                i = 3;

            for ( ; i < s.length; i += 4 )
            {
                var part1:String = s.substr(0, i);
                var part2:String = s.substr(i, s.length);
                s = part1.concat(",", part2);
            }

            return s;
        }

        public static function formatNumber( input:Number ):String
        {
            var s:String = input.toString()
            var result:String = ''

            while ( s.length > 3 )
            {
                var chunk:String = s.substr(-3)
                s = s.substr(0, s.length - 3)
                result = ',' + chunk + result
            }

            if ( s.length > 0 )
                result = s + result

            return result
        }

        public static function commaCoder( input:Number ):String
        {
            var s:String = "";
            var len:Number = input.toString().length;

            for ( var i:int = 0; i < len; i++ )
            { 
                if ( (len-i) % 3 == 0 && i != 0)
                    s += ",";

                s += input.toString().charAt(i);
            }
            return s;
        }

        public static function regex1( input:Number ):String
        {
            return input.toString().replace( /-{0,1}(\d)(?=(\d\d\d)+$)/g, "$1," );
        }

        public static function regex2( input:Number ):String
        {
            return input.toString().replace( /-{0,1}\d{1,3}(?=(\d{3})+(?!\d))/g , "$&,")
        }

        public static function addCommas( input:Number ):String
        {
            var negative:String = "";
            if ( input < 0 )
            {
                negative = "-";
                input = Math.abs(input);
            }

            var s:String = input.toString();
            var results:Array = s.split(/\./);
            s = results[0];

            if ( s.length > 3 )
            {
                var mod:Number = s.length % 3;
                var output:String = s.substr(0, mod);
                for ( var i:Number = mod; i < s.length; i += 3 )
                {
                    output += ((mod == 0 && i == 0) ? "" : ",") + s.substr(i, 3);
                }

                if ( results.length > 1 )
                {
                    if ( results[1].length == 1 )
                        return negative + output + "." + results[1] + "0";
                    else
                        return negative + output + "." + results[1];
                }
                else
                    return negative + output;
            }
            if ( results.length > 1 )
            {
                if ( results[1].length == 1 )
                    return negative + s + "." + results[1] + "0";
                else
                    return negative + s + "." + results[1];
            }
            else
                return negative + s;
        }
    }
}

Then I tested each for accuracy and performance:

package
{    
    public class TestCommaNumberSolutions
    {
        private var functions:Array;

        function TestCommaNumberSolutions()
        {
            functions = [
                { name: "commaify()", f: CommaNumberSolutions.commaify },
                { name: "addCommas()", f: CommaNumberSolutions.addCommas },
                { name: "getCommaString()", f: CommaNumberSolutions.getCommaString },
                { name: "formatNumber()", f: CommaNumberSolutions.formatNumber },
                { name: "regex1()", f: CommaNumberSolutions.regex1 },
                { name: "regex2()", f: CommaNumberSolutions.regex2 },
                { name: "commaCoder()", f: CommaNumberSolutions.commaCoder }
            ];
            verify();
            measure();
        }

        protected function verify():void
        {
            var assertions:Array = [ 
                { input: 1, output: "1" },
                { input: 21, output: "21" },
                { input: 321, output: "321" },
                { input: 4321, output: "4,321" },
                { input: 54321, output: "54,321" },
                { input: 654321, output: "654,321" },
                { input: 7654321, output: "7,654,321" },
                { input: 987654321, output: "987,654,321" },
                { input: 1987654321, output: "1,987,654,321" },
                { input: 21987654321, output: "21,987,654,321" },
                { input: 321987654321, output: "321,987,654,321" },
                { input: 4321987654321, output: "4,321,987,654,321" },
                { input: 54321987654321, output: "54,321,987,654,321" },
                { input: 654321987654321, output: "654,321,987,654,321" },
                { input: 7654321987654321, output: "7,654,321,987,654,321" },
                { input: 87654321987654321, output: "87,654,321,987,654,321" },
                { input: -1, output: "-1" },
                { input: -21, output: "-21" },
                { input: -321, output: "-321" },
                { input: -4321, output: "-4,321" },
                { input: -54321, output: "-54,321" },
                { input: -654321, output: "-654,321" },
                { input: -7654321, output: "-7,654,321" },
                { input: -987654321, output: "-987,654,321" },
                { input: -1987654321, output: "-1,987,654,321" },
                { input: -21987654321, output: "-21,987,654,321" },
                { input: -321987654321, output: "-321,987,654,321" },
                { input: -4321987654321, output: "-4,321,987,654,321" },
                { input: -54321987654321, output: "-54,321,987,654,321" },
                { input: -654321987654321, output: "-654,321,987,654,321" },
                { input: -7654321987654321, output: "-7,654,321,987,654,321" },
                { input: -87654321987654321, output: "-87,654,321,987,654,321" },
                { input: .012345, output: "0.012345" },
                { input: 1.012345, output: "1.012345" },
                { input: 21.012345, output: "21.012345" },
                { input: 321.012345, output: "321.012345" },
                { input: 4321.012345, output: "4,321.012345" },
                { input: 54321.012345, output: "54,321.012345" },
                { input: 654321.012345, output: "654,321.012345" },
                { input: 7654321.012345, output: "7,654,321.012345" },
                { input: 987654321.012345, output: "987,654,321.012345" },
                { input: 1987654321.012345, output: "1,987,654,321.012345" },
                { input: 21987654321.012345, output: "21,987,654,321.012345" },
                { input: -.012345, output: "-0.012345" },
                { input: -1.012345, output: "-1.012345" },
                { input: -21.012345, output: "-21.012345" },
                { input: -321.012345, output: "-321.012345" },
                { input: -4321.012345, output: "-4,321.012345" },
                { input: -54321.012345, output: "-54,321.012345" },
                { input: -654321.012345, output: "-654,321.012345" },
                { input: -7654321.012345, output: "-7,654,321.012345" },
                { input: -987654321.012345, output: "-987,654,321.012345" },
                { input: -1987654321.012345, output: "-1,987,654,321.012345" },
                { input: -21987654321.012345, output: "-21,987,654,321.012345" }
            ];

            var i:int;
            var len:int = assertions.length;
            var assertion:Object;
            var f:Function;
            var s1:String;
            var s2:String;

            for each ( var o:Object in functions )
            {
                i = 0;
                f = o.f;
                trace( '\rVerify: ' + o.name ); 
                for ( ; i < len; i++ )
                {
                    assertion = assertions[ i ];
                    s1 = f.apply( null, [ assertion.input ] );
                    s2 = assertion.output;
                    if ( s1 !== s2 )
                        trace( 'Test #' + i + ' Failed: ' + s1 + ' !== ' + s2 );
                }
            }

        }

        protected function measure():void
        {
            // Generate random inputs
            var values:Array = [];
            for ( var i:int = 0; i < 999999; i++ ) {
                values.push( Math.random() * int.MAX_VALUE * ( Math.random() > .5 ? -1 : 1) );
            }

            var len:int = values.length;
            var stopwatch:Stopwatch = new Stopwatch;
            var s:String;
            var f:Function;
            trace( '\rTesting ' + len + ' random values' );
            // Test each function
            for each ( var o:Object in functions )
            {
                i = 0;
                s = "";
                f = o.f;
                stopwatch.start();
                for ( ; i < len; i++ ) {
                    s += f.apply( null, [ values[i] ] ) + " ";
                }
                stopwatch.stop();
                trace( o.name + '\t\ttook ' + (stopwatch.elapsed/1000) + 's' ); //(stopwatch.elapsed/len) + 'ms'
            }
        }
    }
}

import flash.utils.getTimer;

class Stopwatch
{
    protected var startStamp:int;
    protected var stopStamp:int;
    protected var _started:Boolean;
    protected var _stopped:Boolean;

    function Stopwatch( startNow:Boolean = true ):void
    {
        if ( startNow ) 
            start();
    }

    public function start():void
    {
        startStamp = getTimer();
        _started = true;
        _stopped = false;
    }

    public function stop():void
    {
        stopStamp = getTimer();
        _stopped = true;
        _started = false;
    }

    public function get elapsed():int
    {
        return ( _stopped ) ? stopStamp - startStamp : ( _started ) ? getTimer() - startStamp : 0;
    }

    public function get started():Boolean
    {
        return _started;
    }

    public function get stopped():Boolean
    {
        return _stopped;
    }
}

Because of AS3's lack of precision with larger Numbers every class failed these tests:

Test #15 Failed: 87,654,321,987,654,320 !== 87,654,321,987,654,321
Test #31 Failed: -87,654,321,987,654,320 !== -87,654,321,987,654,321
Test #42 Failed: 21,987,654,321.012344 !== 21,987,654,321.012345
Test #53 Failed: -21,987,654,321.012344 !== -21,987,654,321.012345

But only two functions passed all of the other tests: commaify() and addCommas().
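The four precision failures listed above come from the number type rather than from any of the formatting functions. A minimal Java sketch of the same effect, assuming AS3's Number behaves like Java's `double` (both are IEEE 754 double precision); the class and method names are hypothetical:

```java
final class DoublePrecisionDemo {
    // True when the long survives a round trip through a 64-bit double unchanged.
    // Only meaningful for magnitudes well below 2^63, where the cast back cannot saturate.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long input = 87654321987654321L;
        // Above 2^53 not every integer has its own double, so the value is rounded.
        System.out.println((long) (double) input);              // prints 87654321987654320
        System.out.println(survivesDouble(input));              // prints false
        System.out.println(survivesDouble(7654321987654321L));  // prints true
    }
}
```

The largest 16-digit test input is still below 2^53 (9007199254740992), which is why it formats exactly while the 17-digit one does not.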

The performance tests show that commaify() is the most performant of all the solutions:

Testing 999999 random values
commaify()        took 12.411s
addCommas()       took 17.863s
getCommaString()  took 18.519s
formatNumber()    took 14.409s
regex1()          took 40.654s
regex2()          took 36.985s

Additionally, commaify() can be extended to include arguments for decimal length and zero-padding on the decimal portion. The extended version also outperforms the others, at 13.128s:

public static function cappedDecimal( input:Number, decimalPlaces:int = 2 ):Number
{
    if ( decimalPlaces == 0 ) 
        return Math.floor( input );

    var decimalFactor:Number = Math.pow( 10, decimalPlaces );

    return Math.floor( input * decimalFactor ) / decimalFactor;
}

public static function cappedDecimalString( input:Number, decimalPlaces:int = 2, padZeros:Boolean = true ):String
{
    if ( padZeros )
        return cappedDecimal( input, decimalPlaces ).toFixed( decimalPlaces );
    else
        return cappedDecimal( input, decimalPlaces ).toString();
}

public static function commaifyExtended( input:Number, decimalPlaces:int = 2, padZeros:Boolean = true ):String
{
   var split:Array = cappedDecimalString( input, decimalPlaces, padZeros ).split( '.' ),
       front:String = split[0],
       back:String = ( split.length > 1 ) ? "." + split[1] : null,
       n:int = input < 0 ? 2 : 1,
       commas:int = Math.floor( (front.length - n) / 3 ),
       i:int = 1;

   for ( ; i <= commas; i++ )
   {
       n = front.length - (3 * i + i - 1);
       front = front.slice( 0, n ) + "," + front.slice( n );
   }

   if ( back )
       return front + back;
   else
       return front;
}

So, I'd offer that commaify() meets the demands of versatility and performance, though it is certainly not the most compact or elegant.

Mark Fox
2

This really isn't the best use of RegEx... I'm not aware of a number formatting function, but this thread seems to provide a solution.

function commaCoder(yourNum:Number):String {
    // Convert once instead of on every loop iteration.
    var digits:String = yourNum.toString();
    var numLength:int = digits.length;
    var numtoString:String = "";

    for (var i:int = 0; i < numLength; i++) {
        // Insert a comma whenever the number of remaining digits is a multiple of three.
        if ((numLength - i) % 3 == 0 && i != 0) {
            numtoString += ",";
        }
        numtoString += digits.charAt(i);
    }
    return numtoString;
}

If you really are insistent on using RegEx, you could just reverse the string, apply the RegEx replace function, then reverse it back.
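The reverse, replace, reverse idea can be sketched in Java as follows. The class name is hypothetical, and the `(?=\d)` guard is borrowed from a comment elsewhere in this thread rather than from this answer; it keeps a comma from being added after the final group.

```java
final class ReverseCommas {
    static String addCommas(String digits) {
        // Reverse so that grouping can run left to right from the least significant digit.
        String reversed = new StringBuilder(digits).reverse().toString();
        // Follow each group of three with a comma, but only if another digit comes after it.
        String grouped = reversed.replaceAll("(\\d{3})(?=\\d)", "$1,");
        // Reverse again to restore the original digit order.
        return new StringBuilder(grouped).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(addCommas("10000000"));  // prints 10,000,000
        System.out.println(addCommas("100000000")); // prints 100,000,000
    }
}
```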

Noldorin
  • I've no special need for a RegEx solution, I was more wondering how it could be approached using regex. But it seems that it is not the sort of problem regex lends itself to, especially with the case of: 100000000 => ,100,000,000. I wouldn't know where to start a regex to take that into account – BefittingTheorem Apr 06 '09 at 13:24
  • 1
    But this particular problem *can* be solved with a regex, and without reversing the string first. Niki and toolkit show how it's done. – Alan Moore Apr 06 '09 at 13:50
  • @Alan: Indeed it can be done... though please don't advocate it! Saying that, I think the OP understands that it's not a very appropriate use of RegEx. – Noldorin Apr 06 '09 at 15:39
  • But how is anyone supposed to learn regexes if not by practicing on small, self-contained problems like this one? It makes for a nice little exercise. – Alan Moore Apr 06 '09 at 16:15
  • I suppose it does, as long as one is cautious about their utility. No object there, really. Nonetheless, there are plenty of very practical regexes which one could practice writing. – Noldorin Apr 06 '09 at 16:37
  • Why not use a regex? It's extremely easy to test, IMO it's easier to understand than pure Java code (because it's declarative instead of imperative), and the chances of nasty errors in case of malformed data are lower. – Niki Apr 06 '09 at 20:45
  • @Niki: It's unnecessarily obfuscated for a start? And also, I would bet on it being *hugely* slower. – Noldorin Apr 06 '09 at 23:19
  • It's true that a regex-based solution can never be quite as fast as a hand-coded solution can be (in Java, anyway), but that's no reason to reject regexes without even trying them. In this case, you would have to be processing millions of strings in a tight loop to even notice the difference. – Alan Moore Apr 07 '09 at 03:18
  • @Noldorin: 1. Why do you think it's _hugely_ slower? Regexes can be compiled, and I'm sure they avoid typical performance pitfalls like concatenating strings in a loop. 2. It says "look for a digit followed by any number of 3 digit blocks" - I'd say that's a lot clearer than equivalent Java/C# code – Niki Apr 07 '09 at 07:33
  • @Niki: Guess it's a matter of personal preference. I just see that the average coder has a tendency to use regexes for almost anything they possibly can, and I've become rather skeptical. – Noldorin Apr 07 '09 at 12:23
  • (contd.) At least for me, regexes in most cases take a lot longer to interpret and modify, and I'm fairly experienced with them! I was also aware of lookahead assertions, but thought I'd stick to the simple solution. And nonetheless, I would bet regexes are still slower, even when compiled. – Noldorin Apr 07 '09 at 12:25
2

A sexeger is good for this. In brief, a sexeger is a reversed regex run against a reversed string that you reverse the output of. It is generally more efficient than the alternative. Here is some pseudocode for what you want to do:

string = reverse(string)
string = string.replace(/(\d{3})(?!$)/g, "$1,")
string = reverse(string)

Here is a Perl implementation:

#!/usr/bin/perl

use strict;
use warnings;

my $s = 13_456_789;

for my $n (1, 12, 123, 1234, 12345, 123456, 1234567) {
    my $s = reverse $n;
    $s =~ s/([0-9]{3})(?!$)/$1,/g;
    $s = reverse $s;
    print "$s\n";
}
Chas. Owens
  • Thanks Chas, just as a POI, how would I take this situation into account: 100000000 => ,100,000,000. Or is this even possible with regex? – BefittingTheorem Apr 06 '09 at 13:25
  • Hmm, a zero-width negative look-behind just shifts the position of the comma, and trying to do a normal regex with a zero-width negative look-ahead only works for groups that are multiples of three. – Chas. Owens Apr 06 '09 at 13:47
  • I think toolkit has it with a zero-width positive look-ahead – Chas. Owens Apr 06 '09 at 13:49
  • As Brian pointed out, your technique puts a comma at the beginning of the string if the first group consists of three digits. I would add a positive lookahead for a digit to make sure I was still inside the number: /(\d{3})(?=\d)/g – Alan Moore Apr 06 '09 at 14:32
  • Thanks guys, so in general it seems that a regex solution is going down an overly complex road :D – BefittingTheorem Apr 07 '09 at 19:32
0

You may want to consider NumberFormatter
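The answer gives no code and its link did not survive, so which NumberFormatter class it means is not stated. As a loosely analogous illustration only, and not that API, here is a sketch using Java's standard `java.text.NumberFormat`; the class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class BuiltInFormatterDemo {
    static String addCommas(long value) {
        // Locale.US groups digits in threes and uses a comma as the separator.
        return NumberFormat.getIntegerInstance(Locale.US).format(value);
    }

    public static void main(String[] args) {
        System.out.println(addCommas(10000000L)); // prints 10,000,000
        System.out.println(addCommas(-4321L));    // prints -4,321
    }
}
```

A built-in formatter also takes care of the sign and of locale-specific separators, which the hand-written solutions in this thread handle case by case.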

adamkonrad
0

I'll take the downvotes for not being the requested language, but this non-regex technique should apply (and I arrived here via searching for "C# regex to add commas into number")

var raw = "104241824    15202656 KB 13498560 KB 1612672KB already 1,000,000 or 99.999 or 9999.99";

int i = 0;
bool isnum = false;
var formatted = raw.Reverse().Aggregate(new StringBuilder(), (sb, c) => {
    //$"{i}: [{c}] {isnum}".Dump();
    
    if (char.IsDigit(c) && c != ' ' && c!= '.' && c != ',') {
        if (isnum) {
            if (i == 3) {
                //$"ins ,".Dump();
                sb.Insert(0, ',');
                i = 0;
            }
        }
        else isnum = true;
        i++;
    }
    else {
        isnum = false;
        i = 0;
    }
    
    sb.Insert(0, c);
    return sb;
});

results in:

104,241,824 15,202,656 KB 13,498,560 KB 1,612,672KB already 1,000,000 or 99.999 or 9,999.99

drzaus
-1

// This is a simple code and it works fine...:)

import java.util.Scanner;

public class NumberWithCommas {

    public static void main(String a[]) {
        Scanner sc = new Scanner(System.in);

        String num;

        System.out.println("\n enter the number:");

        num = sc.next();

        printNumber(num);
    }

    public static void printNumber(String ar) {
        int len, i = 0, temp = 0;
        len = ar.length();
        temp = len / 3;
        if (len % 3 == 0)
            temp = temp - 1;
        len = len + temp;
        char[] ch = ar.toCharArray();
        char[] ch1 = new char[len];
        for (int j = 0, k = (ar.length() - 1); j < len; j++)
        {
            if (i < 3)
            {
                ch1[j] = ch[k];
                i++;
                k--;
            }
            else
            {
                ch1[j] = ',';
                i = 0;
            }
        }
        for (int j = len - 1; j >= 0; j--)
            System.out.print(ch1[j]);
        System.out.println("");
    }
}
Stephan
mrinal
-1

If you can't use lookahead on regular expressions, you can use this:

string.replace(/^(.*?,)?(\d{1,3})((?:\d{3})+)$/, "$1$2,$3")

inside a loop until there's nothing to replace.

For example, a perlish solution would look like this:

my $num = '1234567890';
while ($num =~ s/^(.*?,)?(\d{1,3})((?:\d{3})+)$/$1$2,$3/) {}
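The same lookahead-free loop can be sketched in Java, assuming a plain string of digits as input. The class name and the `STEP` constant are hypothetical; the pattern is the one from the answer.

```java
final class LoopCommas {
    // Optional already-formatted prefix ending in a comma, then 1-3 digits,
    // then one or more complete triplets up to the end of the string.
    private static final String STEP = "^(.*?,)?(\\d{1,3})((?:\\d{3})+)$";

    static String addCommas(String digits) {
        String current = digits;
        while (true) {
            // Each pass inserts exactly one comma, working from left to right.
            String next = current.replaceFirst(STEP, "$1$2,$3");
            if (next.equals(current)) {
                return current; // nothing left to replace
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(addCommas("1234567890")); // prints 1,234,567,890
    }
}
```

On the first pass group 1 does not participate in the match; Java substitutes an empty string for it in the replacement, so the prefix simply starts out empty.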
Julio
-1

Perl RegExp 1 liner:

1 while $VAR{total} =~ s/(.*\d)(\d\d\d)/$1,$2/g;

Y.K.
-2

Try this code. It's simple and performs well.

var reg:RegExp=/\d{1,3}(?=(\d{3})+(?!\d))/g;
var my_num:Number = 48712694;
var my_num_str:String = String(my_num).replace(reg,"$&,");
trace(my_num_str);

Output:

48,712,694
Ravi Allam