
This is what I am trying to do:

Given a string, 12345678, commify(str) should give me 12,345,678.

The problem is to be solved using a regex in Perl, and the solution that works for this problem is this:

s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g

Source: Mastering Regular Expressions

What I am having trouble understanding is how we are able to capture the "345" part of this string. The way I am visualising it, a regex pointer "i" starts at 1, and another pointer "j" traverses the entire string and finds the appropriate location between 5 and 6. Then "i" moves to 2, and "j" traverses the entire string again and finds the appropriate position between 2 and 3 (since a comma has now been inserted between 5 and 6). Is my understanding correct? If not, could anybody help me visualise this process?

Note: I have found similar questions, but they state the exact answer rather than explain how the problem is solved.

surya
  • Possible duplicate of [Insert commas into number string](https://stackoverflow.com/questions/721304/insert-commas-into-number-string) – Sebastian Proske Jan 15 '18 at 11:49
  • Not a duplicate. The linked question just asks *how* to do it. This question starts with the regex provided in answers to the other question and asks *why* the regex works. – Dave Sherohman Jan 15 '18 at 12:31
  • It doesn't help that your regex is *wrong*. That substitution leaves the string `12,`. Your second (capturing) group should be a look-ahead instead to avoid deleting digits. – Borodin Jan 15 '18 at 12:58
  • @Borodin, Sorry , edited. – surya Jan 15 '18 at 13:03
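
The difference Borodin describes can be checked directly. The following is a minimal sketch, assuming the earlier version of the pattern consumed the digit groups instead of looking ahead at them; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $input = "12345678";

# Assumed earlier form: the digit groups are part of the match, so they get replaced.
(my $consuming = $input) =~ s/(?<=\d)(\d\d\d)+(?!\d)/,/g;

# Form from the question: the groups sit inside a lookahead, so nothing is consumed.
(my $lookahead = $input) =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;

print "consuming: $consuming\n";   # prints "consuming: 12,"
print "lookahead: $lookahead\n";   # prints "lookahead: 12,345,678"
```

In the first substitution the match starts after "12" and swallows "345678", which is why only `12,` is left.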

3 Answers

(?<=\d)(?=(\d\d\d)+(?!\d))

How it works, reading from the right:

  • (?!\d) is a negative lookahead that ensures there is no digit just after this point, so the cursor (in the input) is just after the last digit
  • (\d\d\d)+ matches one or more groups of three digits
  • (?=...) wraps those two parts in a lookahead, so the digits are checked but not consumed
  • (?<=\d) is a lookbehind that ensures there is still a digit before the first digit of a group
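
The pieces listed above can be written out with the /x modifier so that each one carries its own comment. This is only a sketch: `commify` is the name used in the question, and the non-capturing `(?:...)` group is my substitution for the capturing group, since the captured text is never used.

```perl
use strict;
use warnings;

sub commify {
    my ($str) = @_;
    $str =~ s/
        (?<=\d)            # a digit sits immediately to the left
        (?=                # look ahead without consuming anything
            (?:\d\d\d)+    #   one or more complete groups of three digits
            (?!\d)         #   followed by no further digit
        )
    /,/gx;
    return $str;
}

print commify("12345678"), "\n";   # prints 12,345,678
```

Because both assertions are zero-width, the match itself is empty and the substitution only inserts a comma at that position.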

Update about backtracking, taking 123456789 (a dot marks each position the engine reaches):

First, the engine attempts a match wherever it finds a digit before the cursor, starting after 1:

1.23456789

Then it tries to match at least one, and as many as possible, groups of three digits:

1.234.567.89

After 89 it fails to find a third digit. Backtracking to fewer groups does not help either, because the negative lookahead then sees a digit, so the engine gives up at this position and starts again from the following character, 2:

12.345.678.9

Again it fails, this time to find a second digit for the last group of three, so it moves on to 3:

123.456.789

Now there are no more digits, so it matches.

Note that the worst case is when the number of digits is a multiple of 3, and that this work is repeated for each replacement because the lookahead does not move the input cursor forward.
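
The walkthrough above can be imitated by testing every offset by hand. This is a rough simulation under my own assumptions, not how the engine is implemented internally; `trace_positions` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: for each offset, check the lookbehind and lookahead by hand.
sub trace_positions {
    my ($str) = @_;
    my @hits;
    for my $i (1 .. length($str)) {
        my $before = substr($str, $i - 1, 1);   # character the lookbehind inspects
        my $after  = substr($str, $i);          # text the lookahead scans
        my $ok = $before =~ /\d/ && $after =~ /^(?:\d\d\d)+(?!\d)/;
        printf "%-10s %s\n", substr($str, 0, $i) . "." . $after, $ok ? "match" : "fail";
        push @hits, $i if $ok;
    }
    return @hits;
}

my @hits = trace_positions("123456789");
print "commas go at offsets: @hits\n";   # prints "commas go at offsets: 3 6"
```

The start position only ever moves forward; it is the lookahead that scans to the end of the digits from each position without consuming them, which is why a comma can be placed before 456 even though the check reaches all the way to 9.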

Here is a Perl one-liner that traces this:

perl -pe 's/(?<=\d)(?=(\d{3}(?{print "matched $&.\n"}))+(?!\d(?{print "failed: $&.\n"})))/,/g'  <<<123456789
Nahuel Fouilleul
  • My bad. Corrected – surya Jan 15 '18 at 11:37
  • But by that logic, shouldn't it only match 678 (the last 3 digits)? The cursor should be at the position between 5 and 6 to insert the comma. As I understand it, the regex pointer should only move forward from here. Is that not right? – surya Jan 15 '18 at 11:42
  • I've just seen that [regex101](https://regex101.com/r/vlHBeB/1/) also provides a debugger; clicking "regex debugger" under Tools may help – Nahuel Fouilleul Jan 15 '18 at 11:44
  • At (\d\d\d)+(?!\d), the regex engine can choose between two paths: if the next character is a digit, it must match three digits (if there are fewer than three it must backtrack); otherwise (it's not a digit) the match is done – Nahuel Fouilleul Jan 15 '18 at 12:09
  • The text I was referencing did not introduce backtracking, so thank you for that. I am referencing https://stackoverflow.com/questions/9011592/in-regular-expressions-what-is-a-backtracking-back-referencing . Backtracking should happen when the regex fails to find a match. However, the regex would find a match at the very end (last 3 digits) and should stop. Why did it go back? – surya Jan 15 '18 at 13:06

Adding thousands-separating commas to a number is inherently easier to do from right to left, because that is the direction in which one counts the groups. The algorithm for the reversed number is simple: replace every group of three digits that is followed by another digit with those three digits and a comma. In the code snippet below I use this reverse trick, and then reverse the amended number again at the end.

my $s = "12345678";
print $s . "\n";
my $t = reverse $s;
# The (?=\d) check avoids a stray leading comma when the length is a multiple of 3.
$t =~ s/(\d{3})(?=\d)/$1,/g;
$s = reverse $t;
print $s;

12345678
12,345,678


Tim Biegeleisen

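This is a simple way to do it:

my $num = 12345678;
# Add a comma after each digit that is followed by one or more complete
# groups of three digits and then a non-digit or the end of the string.
$num =~ s/(\d)(?=(?:\d{3})+(?:\D|$))/$1,/g;
print "$num\n";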
San