I'm doing an exercise from C++ Primer:

Rewrite your phone program so that it writes only the second and subsequent phone numbers for people with more than one phone number.

(The phone program simply recognises phone-numbers that have a certain format using a regular expression).

The chapter discusses using regex_replace and the format flags to change the format of the phone numbers that are entered. The exercise asks us to ignore the first phone number entered and to format and print only the second and subsequent ones. My input might look something like:

dave: 050 000 0020, (402)2031032, (999) 999-2222

and it should output

402.203.1032 999.999.2222

This is my solution:

#include <iostream>
#include <string>
#include <regex>

using namespace std;
using namespace regex_constants;

int main(){

    string pattern = "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})";

    regex r(pattern);

    //string firstFormat = "";
    string secondFormat = "$2.$5.$7 ";

    for(string line; getline(cin, line);){
        unsigned counter = 0;
        for(sregex_iterator b(line.begin(), line.end(), r), e; b != e; ++b)
            if(++counter > 1) cout << (*b).format(secondFormat);
        cout << endl;

//      Below: iterates through twice, maybe not ideal
//      string noFirst = regex_replace(line, r, firstFormat, format_first_only); //removes the first phone number
//      cout << regex_replace(noFirst, r, secondFormat, format_no_copy) << endl;
    }


}

However, I am unhappy with using a counter to make sure I'm not processing the first match. It feels like there must be a more natural utility (like the format_first_only flag that can be passed to format, except in reverse) that makes it possible to ignore the first match, but I am struggling to find one.

The commented-out solution seems a bit better, except that it requires a second pass over the input.

SergeantPenguin

2 Answers


How about changing the regex to something like (?<=\P, *)(\P) (where \P is shorthand for a regex that matches a phone number)? In other words, you are interested only in phone numbers that follow a previous phone number.

The only problem with this suggestion is that C++'s std::regex doesn't appear to support positive look-behind.

(Note: you don't want all the captures in the first phone number.)
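Without look-behind, one way to get "only the numbers that follow an earlier number" is to step the match iterator past the first match before looping. The following is a minimal sketch, not a definitive implementation: the helper name format_after_first is hypothetical, and the pattern and format string are the ones from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name not from the original post): returns every
// phone number in `line` except the first, formatted as ddd.ddd.dddd.
std::string format_after_first(const std::string& line) {
    // Same pattern as in the question, written as a raw string literal.
    static const std::regex phone(
        R"((\()?(\d{3})(\))?([-. ])?(\d{3})([-. ])?(\d{4}))");

    std::string out;
    std::sregex_iterator it(line.begin(), line.end(), phone), end;

    // Skip the first match, if there is one, instead of counting matches.
    if (it != end) ++it;

    for (; it != end; ++it)
        out += it->format("$2.$5.$7 ");
    return out;
}
```

Called on each line read with getline, this should produce the same output as the counter version in the question, including an empty result for lines with zero or one phone number.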

  • Kind of works. Except if I have three phone numbers (e.g input `a, b, c`, where `a` `b` and `c` are phone numbers) then it outputs just `b` instead of `b c`. This must be because the second iterator starts just after `b`, and so `b` isn't considered "previous " to `c`. – SergeantPenguin Dec 12 '15 at 19:41
  • That's "Kind of works" in the sense: "Doesn't work at all"! – Martin Bonner supports Monica Dec 13 '15 at 12:51

You could use the \G anchor.

"(?:(?!\\A)\\G|.*?\\d{3}\\D*\\d{3}\\D*\\d{4}).*?(\\d{3})\\D*(\\d{3})\\D*(\\d{4})"

And set secondFormat = "$1.$2.$3 ";
With this, there is no need for a counter.

Formatted:

 (?:
      (?! \A )                      # Not beginning of string
      \G                            # End of previous match
   |                              # or,
      .*?                           # Anything up to
      \d{3} \D* \d{3} \D* \d{4}     # First phone number
 )
 .*?                           # Anything up to
 ( \d{3} )                     # (1), Next phone number
 \D* 
 ( \d{3} )                     # (2)
 \D* 
 ( \d{4} )                     # (3)

Input:

dave: 050 000 0020, (402)2031032, (999) 999-2221

Output:

 **  Grp 0 -  ( pos 0 , len 32 ) 
dave: 050 000 0020, (402)2031032  
 **  Grp 1 -  ( pos 21 , len 3 ) 
402  
 **  Grp 2 -  ( pos 25 , len 3 ) 
203  
 **  Grp 3 -  ( pos 28 , len 4 ) 
1032  

-------------------------------------

 **  Grp 0 -  ( pos 32 , len 16 ) 
, (999) 999-2221  
 **  Grp 1 -  ( pos 35 , len 3 ) 
999  
 **  Grp 2 -  ( pos 40 , len 3 ) 
999  
 **  Grp 3 -  ( pos 44 , len 4 ) 
2221  
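If the engine in use does not accept `\G`, the same chaining can be approximated by restarting each search at the end of the previous match. The sketch below assumes std::regex with its default ECMAScript grammar and uses the std::regex_constants::match_continuous flag so that every match must begin exactly at the cursor. The function name format_following_numbers is hypothetical, and the loose \d{3}\D*\d{3}\D*\d{4} pattern is the one from the regex above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: emulates the \G-based regex by anchoring each
// search at the end of the previous match.
std::string format_following_numbers(const std::string& line) {
    // "Anything up to" the next phone number, then its three digit groups.
    static const std::regex next_number(
        R"(.*?(\d{3})\D*(\d{3})\D*(\d{4}))");
    const auto anchored = std::regex_constants::match_continuous;

    std::string out;
    std::smatch m;
    auto cursor = line.cbegin();

    // Consume the first phone number without printing it.
    if (!std::regex_search(cursor, line.cend(), m, next_number, anchored))
        return out;
    cursor = m[0].second;

    // Each later match starts where the previous one ended.
    while (std::regex_search(cursor, line.cend(), m, next_number, anchored)) {
        out += m.format("$1.$2.$3 ");
        cursor = m[0].second;
    }
    return out;
}
```

Because the pattern already begins with `.*?`, the flag mostly prevents the engine from retrying at later start positions once no further number exists; the first search plays the role of the "first phone number" alternative in the regex above.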
  • Thanks. One thing though: if there was a fourth number in that input, say (000)1112222, then the second iteration would be looking at the remaining `, (999) 999-2221, (000)1112222`. In the first subexpression `(?:(?!\\A)\\G|.*?\\d{3}\\D*\\d{3}\\D*\\d{4}).*?`, how is it that the second half of the alternation would not provide a suitable match too? It seems to me like the second iteration could match `, (999) 999-2221` or the entire remainder `, (999) 999-2221, (000)1112222`. – SergeantPenguin Dec 12 '15 at 22:39
  • (I also can't seem to get the second output [here](http://coliru.stacked-crooked.com/a/80c9570e180a0567)) What compiler and version are you using? I'm on gcc 5.2.0. – SergeantPenguin Dec 12 '15 at 23:04
  • @SergeantPenguin - I'm using VS2010, but I don't have C++11. This regex assumes the engine you use supports the `\G` anchor, which is a Perl/PCRE or Boost::regex construct that I assumed C++11 also supports. On your first question, the regex would match a fourth, fifth, sixth, ... as many as you have. –  Dec 13 '15 at 01:17