I want to reformat a 10-digit number like so:

1234567890 --> (123) 456-7890

A long way to do this would be to give each digit its own capture group and then back-reference each one individually:

'([0-9])([0-9])...([0-9])' --> (\1\2\3) \4\5\6-\7\8\9\10

This seems unnecessary and verbose, but when I try the following

'([0-9]){10}'

There appears to be only one back-reference, and it is of the last digit in the number.

Is there a more elegant way to reference each character as its own capture group?

Thanks!

Ezra Goss
  • See https://regex101.com/r/iAlKbG/1 – Wiktor Stribiżew Jul 08 '17 at 21:49
  • It isn't the first basic regex question you asked. Take the time to read a regex tutorial. – Casimir et Hippolyte Jul 08 '17 at 21:55
  • My question isn't really how to format this, although the answer given is more elegant than what I put as my long answer. My question is more about using the syntax for one capture group for multiple instances. But I can assume from the answers that were given that that isn't possible. I looked over documentation before asking this but I'll try and be more thorough in the future. Sorry about that – Ezra Goss Jul 08 '17 at 21:58
  • Training is the key. – Casimir et Hippolyte Jul 08 '17 at 21:59
  • @erza It depends, I just don't really understand what you mean by *one capture group for multiple instances*. Instances of what? – Antoine C. Jul 08 '17 at 21:59
  • If you look at the answer I tried that failed, I basically put the digit in a capture group and then tried to set the amount of times that capture group matches to 10. I guess another way of phrasing my question is "Can I match multiple instances using one capture group, and then back-reference each of them separately?". In this case _the number of instances_ is defined as the number within the {}. – Ezra Goss Jul 08 '17 at 22:05
  • Take a look at this: http://www.regular-expressions.info/captureall.html – Marco Luzzara Jul 08 '17 at 22:09
  • Yeah that's a great explanation of why this won't work. Thanks for the link – Ezra Goss Jul 08 '17 at 22:11
  • @erza Well yes, this is not possible. Such behavior needs to be created with the programming language you use with your regex implementation. I advise you to read the accepted answer of this question: https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups. – Antoine C. Jul 08 '17 at 23:03

2 Answers

The following pattern will do the job: ^(\d{3})(\d{3})(\d{4})$

  • ^(\d{3}): beginning of the string, then exactly 3 digits
  • (\d{3}): exactly 3 digits
  • (\d{4})$: exactly 4 digits, then end of the string.

Then replace with: (\1) \2-\3
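
For illustration, here is roughly how that could look in Python. The question does not name a language, so the choice of Python and the helper name format_phone are assumptions made only for this sketch:

```python
import re

# Three groups: 3 digits, 3 digits, 4 digits (the pattern given above).
PHONE = re.compile(r'^(\d{3})(\d{3})(\d{4})$')

def format_phone(number):
    """Reformat a 10-digit string; input that does not match is returned unchanged."""
    # In the replacement template the parentheses, space and hyphen are literal.
    return PHONE.sub(r'(\1) \2-\3', number)

print(format_phone('1234567890'))  # (123) 456-7890
```

Other tools spell back-references differently (for example $1 instead of \1), so the replacement string may need adjusting for your environment.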

Antoine C.
  • I switched the answer because it better addressed my question, but this answer does actually provide the solution to my problem. Wasn't sure at first which one to go with but because I was asking a more general question, I went with the more general answer. Thank you though! – Ezra Goss Jul 08 '17 at 23:13
  • No problem, glad you understood! – Antoine C. Jul 08 '17 at 23:14

Although the other answer with its example regex patterns hopefully shed light on the correct application of capture groups, it does not directly answer the question. If you fail to understand how regular expressions work (capture groups in particular), you may find yourself wanting to do the same thing with a different pattern in the future.

Is there a more elegant way to reference each character as its own capture group?

The initial answer is "No": there is no way to reference an individual capture of a repeated capture group using traditional replacement syntax, regardless of whether the group matches a single digit or anything else. Because you indicate a precise number of repetitions with {10}, it seems perfectly reasonable to expect access to each capture. But what if you had indicated a variable number of repetitions with + or {1,3}? There would be no well-defined way of knowing how many captures occurred. And if the same pattern had more capture groups following the repeated one, there would be no way of correctly referencing the later groups.

Example: given the pattern ([a-z])+(\d){3}, the first capture group could match 4 letters one time and 11 letters the next. If you wanted to refer to the captured digits, how would you do that? You could not, since \1, \2, \3, ... would all be reserved for possible capture instances of the first group. This is why groups are numbered by the position of their opening parenthesis in the pattern, and why a repeated group retains only its last capture.
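
The "last capture wins" behavior is easy to observe. A small sketch using Python's re module (Python is picked only for illustration; the question does not say which engine is in use):

```python
import re

# One group repeated ten times is still a single numbered group.
m = re.fullmatch(r'([0-9]){10}', '1234567890')
print(m.re.groups)  # 1  (number of groups defined by the pattern)
print(m.group(1))   # 0  (only the last repetition is retained)

# Group numbers follow the opening parentheses, not the repetitions,
# so the digit group is always group 2, however many letters precede it.
m2 = re.fullmatch(r'([a-z])+(\d){3}', 'abcd123')
print(m2.groups())  # ('d', '3')
```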

But the inability of basic regular expression syntax to do what you want does not make your question invalid, nor does it necessarily place the solution beyond the reach of regular expression implementations. Various implementations (i.e. language syntax and regex libraries) work around this limitation by exposing match objects that give access to repeated captures. The C# / .NET regex library is one example, with expressions like match.Groups[1].Captures[3]. So even though you can't use basic replacement patterns to get what you want, the answer is often "Yes", depending on the specific implementation.
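
Where the regex library has no such captures collection (Python's standard re module, for instance, keeps only the last capture of a repeated group), a similar result can be built in the host language. A rough sketch, not a definitive implementation; the helper name digit_captures is made up for this example:

```python
import re

def digit_captures(number):
    """Return each digit of a 10-digit string as a separate list item."""
    # Check the overall shape first, as ([0-9]){10} would.
    if not re.fullmatch(r'[0-9]{10}', number):
        raise ValueError('expected exactly 10 digits')
    # findall yields one item per match, i.e. one "capture" per digit.
    return re.findall(r'[0-9]', number)

d = digit_captures('1234567890')
# Each digit is now individually addressable, much like \1 ... \10 would be.
formatted = '({}) {}-{}'.format(''.join(d[0:3]), ''.join(d[3:6]), ''.join(d[6:10]))
print(formatted)  # (123) 456-7890
```

For this particular formatting task the three-group pattern is simpler; the sketch only shows that per-repetition access is usually obtainable outside the replacement syntax.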

C Perkins
  • This does directly answer my question and is more in line with what I was looking for, thank you. I've made this the new answer as it is the only answer addressing my question vs. providing a better formatting pattern – Ezra Goss Jul 08 '17 at 23:12
  • The general tendency in answering regex questions, especially basic questions, is to respond with references to very basic, fundamental regex syntax. That is of course necessary, but neglected too often is the rich set of facilities available beyond the basic syntax. – C Perkins Jul 08 '17 at 23:19