I have phone numbers like these:

ll <- readLines(textConnection("(412) 573-7777 opt 1
563.785.1655 x1797
(567) 523-1534 x7753
(567) 483-2119 x 477
(451) 897-MALL
(342) 668-6255 ext 7
(317) 737-3377 Opt 4
(239) 572-8878 x 3
233.785.1655 x1776
(138) 761-6877 x 4
(411) 446-6626 x 14
(412) 337-3332x19
412.393.3177 x24
327.961.1757 ext.4"))

What regex should I write to convert them all to this format:

xxx-xxx-xxxx

I tried this one:

gsub('[(]([0-9]{3})[)] ([0-9]{3})[-]([0-9]{4}).*','\\1-\\2-\\3',ll)

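It doesn't cover all the formats: it only matches numbers written as (xxx) xxx-xxxx, so the dot-separated ones and the one containing letters are left unchanged. I could handle them with several patterns, but I think it can be done with a single regex.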

Wiktor Stribiżew
agstudy

1 Answer

If you also want to extract numbers that are represented with letters, you can use the following regex in gsub:

gsub('[(]?([0-9]{3})[)]?[. -]([A-Z0-9]{3})[. -]([A-Z0-9]{4}).*','\\1-\\2-\\3',ll)

See IDEONE demo

You can remove all A-Z from character classes to just match numbers with no letters.
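A minimal sketch of that digits-only variant, assuming a shortened copy of the question's `ll` vector (redefined here so the snippet runs on its own; the name `digits_only` is just illustrative). Entries that contain letters no longer match and come back unchanged:

```r
# Shortened sample of the question's data (assumption: same formats as `ll`)
ll <- c("(412) 573-7777 opt 1", "563.785.1655 x1797",
        "(451) 897-MALL", "(412) 337-3332x19")

# Same pattern with A-Z removed from the character classes
digits_only <- '[(]?([0-9]{3})[)]?[. -]([0-9]{3})[. -]([0-9]{4}).*'

# sub() is enough here: the trailing .* makes one match cover the rest of the string
res <- sub(digits_only, '\\1-\\2-\\3', ll)
res
# "(451) 897-MALL" is returned as-is because it does not match
```

If unmatched entries need to be flagged rather than passed through, `grepl(digits_only, ll)` gives a logical vector of which inputs matched.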

REGEX:

  • [(]? - An optional (
  • ([0-9]{3}) - 3 digits
  • [)]? - An optional )
  • [. -] - Either a dot, or a space, or a hyphen
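  • ([A-Z0-9]{3}) - A sequence of 3 digits or uppercase letters
  • [. -] - Either a dot, or a space, or a hyphen
  • ([A-Z0-9]{4}) - A sequence of 4 digits or uppercase letters
  • .* - Any remaining characters up to the end of the string (extensions such as "x1797" or "opt 1"), which the replacement drops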
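Putting the pieces together, the pattern can be tried on a few of the question's inputs. The last step below is an extra that is not part of the answer above: it assumes the letters follow the standard telephone keypad layout and uses base R's `chartr()` to turn them into digits. The variable names are illustrative.

```r
# Shortened sample of the question's data
ll <- c("(412) 573-7777 opt 1", "563.785.1655 x1797",
        "(451) 897-MALL", "327.961.1757 ext.4")

# Pattern from the answer: optional parentheses, flexible separators, letters allowed
pattern <- '[(]?([0-9]{3})[)]?[. -]([A-Z0-9]{3})[. -]([A-Z0-9]{4}).*'
normalized <- gsub(pattern, '\\1-\\2-\\3', ll)
normalized

# Optional step (assumption: standard keypad, ABC=2 ... WXYZ=9):
# translate any remaining uppercase letters into digits
as_digits <- chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                    "22233344455566677778889999",
                    normalized)
as_digits
```

With this mapping, "451-897-MALL" becomes "451-897-6255"; entries without letters are unaffected by `chartr()`.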
Wiktor Stribiżew