
I am using a regex that is suggested here to replace any type of phone number with aaaaaaaaaa. This is a snapshot of my data:

df <- data.frame(
  text = c(
    'my number is (123)-416-567',
    "1 321 124 7889 is valid",
    'why not taking 987-012-6782',
    '120 967 3256 is correct',
    'call at 888 969 9919',
    'please text at 1 647 989 1213'
  )
)

df %>% select(text)

                           text
1    my number is (123)-416-567
2       1 321 124 7889 is valid
3   why not taking 987-012-6782
4       120 967 3256 is correct
5          call at 888 969 9919
6 please text at 1 647 989 1213

My code is

df %>% 
  mutate(
    text = str_replace_all(text, '^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$', 'aaaaaaaaaa')
  )

and I get this error

Error: '\+' is an unrecognized escape in character string starting "'^(\+"
Error: unexpected ')' in "  )"

The outcome should be like

                       text
1   my number is aaaaaaaaaa
2       aaaaaaaaaa is valid
3 why not taking aaaaaaaaaa
4     aaaaaaaaaa is correct
5        call at aaaaaaaaaa
6 please text at aaaaaaaaaa
Phil
Alex
  • Try ``'(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)'`` – Wiktor Stribiżew Jan 27 '23 at 18:05
  • @WiktorStribiżew Thank you! It worked. Can you please tell me what is wrong with what I wrote? It seems to be recommended by the members here. Much appreciated! – Alex Jan 27 '23 at 18:10
  • Does this answer your question? [R - gsub replacing backslashes](https://stackoverflow.com/questions/27491986/r-gsub-replacing-backslashes) - _"The reason that you need four backslashes to represent one literal backslash is that `"\"` is an escape character in both R strings and for the regex engine to which you're ultimately passing your patterns. If you were talking directly to the regex engine, you'd use `"\\"` to indicate a literal backslash. But in order to get R to pass `"\\"` on to the regex engine, you need to type `"\\\\"`"_ – Ted Lyngmo Jan 27 '23 at 18:19
  • However, fixing the slashes alone does not solve the problem. – Wiktor Stribiżew Jan 27 '23 at 18:45

1 Answer


You can use

str_replace_all(text, '(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)', 'aaaaaaaaaa')

See the regex demo.
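
For completeness, here is a minimal end-to-end sketch that applies the pattern above to the data frame from the question. It assumes the dplyr and stringr packages are installed; the names `phone_pattern` and `result` are only illustrative.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
df <- data.frame(
  text = c(
    "my number is (123)-416-567",
    "1 321 124 7889 is valid",
    "why not taking 987-012-6782",
    "120 967 3256 is correct",
    "call at 888 969 9919",
    "please text at 1 647 989 1213"
  )
)

# Backslashes are doubled because this is an ordinary R string literal
phone_pattern <- "(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"

# Replace every phone-like substring with the placeholder
result <- df %>%
  mutate(text = str_replace_all(text, phone_pattern, "aaaaaaaaaa"))

print(result)
```

With the sample data, every row should end up with the phone number replaced and the surrounding words left intact.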

Details:

  • (?:\+?\d{1,2}\s)? - an optional sequence of an optional + and then one or two digits and a whitespace
  • \(? - an optional (
  • \d{3} - three digits
  • \)? - an optional )
  • [\s.-] - a -, . or whitespace
  • \d{3} - three digits
  • [\s.-] - a -, . or whitespace
  • \d{3,4} - three or four digits
  • (?!\d) - no digit allowed right after.
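
To see which part of a string the pieces listed above actually capture, it can help to extract the match instead of replacing it. The sketch below assumes stringr is installed; the last two sample strings are hypothetical inputs that do not appear in the question.

```r
library(stringr)

phone_pattern <- "(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"

samples <- c(
  "my number is (123)-416-567",  # parentheses and a three-digit last group
  "1 321 124 7889 is valid",     # optional leading "1 " prefix
  "+1 647 989 1213",             # hypothetical: prefix written with "+"
  "555-123-45678"                # hypothetical: five trailing digits, rejected by (?!\d)
)

# str_extract() returns the first match per string, or NA when nothing matches
matched <- str_extract(samples, phone_pattern)
print(matched)
# "(123)-416-567"  "1 321 124 7889"  "+1 647 989 1213"  NA
```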

Notes:

  • In an R string literal, a regex backslash must be written as a double backslash (\\), so \d is typed as "\\d"
  • ^ and $ match the start and end of the string, but here the phone numbers are embedded in longer text, so it makes sense to remove the ^ anchor and replace $ with a right-hand digit boundary, (?!\d)
  • The last \d{4} did not match numbers where the last part contained only three digits, as in (123)-416-567, so I replaced it with \d{3,4}.
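
The three notes can be checked directly in an R session. This is a small sketch assuming stringr is installed and, for the raw-string line, R 4.0 or later; the variable names are illustrative only.

```r
library(stringr)

# Note 1: "\\d" in source code is the two-character string \d once R has parsed it
len <- nchar("\\d")                               # 2
same_as_raw <- identical(r"(\d{3})", "\\d{3}")    # TRUE: raw strings (R >= 4.0) avoid the doubling

x <- c("call at 888 969 9919", "my number is (123)-416-567")

# Note 2: the question's pattern with only the escapes fixed is still anchored,
# so it matches only when the number is the entire string
anchored <- "^(\\+\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}$"
anchored_hits <- str_detect(x, anchored)                # FALSE FALSE
anchored_bare <- str_detect("888 969 9919", anchored)   # TRUE

# Note 3: without anchors but still requiring four final digits,
# the second string (three final digits) is missed
four_only <- "\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}"
four_only_hits <- str_detect(x, four_only)              # TRUE FALSE

# Allowing three or four final digits, with a right-hand digit boundary, matches both
three_or_four <- "\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"
three_or_four_hits <- str_detect(x, three_or_four)      # TRUE TRUE
```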
Wiktor Stribiżew
  • Thanks! But can the last part contain three digits? – Alex Jan 27 '23 at 18:49
  • @Alex Your pattern implies that the last part can contain three digits. So, yes. – Wiktor Stribiżew Jan 27 '23 at 18:51
  • Good catch, thanks a lot! I think it is a data entry issue though. Much appreciated again! – Alex Jan 27 '23 at 18:54
  • Could you please take a look at [this question](https://stackoverflow.com/questions/75250599/removing-dates-in-any-format-form-a-text-column/75252092?noredirect=1#comment132798859_75252092) where I need to replace the dates? Thank you again! Much appreciated! – Alex Jan 27 '23 at 19:01