
I would like to replace some numbers in the text column of my data. The numbers have either 8 or 9 digits and come in two formats, separated either by hyphens or by spaces. This is a snapshot of the data:

library(dplyr)  # provides %>% and select(), used below

df <- data.frame(
  notes = c(
    'my number is 123-41-567',
    "321 12 788 is valid",
    'why not taking 987-012-678',
    '120 967 325 is correct'
  )
)

df %>% select(notes)

                       notes
1    my number is 123-41-567
2        321 12 788 is valid
3 why not taking 987-012-678
4     120 967 325 is correct

I need to replace them all with a placeholder such as aaaaa, so the data should look like:

                 notes
1   my number is aaaaa
2       aaaaa is valid
3 why not taking aaaaa
4     aaaaa is correct

Thank you in advance!

Phil
Alex
  • You'll need a regular expression. It's not so easy to write a regular expression that matches all possible numbers! See: https://stackoverflow.com/questions/16699007/regular-expression-to-match-standard-10-digit-phone-number – Mhairi McNeill Jan 27 '23 at 16:46
  • can you please review [this question](https://stackoverflow.com/questions/75261660/replacing-phone-numbers-in-different-formats-in-r). it is what you suggested and I believe you can figure it out. Many thanks! – Alex Jan 27 '23 at 17:52

1 Answer

Assuming the examples really do cover all possible cases (I would be careful about that), you can do this with the following regular expression:

\\d{3}( |-)\\d{2,3}( |-)\\d{3}
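As a quick sanity check before replacing anything, it can help to extract what the pattern actually matches. This is only a sketch using base R's `regexpr()` and `regmatches()` on the four sample strings from the question. The variable names are illustrative, and `perl = TRUE` is an assumption made here so that `\\d` is interpreted by the PCRE engine.

```r
# Sample strings copied from the question.
notes <- c(
  "my number is 123-41-567",
  "321 12 788 is valid",
  "why not taking 987-012-678",
  "120 967 325 is correct"
)

# Three digits, a space or hyphen, two or three digits,
# a space or hyphen, then three digits.
pattern <- "\\d{3}( |-)\\d{2,3}( |-)\\d{3}"

# Pull out the first match in each string to see what would be replaced.
matched <- regmatches(notes, regexpr(pattern, notes, perl = TRUE))
print(matched)
```

Note that each separator is matched independently, so a mixed form such as `123 41-567` would also match. Whether that is desirable depends on the real data.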

Here's the code for replacing:

library(dplyr)
library(stringr)

df %>% 
    mutate(
        notes = str_replace_all(notes, '\\d{3}( |-)\\d{2,3}( |-)\\d{3}', 'XXXXXX')
    )

                  notes
1   my number is XXXXXX
2       XXXXXX is valid
3 why not taking XXXXXX
4     XXXXXX is correct
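If the real data may contain longer digit runs, one possible refinement (an assumption about the data, not something the question confirms) is to anchor the pattern with word boundaries so that part of a longer number is not replaced. The sketch below uses base R's `gsub()` instead of stringr, writes the separators as the character class `[ -]`, and uses the placeholder `aaaaa` from the question.

```r
# Sample data copied from the question.
df <- data.frame(
  notes = c(
    "my number is 123-41-567",
    "321 12 788 is valid",
    "why not taking 987-012-678",
    "120 967 325 is correct"
  )
)

# Assumption: \\b on both sides stops the pattern from matching
# inside a longer run of digits, e.g. "1234-567-8901".
pattern <- "\\b\\d{3}[ -]\\d{2,3}[ -]\\d{3}\\b"

# Replace every match with the placeholder from the question.
df$notes <- gsub(pattern, "aaaaa", df$notes, perl = TRUE)

print(df)
```

With this pattern, a string such as `"id 1234-567-8901"` is left unchanged, whereas the unanchored pattern would replace the `234-567-890` inside it.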
Mhairi McNeill
  • Thank you! The code is working well – Alex Jan 27 '23 at 16:55
  • can you please review [this question](https://stackoverflow.com/questions/75261660/replacing-phone-numbers-in-different-formats-in-r) – Alex Jan 27 '23 at 17:51