
I have a sentence like the one below:

test_str = r'Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him'

I would like to replace every digit in the sentence with '0', except in the phone number, which should keep only its leading +1 and have its remaining digits masked with '*'.

result = r'Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1******** and his wife and kids are staying away from him'

I have the following regex to replace the phone number pattern (which always has a consistent number of digits).

result = re.sub(r'(.*)?(\+1)(\d{8})', r'\1\2********', test_str)

Could I replace the other digits with 0, leaving the phone number alone, in one single regex?

blackfury
  • I believe you need two replace strings for this `(?<!\+)\d+`. First replace with `00` and then with `*****`. See [here](https://regex101.com/r/6MyqhI/2/) –  Jun 03 '20 at 04:10
  • This doesn't work if the string contains another number joined by other symbols (like 1-22) – blackfury Jun 03 '20 at 04:15
  • So you want them to be `00` or `***`? –  Jun 03 '20 at 04:16
  • All digits except those in the phone number should be replaced with '0' – blackfury Jun 03 '20 at 04:18
  • How do you identify phone numbers in the string? –  Jun 03 '20 at 04:39
  • If it is valid phone number; replace the [phone number](https://stackoverflow.com/questions/16699007/regular-expression-to-match-standard-10-digit-phone-number) first with `*****` and after that replace the numbers with `0`. Something like [**this**](https://regex101.com/r/6MyqhI/3) –  Jun 03 '20 at 04:44

2 Answers


We can use re.sub with a function as the replacement.

To mask the phone number, use the regex below. All digits following +1 are replaced with an equal number of * characters.

result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)

To replace the other numbers with 0, use the regex below. Every run of digits not preceded by + or another digit is replaced with an equal number of 0 characters.

result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), test_str)

Example:

>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +1********'

Addendum for the follow-up question in the comments: to retain 3 digits of the phone number, modify only the +1 portion of the 1st regex; the 2nd regex stays the same.

>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+\d{3})(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +188******'
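
The two passes above can be wrapped in a small helper so that the number of retained digits becomes a parameter. This is only a sketch: the names `mask_numbers` and `keep` are made up for illustration, and it still assumes phone numbers start with '+'.

```python
import re

def mask_numbers(text, keep=1):
    """Star out phone digits after '+' and `keep` digits, then zero other numbers."""
    # Pass 1: keep '+' plus the first `keep` digits, replace the rest with '*'.
    phone = re.compile(r'(?<!\w)(\+\d{%d})(\d+)' % keep)
    text = phone.sub(lambda m: m.group(1) + '*' * len(m.group(2)), text)
    # Pass 2: zero every digit run that is not preceded by '+' or another digit.
    return re.sub(r'(?<![+\d])\d+', lambda m: '0' * len(m.group()), text)

print(mask_numbers('Mr.X has 23 apples, phone +188991234'))
print(mask_numbers('Mr.X has 23 apples, phone +188991234', keep=3))
```

With `keep=1` the phone number comes out as +1********, and with `keep=3` as +188******, matching the sessions above.
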
Skycc
  • This works perfectly fine. But if I need to retain the first three digits of the phone number, regardless of whether the country code is +1 or +000, the regex I mentioned in the question would still work, whereas your answer would also mask the digits that should be retained in the phone number – blackfury Jun 03 '20 at 05:19
  • To retain more digits of the phone number, you can change the 1st regex to `r'(?<!\w)(\+\d{3})(\d+)'`; adjust the 3 to retain a different number of digits – Skycc Jun 03 '20 at 05:23
  • The first regex won't be a problem. But the second regex would also replace the retained digits with 0 if the phone number doesn't have a +. For example, if it is just an 8-digit number (which is actually a phone number: 12345678), the first regex could mask it to 1234****, but the second regex would then zero the retained digits as well (giving 0000****) – blackfury Jun 03 '20 at 05:26
  • To be more precise, the second regex should be independent of the + symbol and should look for * instead – blackfury Jun 03 '20 at 05:30
  • The second regex should be independent of the + symbol. If the phone number has no +, this won't work – blackfury Jun 03 '20 at 05:40
  • The phone number string needs to be distinguishable from the other numeric-only strings. The leading '+', your original requirement, is one way. If there is no plus, then there must be another detectable indicator of type "phone number" for the phone number, whether it be length, position in string, or other. Otherwise, all numeric strings are the same to the regex. All cats are black at night. – Danilushka Jun 03 '20 at 12:30
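
One way to make the second pass independent of '+', as asked in the comments, is to skip any digit run that is immediately followed by '*', on the assumption that the first pass has already masked the phone numbers. The sketch below is not part of the answer: `mask_phone` and `zero_unmasked` are hypothetical names, and treating a standalone 8-digit run as a phone number is an assumption taken from the 12345678 example.

```python
import re

def mask_phone(text):
    # Assumption: a standalone run of exactly 8 digits is a phone number.
    # Keep the first 4 digits and star out the last 4.
    return re.sub(r'(?<!\d)(\d{4})\d{4}(?!\d)', r'\1****', text)

def zero_unmasked(text):
    # Zero whole digit runs, except runs directly followed by a '*' mask.
    return re.sub(r'(?<!\d)\d+(?![\d*])', lambda m: '0' * len(m.group()), text)

print(zero_unmasked(mask_phone('call 12345678, bring 23 apples')))
```

The trade-off is that any unrelated number directly followed by '*' (for example in "2*3") is also left untouched, so the '*' marker is only as reliable as the input allows.
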

If you want to keep the first 3 digits of the phone number, along with the optional +1 prefix, using a single pattern:

(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+

In parts

(?<!     Negative lookbehind
  \S     Match any char except a whitespace char
)        Close lookbehind
(        Capture group 1
  (?:    Non capture group
    \+   Match + char
    1    Match 1 char
  )?     Close group and repeat 0 or 1 times
)        Close group
(        Capture group 2
  \d{3}  Match a digit, repeated 3 times
)        Close group
(        Capture group 3
  \d{5}  Match a digit, repeated 5 times
)        Close group
(?!      Negative lookahead
  \S     Match any char except a whitespace char
)        Close lookahead
|        Or
\d+      Match a digit and repeat 1 or more times
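
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. This is a sketch of that idea; the names `PATTERN` and `mask` are illustrative only.

```python
import re

# The pattern from this answer, laid out with re.VERBOSE so each part is commented.
PATTERN = re.compile(r"""
    (?<!\S)        # not preceded by a non-whitespace character
    ((?:\+1)?)     # group 1: optional +1 prefix
    (\d{3})        # group 2: the three digits to keep
    (\d{5})        # group 3: the five digits to mask
    (?!\S)         # not followed by a non-whitespace character
    | \d+          # otherwise: any other run of digits
""", re.VERBOSE)

def mask(match):
    # Group 2 is only set when the phone-number alternative matched.
    if match.group(2):
        return match.group(1) + match.group(2) + "*" * len(match.group(3))
    return "0" * len(match.group())

print(PATTERN.sub(mask, "tel +188991234 or 12345678 and 23 apples"))
```

A named function instead of a lambda makes the two branches (phone number versus any other digit run) easier to read.
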


Example code

import re

pattern = r"(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+"

s = ("Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him\n\n"
            "This is a tel 12345678 and this is 1234567 123456789")

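# Group 2 is set only when the phone-number alternative matched;
# otherwise the match is a plain digit run and is replaced with zeros.
result = re.sub(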
    pattern,
    lambda x: x.group(1) + x.group(2) + "*" * len(x.group(3)) if x.group(2) else "0" * len(x.group()),
    s)
print(result)

Output

Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1889***** and his wife and kids are staying away from him

This is a tel 123***** and this is 0000000 000000000
The fourth bird