0

How can I remove phone numbers from a string if they are in different formats?

For example I have:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

Also, how can I remove these formats from the text?

 09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222

How can I remove these phone numbers?

(093) 123-34-56 (068) 123 45 67 (095) 123 456 78

I have tried gsub, but it removes all similar numbers.

user

4 Answers

3

You can use:

text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '')

This removes phone numbers in the formats that appear in your text:

  • (XXX) XXX-XX-XX
  • (XXX) XXX XX XX

Whenever you are writing a regex, it helps to test it in Rubular. Here is how the pattern breaks down:

  • \([0-9]*\) matches the digits inside parentheses (...). Parentheses are special characters in a regex, so each one is escaped with \. [0-9] matches a single digit, and because there is more than one digit inside, * is added to mean zero or more of them.

  • \s matches the whitespace character that follows.

  • (-|\s) matches either a dash (-) or a whitespace character (\s); | means OR.
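To make the pieces above easier to see, here is a minimal sketch of a similar pattern written in Ruby's extended mode (/x), where whitespace inside the regex is ignored and each part can carry a comment. The constant name PHONE and the sample string are assumptions for illustration. It also uses fixed digit counts ({3}, {2}) instead of *, because * accepts empty digit runs too.

```ruby
# PHONE is a hypothetical name; the pattern is a stricter variant of the
# one above, using fixed digit counts instead of "*".
PHONE = /
  \( [0-9]{3} \)   # three digits inside literal parentheses
  \s               # one whitespace character
  [0-9]{3}         # first group of three digits
  [-\s]            # a dash or a whitespace character
  [0-9]{2}         # two digits
  [-\s]            # a dash or a whitespace character
  [0-9]{2}         # two digits
/x

sample  = "call (093) 123-34-56 or (068) 123 45 67 today"
cleaned = sample.gsub(PHONE, "")

p cleaned
# => "call  or  today"
```

Because the pattern requires the parenthesized prefix, values such as 28.98x17x3.18 are left alone.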

For other formats, such as:

  • XXXXXXXXXX
  • XXX-XX-XX-XXX
  • (XXX)XXXXXXX

in addition to the ones above, use the following:

text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '')
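As an alternative way to cover the extra formats from the question, here is a hedged sketch that counts digits instead of spelling out each layout. LOOSE_PHONE is a hypothetical name, and the rule it encodes (a 3-digit prefix followed by 7 or 8 more digits, so 10 or 11 digits in total) is an assumption read off the samples, not a general phone-number rule. It can over-match, for example when a 10-digit number is followed by a space and a single digit, so test it against real data first.

```ruby
# Hypothetical pattern: an optional "(", a 3-digit prefix, an optional ")",
# then 7 or 8 more digits, each optionally preceded by one dash or space.
# (?<!\d) stops a match from starting in the middle of a longer digit run,
# and \b stops it from ending in the middle of one.
LOOSE_PHONE = /(?<!\d)\(?\d{3}\)?(?:[- ]?\d){7,8}\b/

samples = "09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222"
specs   = "Refresh Rate: 60Hz, 28.98x17x3.18 inches"

found = samples.scan(LOOSE_PHONE)
p found
# => ["09414241441", "095-41-41-441", "(096)4141441", "091-123-11-22", "094 00 111 222"]

# The spec line contains no run of three digits, so it is left unchanged.
p specs.gsub(LOOSE_PHONE, "") == specs
# => true
```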
mohamed-ibrahim
  • Regex is very useful but a bit complicated to understand – user Apr 27 '16 at 16:14
  • Also, how can I remove these formats from the text: `09414241441 095-41-41-441 (096)4141441` – user Apr 27 '16 at 16:23
  • writing new notes in post now, just a min – mohamed-ibrahim Apr 27 '16 at 17:08
  • "how can i remove those formats from text"? You put the necessary requirements into your question, not into comments. The question is ill-defined as is, and adding more conditions/cases in comments only makes the question harder to understand. – the Tin Man Apr 27 '16 at 18:49
  • @mohamed-ibrahim, please do _not_ put code, especially additional/changed solution code, in a comment. It's unreadable and makes finding/understanding the solution very difficult. Edit your answer and add the addition. You can use `---` to add a horizontal bar, or incorporate the information as if you'd originally included that information. Don't use "Edit" or "Update" tags in the text though; They're discouraged and will be removed. – the Tin Man Apr 27 '16 at 18:52
1

For the formats in your text, the following regex works:

/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

Ruby Code

print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "")

Ideone Demo

rock321987
0

If your text has a fixed format, with the numbers always on the first line of the block, then simply remove the first line:

text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
    Smart Functionality: Yes - xx TV Streaming Platform
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'

text.strip
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
text.strip.lines[1..-1].join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Or:

lines = text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.shift
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n"
lines.join
# => "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Using a regex and gsub can work, but it's also more likely to become a maintenance problem.

If the phone numbers will always be on one line, but not necessarily the first, I'd still use lines to break the text into an array, then use reject with a regex to drop any line that contains a phone-number-like match:

lines = text.lines
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }
# => ["\n", "    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", "    Smart Functionality: Yes - xx TV Streaming Platform\n", "    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]

lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join
# => "\n    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n    Smart Functionality: Yes - xx TV Streaming Platform\n    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"

Note that not using strip results in the leading "\n" being retained.

Splitting the text into an array with lines helps contain the damage if something else happens to trigger the pattern match.

Where this approach breaks down is when the phone numbers are scattered throughout the text. I'd probably still reduce the text to individual lines, though, again to limit the possible damage from false positives.
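For the scattered case, one possible sketch is to keep the line-by-line structure but substitute within each line instead of rejecting it. PHONE_RE and the sample text are assumptions made up for illustration, and match? needs Ruby 2.4 or later.

```ruby
# Hypothetical pattern, modelled on the one used with reject above.
PHONE_RE = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

text = "Call (093) 123-34-56 for support\n" \
       "Refresh Rate: 60Hz (Native)\n" \
       "Sales: (068) 123 45 67\n"

# Separate the lines that contain a phone-like match from those that do not,
# so the affected lines can be reviewed before anything is removed.
flagged, untouched = text.lines.partition { |line| line.match?(PHONE_RE) }

# Substitute inside each line separately; a match cannot span two lines.
cleaned = text.lines.map { |line| line.gsub(PHONE_RE, "") }.join

p flagged.size
# => 2
p cleaned
# => "Call  for support\nRefresh Rate: 60Hz (Native)\nSales: \n"
```

Reviewing flagged before applying the substitution is a cheap way to spot false positives.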

the Tin Man
0
phone_formats = [/\(\d{3}\) \d{3}-\d{4}/,
                 /\d{3}-\d{3}-\d{4}/,
                 /\d{3} \d{3} \d{4}/,
                 /\(\d{3}\) \d{3} \d{3} \d{2}/,
                 /\(\d{3}\) \d{3} \d{2} \d{2}/,
                 /\(\d{3}\) \d{3}-\d{2}-\d{2}/,
                 /\d{3}-\d{3}-\d{2}-\d{2}/]

r = Regexp.union(phone_formats)
  #=> /(?-mix:\(\d{3}\) \d{3}-\d{4})|
  #    (?-mix:\d{3}-\d{3}-\d{4})|
  #    (?-mix:\d{3} \d{3} \d{4})|
  #    (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})|
  #    (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})|
  #    (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/

(I have broken Regexp.union's return value after each | for readability.)

text =<<_
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
_

puts text.gsub(r,'')

Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18,
TV with stand (inches) : 28.98x18.68x7.78
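Removing only the numbers leaves the spaces between them behind, so the first line ends up containing just whitespace. Here is a minimal sketch of one way to tidy that up, assuming it is acceptable to drop every line that ends up blank. The variable names and the shortened sample text are assumptions for illustration.

```ruby
# A shortened list of formats, combined the same way with Regexp.union.
formats = [/\(\d{3}\) \d{3} \d{3} \d{2}/,
           /\(\d{3}\) \d{3} \d{2} \d{2}/,
           /\(\d{3}\) \d{3}-\d{2}-\d{2}/]
phone = Regexp.union(formats)

text = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
       "Refresh Rate: 60Hz (Native).\n"

# Remove each number together with any spaces or tabs that follow it.
stripped = text.gsub(/#{phone}[ \t]*/, "")

# Then drop lines that are now empty or whitespace-only.
cleaned = stripped.lines.reject { |line| line.strip.empty? }.join

p cleaned
# => "Refresh Rate: 60Hz (Native).\n"
```

Interpolating phone into a larger regex is safe here because a Regexp converts to a self-contained (?-mix:...) group.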
Cary Swoveland