0

I have a regex that replaces all URLs in a given string:

my_string = "www.example.com test www.mysite.com"
my_string.gsub!(/[a-zA-Z0-9\-\.]+\.(com|net|de|org|uk|biz|info|co.uk|es|de)(\/\S*)?/i,'(site hidden)')

As a result of the above I get: "(site hidden) test (site hidden)"

How can I change the regex so that it does not replace www.mysite.com?

That is, the replacement should output "(site hidden) test www.mysite.com"

Thanks!

ratamaster

2 Answers

3

How about brute force? :)

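my_string = "www.example.com test www.mysite.com"

regex = /[a-zA-Z0-9\-\.]+\.(com|net|de|org|uk|biz|info|co.uk|es|de)(\/\S*)?/i

# Random placeholder that is vanishingly unlikely to occur in the input
uniq  = rand(2**1024).to_s

# Swap the protected site for the placeholder, hide the remaining URLs, then swap it back
p my_string.gsub('mysite.com', uniq).gsub(regex, '(site hidden)').gsub(uniq, 'mysite.com')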


  • There should be a clever name for this pattern. I've had to do it before, often using something awful like `\0` to represent it. – tadman Nov 20 '12 at 05:05
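The placeholder trick above can be wrapped in a small helper so that several sites can be protected at once. This is only a sketch: `hide_sites_except` and `URL_RE` are hypothetical names that do not appear in the answer, and `SecureRandom` is swapped in for `rand`. The placeholder is kept to decimal digits on purpose, so that it can never contain one of the TLDs the regex looks for.

```ruby
require 'securerandom'

# Same pattern as in the question.
URL_RE = /[a-zA-Z0-9\-\.]+\.(com|net|de|org|uk|biz|info|co.uk|es|de)(\/\S*)?/i

# Hypothetical helper: hides every URL except the literal strings in `keep`.
def hide_sites_except(text, keep, replacement = '(site hidden)')
  # One digits-only random token per protected site.
  placeholders = keep.map { |site| [site, SecureRandom.random_number(10**40).to_s] }

  # 1. Swap each protected site for its token.
  masked = placeholders.reduce(text) { |s, (site, token)| s.gsub(site, token) }
  # 2. Hide whatever URLs are left.
  hidden = masked.gsub(URL_RE, replacement)
  # 3. Swap the tokens back for the original sites.
  placeholders.reduce(hidden) { |s, (site, token)| s.gsub(token, site) }
end

puts hide_sites_except("www.example.com test www.mysite.com", ["www.mysite.com"])
# prints: (site hidden) test www.mysite.com
```

Unlike the one-liner, this swaps out the whole of `www.mysite.com` rather than just `mysite.com`, so no partial host name is left for the regex to see.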
1

You could use a block to generate the replacement, using the original text if it's in the list of allowed entries:

my_string = "www.example.com test www.mysite.com"
allowed = %w(www.mysite.com)
re = %r/[a-zA-Z0-9\-\.]+\.(com|net|de|org|uk|biz|info|co.uk|es|de)(\/\S*)?/i
# Keep a match unchanged if it is in the allowed list; otherwise hide it
my_string.gsub!(re) do |m|
  allowed.include?(m) ? m : '(site hidden)'
end
puts my_string
qqx
  • `gsub` plus a block are powerful allies. This would be even better if `allowed` was a hash to avoid geometric increase of substitution times as the list grows. `hash = Hash[array.collect{|v|[v,true]}]` is a quick way to convert it. – tadman Nov 20 '12 at 05:07
  • @tadman Using a Hash would indeed be a good idea if there's to be a large list of fixed strings to allow. But I suspect a real implementation would need to have one or more regexps to allow large sets of URLs. – qqx Nov 20 '12 at 05:33
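The two suggestions in these comments can be combined in one rough sketch: a hash-backed lookup for fixed strings plus a regexp for whole families of URLs. A `Set` is used here in place of the `Hash` tadman mentions (it is hash-backed, so lookups stay constant time). The names `ALLOWED_EXACT`, `ALLOWED_PATTERNS` and `hide_sites` are hypothetical, and the "allow every subdomain of mysite.com" rule is an assumption made purely for illustration.

```ruby
require 'set'

# Same pattern as in the question.
URL_RE = /[a-zA-Z0-9\-\.]+\.(com|net|de|org|uk|biz|info|co.uk|es|de)(\/\S*)?/i

# Assumed allow lists, for illustration only.
ALLOWED_EXACT    = Set.new(%w[www.mysite.com])
# Regexp.union lets more patterns be added later; here: mysite.com and its subdomains.
ALLOWED_PATTERNS = Regexp.union(/\A(?:[a-z0-9\-]+\.)*mysite\.com(?:\/\S*)?\z/i)

def hide_sites(text)
  text.gsub(URL_RE) do |m|
    # Keep the match if it is an exact allowed string or fits an allowed pattern.
    (ALLOWED_EXACT.include?(m) || ALLOWED_PATTERNS.match?(m)) ? m : '(site hidden)'
  end
end

puts hide_sites("www.example.com test www.mysite.com blog.mysite.com/post")
# prints: (site hidden) test www.mysite.com blog.mysite.com/post
```

The pattern is anchored with `\A` and `\z` so that a look-alike host such as `evilmysite.com` is still hidden. `Regexp#match?` needs Ruby 2.4 or later; on older versions `=~` would do the same job.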