3

How can I validate that a text column doesn't contain website addresses? Examples:

www.google.com
google.com
http://gooogle.com
http://www.google.com
https://www.google.com
https://google.com

I want to do this on the front end and on the back end as well. I'm more interested in the back end at the moment, as I will deal with the front end later.

Question update:

Based on the example provided by MrYoshiji, I've come up with a case that is not covered:

http://rubular.com/r/VGgWyfIt7R

Notice that http://www.google.com in the middle of the text is not matched. That is exactly the case I need to match, so that I can raise a validation error saying websites are not allowed.

Gandalf StormCrow

3 Answers

3

I found a robust regexp; credit goes to @PhillPafford (PHP RegEx for "Website Name"). If you upvote my answer, please upvote his first!

/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/

To see it in action:

http://rubular.com/r/GOHHrucCdX


UPDATE:

This one finds website names anywhere in the text:

/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/

Note that I removed the ^ at the start and the $ at the end so that the pattern can match inside a longer text.

Rubular source:

^ Start of line

$ End of line

http://rubular.com/r/iEVzfv2U3O
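To make the effect of the anchors concrete, here is a minimal sketch in plain Ruby. The pattern is a hypothetical, much simplified stand-in for the long regex above, and the constant names are assumptions made for illustration. It uses \A and \z, which anchor to the whole string, whereas ^ and $ in Ruby anchor to each line.

```ruby
# Hypothetical, simplified core pattern (NOT the full regex above):
# optional scheme, optional "www.", dot-separated labels, 2-4 letter TLD.
CORE_PATTERN = /(?:https?:\/\/)?(?:www\.)?(?:[a-z0-9-]+\.)+[a-z]{2,4}/i

# Anchored: the whole string must be a website name.
WHOLE_STRING = /\A#{CORE_PATTERN}\z/

# Unanchored: a website name anywhere in the text counts as a match.
ANYWHERE = CORE_PATTERN

text = "Please visit http://www.google.com for details"

puts WHOLE_STRING.match?(text)          # false: the text as a whole is not a URL
puts ANYWHERE.match?(text)              # true: a URL is found mid-text
puts WHOLE_STRING.match?("google.com")  # true: the string is only a website name
```

For rejecting any text that mentions a website, the unanchored form is the one that fits the question.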


@GandalfStormCrow noticed that the following is matched:

Since I was little.My first dog
                #^^^

The only way I see to solve this issue would be to replace little.My with little. My:

text.gsub(/\w\.[A-Z]/) { |matched_string| matched_string.gsub('.', '. ') }

See it in action:

1.9.3p489 :018 > text = "hello my name is robert.My dog"
 => "hello my name is robert.My dog" 
1.9.3p489 :019 > text.gsub(/\w\.[A-Z]/) { |matched_string| matched_string.gsub('.', '. ') }
 => "hello my name is robert. My dog" 
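The two steps can be combined into one check. The following is only a sketch: the pattern is a hypothetical, simplified stand-in for the long regex above, and the method and constant names are assumptions made for illustration.

```ruby
# Hypothetical, simplified website pattern (stand-in for the long regex above).
SIMPLE_WEBSITE = /(?:https?:\/\/)?(?:www\.)?(?:[a-z0-9-]+\.)+[a-z]{2,4}/i

# Insert a space after a period that is directly followed by an uppercase
# letter, so "little.My" becomes "little. My".
def separate_sentences(text)
  text.gsub(/(\w)\.([A-Z])/, '\1. \2')
end

# True when the text still contains something that looks like a website
# after sentence boundaries have been separated.
def website_in?(text)
  SIMPLE_WEBSITE.match?(separate_sentences(text))
end

puts website_in?("Since I was little.My first dog")  # false
puts website_in?("My site is google.com")            # true
```

The trade-off is that a domain typed with a capital letter after the dot, such as Google.Com, is split as well and therefore slips through.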
MrYoshiji
  • it works well for domain matching, but when I put google.com in the middle of your text. It doesn't match and that is what I'm looking for, please see this http://rubular.com/r/VGgWyfIt7R – Gandalf StormCrow May 09 '14 at 15:16
  • This regex lets " google.com" pass. (Backticks don't work with a space in them.) – DickieBoy May 09 '14 at 15:17
  • That's obviously why you call a `.strip` before testing it ...... You don't want users to save any input with 1 or more spaces before, do you? – MrYoshiji May 09 '14 at 15:17
  • Downvote explanation please? @DickieBoy is that you? – MrYoshiji May 09 '14 at 15:22
  • @MrYoshiji nope not me – DickieBoy May 09 '14 at 15:35
  • @MrYoshiji this regex seems to be overly eager. In the sentence `since I was little.My first dog`, little.My is recognized as a URL. Can you fix this please? – Gandalf StormCrow May 09 '14 at 15:49
  • This part, `little.my`, is a possible domain name. Only a human eye could tell the difference. The only way I see to fix this issue would be to replace every period followed by an uppercase letter with the same pair separated by a space. It's hacky, I know, but I don't see any other way to do it... – MrYoshiji May 09 '14 at 15:53
  • Can you at least make it require `little.` followed by a lowercase letter? I think that would be enough of a distinction, where `little.my` is a domain name and `little.My` is not. – Gandalf StormCrow May 09 '14 at 15:56
  • I updated my answer in consequence ;) Hope this one is good enough in your case! – MrYoshiji May 09 '14 at 15:59
0

In your model, add:

validates_format_of :your_column, without: /\A((https?:)?\/\/)?(www\.)?[a-zA-Z]+\.com\z/

Here's where I crafted the Regex
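A quick way to check a pattern like this against the question's examples is a few lines of plain Ruby. This is only a sketch: the pattern below escapes every literal dot, and the constant names are assumptions made for illustration.

```ruby
# Assumed variant of the pattern with every literal dot escaped.
# It only covers .com names and only when the whole string is the website.
DOT_COM_ONLY = /\A((https?:)?\/\/)?(www\.)?[a-zA-Z]+\.com\z/

# The six examples from the question.
QUESTION_EXAMPLES = %w[
  www.google.com
  google.com
  http://gooogle.com
  http://www.google.com
  https://www.google.com
  https://google.com
].freeze

QUESTION_EXAMPLES.each do |example|
  puts "#{example}: #{DOT_COM_ONLY.match?(example)}"
end

# Because of \A and \z, a website inside a sentence is not caught.
puts DOT_COM_ONLY.match?("see google.com here")
```

Because the pattern is anchored to the whole string, it does not cover the mid-text case from the question update.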

Philipp Antar
-1

I see many have taken the regex approach to this, but maybe you'd like to create a simple blacklist instead. Something like:

class MyModel < ActiveRecord::Base
  BLACKLIST = [
    'google.com'
  ]

  validate :disallow_blacklisted_urls

  private
  def disallow_blacklisted_urls
    BLACKLIST.each do |blacklisted_url|
      if my_field && my_field.include?(blacklisted_url)
        errors.add(:my_field, "must not contain #{blacklisted_url}")
      end
    end
  end
end

The reason I'd go this way is that you can easily add more URLs (facebook.com, twitter.com), and the code will still work and still be clear after a year of not looking at it, while the regex is too cryptic for my aging eyes (and brain :-)). Also, if you don't want to check the whole blacklist every time, you can add a break inside the condition of the validation method, but I think the user gets better feedback when every blacklisted URL is reported.
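The stop-at-first-match variant can be sketched in plain Ruby with Enumerable#find. The names below are hypothetical, and the downcasing step is an addition that is not part of the validation above; it assumes matching should ignore letter case.

```ruby
# Hypothetical list of blocked sites, all in lowercase.
BLOCKED_SITES = ['google.com', 'facebook.com', 'twitter.com'].freeze

# Returns the first blocked entry found in the text, or nil when none is
# present. Enumerable#find stops at the first hit, like a break in a loop.
def first_blocked_site(text)
  normalized = text.to_s.downcase # assumption: matching ignores case
  BLOCKED_SITES.find { |site| normalized.include?(site) }
end

puts first_blocked_site("Visit WWW.Google.com today").inspect  # "google.com"
puts first_blocked_site("nothing to see here").inspect         # nil
```

Inside a model, the returned entry could be passed to errors.add in the same way as in the validation method above.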

Renra