
I am trying to create a regex which will ultimately be used with Google Forms to validate a textarea input.

The rules are:

  • The input can contain one or more URLs (http or https)
  • URLs must be separated by one or more newlines
  • Each line that contains text must be a single valid URL
  • The last URL may or may not be followed by newline characters

So far, I have written this regex: ^(https?://.+[\r\n]+)*(https?://.+[\r\n]+?)$ but the problem is that if a line contains more than one URL, it still validates.
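To reproduce the problem, here is a minimal sketch that runs the pattern above as a JavaScript regex literal (the / characters are escaped for the literal) against a few made-up inputs:

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const questionPattern = /^(https?:\/\/.+[\r\n]+)*(https?:\/\/.+[\r\n]+?)$/;

// Hypothetical sample inputs.
const oneUrlPerLine = "http://example.com\nhttps://example.org\n";
const twoUrlsOnOneLine = "http://example.com https://example.org\n";
const noTrailingNewline = "http://example.com";

console.log(questionPattern.test(oneUrlPerLine));     // true
console.log(questionPattern.test(twoUrlsOnOneLine));  // true: the unwanted match
console.log(questionPattern.test(noTrailingNewline)); // false: [\r\n]+? needs at least one newline
```

The last case shows that, as written, the pattern also rejects a final URL with no newline after it, which the fourth rule is meant to allow.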

Here is my testing playground: http://goo.gl/YPdvBH.

Chandranshu
Waqar Ahmad

3 Answers


Here is what you are looking for:

Demo, Demo with your URLs

    function validate(ele) {
        // Drop carriage returns so only "\n" separates lines
        var str = ele.value.replace(/\r/g, "");
        // Remove whitespace (including extra newlines) that precedes a newline
        while (/\s\n/.test(str)) {
            str = str.replace(/\s\n/g, "\n");
        }
        // Collapse consecutive newlines into one
        while (/\n\n/.test(str)) {
            str = str.replace(/\n\n/g, "\n");
        }
        ele.value = str;

        // One array entry per line
        str = str.split("\n");

        var result = [], counter = 0;

        for (var i = 0; i < str.length; i++) {
            // Trim the line and collapse inner whitespace to single spaces
            str[i] = str[i].replace(/(?:(?:^|\n)\s+|\s+(?:$|\n))/g, '').replace(/\s+/g, ' ');
            if (str[i].length !== 0) {
                if (isValidAddress(str[i])) {
                    result.push(str[i]);
                }
                counter += 1; // number of non-empty lines
            }
        }

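        // Full-URL check. Note: it also accepts ftp:// URLs, which the rules in the question do not allow.
        function isValidAddress(s) {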
            return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i.test(s)
        }
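        // Valid only if there is at least one URL and every non-empty line passed.
        // (Comparing with str.length would also count the empty entry that a
        // leading or trailing newline leaves after split, rejecting valid input.)
        return counter > 0 && result.length === counter;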
    }
    var ele = document.getElementById('urls');
    validate(ele);
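If the check does not need to touch the page, the same line-by-line idea can be written as a pure function. This is only a sketch: the name allLinesAreUrls is hypothetical, and it swaps the long regex for the built-in URL constructor (available in browsers and Node.js), so it accepts whatever URL can parse with an http or https scheme rather than exactly what isValidAddress accepts.

```javascript
// Hypothetical helper: true if there is at least one URL and every
// non-blank line is a single http(s) URL.
function allLinesAreUrls(text) {
  const lines = text
    .split(/\r?\n/)                    // one entry per line, CRLF or LF
    .map(line => line.trim())          // ignore leading/trailing whitespace
    .filter(line => line.length > 0);  // blank lines only separate URLs

  if (lines.length === 0) return false;

  return lines.every(line => {
    if (/\s/.test(line)) return false; // more than one token on the line
    try {
      const url = new URL(line);       // throws on unparseable input
      return url.protocol === "http:" || url.protocol === "https:";
    } catch (e) {
      return false;
    }
  });
}

console.log(allLinesAreUrls("http://example.com\n\nhttps://example.org\n")); // true
console.log(allLinesAreUrls("http://example.com https://example.org"));      // false
console.log(allLinesAreUrls("ftp://example.com"));                           // false
```

Splitting first and validating each line separately is what makes the "one URL per line" rule easy to enforce; a single whole-text regex has to encode the line structure itself.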
redV
  • I definitely do not need a JavaScript solution. I just need a regex that will validate the text in a Google Forms textarea. Anyway, thanks for your time. – Waqar Ahmad Nov 22 '13 at 06:10
  • I could not downvote the question because my reputation is not high enough yet, but I will. If you do not need a JS solution, may I ask why you used the JS tag on your question? – redV Nov 22 '13 at 06:24
  • Because he needed a JavaScript regex. Not all languages are created equal: some have very powerful regex features that do not necessarily work in other languages. – Chandranshu Nov 22 '13 at 06:28
  • Because I needed a JavaScript regex, not a JavaScript solution. There are a few differences when you use regex in Unix/Perl/.NET or in some other language. Just to show you one example: the differences in special and non-printable characters used in regex, http://www.regular-expressions.info/refcharacters.html . There are many other differences. – Waqar Ahmad Nov 22 '13 at 06:36

This is closer to the regex you are looking for:

^(https?://[\S]+[\r\n]+)*(https?://[\S]+[\r\n]+?)$

The difference between your regex and this one is that you use .+, which matches any character except a newline, whereas I use [\S]+ (note the capital S), which matches only non-whitespace characters. So it cannot match more than one whitespace-separated token on a line: each line can contain at most one token, and that token must have the form you defined.
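A quick check of that difference, as a sketch with made-up inputs. Like the question's pattern, this one still needs at least one newline after the last URL, because [\r\n]+? must match at least one character before $. The second pattern is a hypothetical variant, not verified in Google Forms, that makes the trailing newline optional with [\r\n]*.

```javascript
// The pattern from this answer, as a JavaScript regex literal.
const answerPattern = /^(https?:\/\/[\S]+[\r\n]+)*(https?:\/\/[\S]+[\r\n]+?)$/;

// Hypothetical variant: newline(s) after the last URL are optional.
const optionalTrailingNewline = /^(https?:\/\/\S+[\r\n]+)*https?:\/\/\S+[\r\n]*$/;

const samples = [
  "http://example.com\nhttps://example.org\n", // one URL per line
  "http://example.com https://example.org\n",  // two URLs on one line
  "http://example.com",                        // no trailing newline
];

for (const s of samples) {
  console.log(JSON.stringify(s), answerPattern.test(s), optionalTrailingNewline.test(s));
}
```

For these three inputs the answer's pattern gives true, false, false and the variant gives true, false, true.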

For a regex to match a single URL, look at this question on Stack Overflow:

I don't know whether Google Forms has a length limit on the pattern, but if it does, a full URL regex like that will almost certainly run into it.

Chandranshu
  • That is better now. But what if the URLs are concatenated without any space, comma, etc.? This will not catch that. Anyway, it gave me the idea to refine my regex further. Thanks, and I love your grammar edits. – Waqar Ahmad Nov 22 '13 at 06:54

If I understand correctly, your regexp is missing the m (multiline) flag, so you need something like this:

/^(your regexp for one URL)$/m

Here is a sample using the regexp from "Javascript URL validation regex":

/^(ht|f)tps?:\/\/[a-z0-9-\.]+\.[a-z]{2,4}\/?([^\s<>\#%"\,\{\}\\|\\\^\[\]`]+)?$/m
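Why the m flag alone does not validate the whole textarea, sketched with made-up input: it makes ^ and $ match at every line boundary, so test() succeeds as soon as any one line matches, even if other lines are not URLs. A simplified single-URL pattern stands in for the full one here.

```javascript
// Simplified stand-in for a single-URL regex, applied per line via the m flag.
const perLine = /^https?:\/\/\S+$/m;

const mixed = "not a url\nhttp://example.com\n";

console.log(perLine.test(mixed)); // true, although the first line is not a URL

// Without m, ^ and $ anchor to the whole input, so the same text fails.
console.log(/^https?:\/\/\S+$/.test(mixed)); // false
```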
Grundy
  • I am not sure you understood my question. In Google Forms, I cannot specify a multiline (m) flag; I can only enter the pattern. The whole text in the textarea should match the pattern as one unit. For example, the regex I quoted in my question matches one or more lines of URLs as a whole. There is just one problem: if a line has more than one URL, it matches the whole text too. – Waqar Ahmad Nov 22 '13 at 06:16
  • @WaqarAhmad Yep, you're right, I did not understand correctly :-) – Grundy Nov 22 '13 at 06:35