PHP/Regex: Check the format of an input

Question

I have an input field in which a user can enter some links. After the form is submitted, I want to check that this input has the correct structure.

The allowed structure:

Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: http://stackoverflow.com/

My regex doesn't work the way I expected:

(.*)\:(\s?)(.*)\n

The regex is meant to be used with preg_match.

Edit (moved from a comment):

My Code:

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
if (preg_match_all('/(.*?)\:\s?(.*?)$/m', $input))
{
    echo 'ok';
}
else
{
    echo 'no';
}

I get 'ok'. But because the line 'wrong' does not follow the pattern, I expect 'no'.

Only thing I see off is that you are making `\n` required. Really you should do `$` with the `m` modifier. And you want to make your first `(.*)` non greedy or it will match up to the `:` in the url. — Jonathan Kuhn, Dec 18 '15 at 22:04
Oh, and use `preg_match_all` instead of `preg_match` or else you will match the first one and nothing else. https://regex101.com/r/oQ1dL8/2 — Jonathan Kuhn, Dec 18 '15 at 22:06
Precise URL matching is complex: http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url — trincot, Dec 18 '15 at 22:07
Now, my Code looks like this but it doesn't work `$input = 'Google: http://google.com YouTube: http://youtube.com wrong Stackoverflow: http://stackoverflow.com/'; if (preg_match_all('/^(.*?):(\s?)(.*)$/m', $input)) { echo 'ok'; } else { echo 'no'; }` PHP say 'ok' but it must be 'no', because of 'wrong' is the structure wrong — Sven Eberth, Dec 18 '15 at 22:31
second `.*` must also be made lazy... add `?`, see my answer. — trincot, Dec 18 '15 at 22:39

score 2 · Accepted Answer · edited May 23 '17 at 11:52

There are a few things to correct:

The asterisk quantifier is greedy. In your case you want it to be lazy, so add a question mark after it in both instances.
You are probably not interested in retaining the separating space in the middle, so don't put parentheses around it.
If you want all lines to be processed, you need to use preg_match_all instead of preg_match.
Unless you are certain that your last line ends with a newline, you need to test for the end of the string with the dollar sign.
As that last test needs parentheses, use ?: to make the group non-capturing, since you are not interested in retaining the newline character.
Some systems put \r before every \n, so you should account for that; otherwise it ends up in one of your capture groups. Alternatively, use the m modifier in combination with $ (end of line) and forget about newlines.
As a colon also appears in the URL, you should test for that second colon explicitly; otherwise, when the first colon (after the site name) is missing, the "http" becomes part of the site name.
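To illustrate the first point, here is a minimal sketch of the difference between the greedy and the lazy quantifier on a single line. The sample line and the variable names are only for illustration:

```php
<?php
// Illustrative line in the "Name: URL" format from the question.
$line = 'Stackoverflow: http://stackoverflow.com/';

// Greedy: (.*) first runs to the end of the line, then backtracks
// to the LAST colon, which is the one inside the URL.
preg_match('/(.*):\s?(.*)$/', $line, $greedy);

// Lazy: (.*?) grows one character at a time and stops at the FIRST colon.
preg_match('/(.*?):\s?(.*)$/', $line, $lazy);

echo $greedy[1], "\n"; // Stackoverflow: http
echo $lazy[1], "\n";   // Stackoverflow
echo $lazy[2], "\n";   // http://stackoverflow.com/
```

With the greedy version the site name swallows the "http" part, which is why the lazy quantifier matters here.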

This leads to the following:

$input =
"Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: https://stackoverflow.com/";

$result = preg_match_all("/(.*?)\:\s?(\w*?\:.*?)$/m", $input, $matches);
echo $result ? "matched!\n" : "no match\n";
print_r($matches);

Outputs (after the "matched!" line):

Array
(
    [0] => Array
        (
            [0] => Google: http://google.com
            [1] => YouTube: http://youtube.com
            [2] => Stackoverflow: https://stackoverflow.com/
        )

    [1] => Array
        (
            [0] => Google
            [1] => YouTube
            [2] => Stackoverflow
        )

    [2] => Array
        (
            [0] => http://google.com
            [1] => http://youtube.com
            [2] => https://stackoverflow.com/
        )
)

The first element has the complete matches (the lines), the second element the matches of the first capturing group, and the last element the contents of the second capturing group.

Note that the above does not validate URLs. That is a subject of its own; have a look at the question on URL-matching regular expressions linked in the comments above.

EDIT

If you want to decide whether the whole input is correctly formatted, you can use the same kind of expression, but with preg_replace. You replace all the good lines with empty strings, trim the remaining newlines from the result, and test whether anything is left over:

$result =  trim(preg_replace("/(.*?)\:\s?(\w*?):(.*?)$/m", "", $input));
if ($result == "") {
    echo "It matches the pattern";
} else {
    echo "It does not match the pattern. Offending lines:
         " . $result;
}

The above would allow empty lines to occur in your input.
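As a rough sketch, the replace-and-trim check can be wrapped in a small helper so it is easy to try on good and bad input. The function name offendingLines and the two sample strings are hypothetical; the pattern is the one from the snippet above:

```php
<?php
// Hypothetical helper: returns whatever is left after all conforming
// lines have been removed (an empty string means the input is fine).
function offendingLines(string $input): string
{
    // Conforming lines ("Name: scheme:rest") are replaced by empty strings.
    $rest = preg_replace('/(.*?)\:\s?(\w*?):(.*?)$/m', '', $input);

    // Only newlines remain where good lines used to be; trim them away.
    return trim($rest);
}

$good = "Google: http://google.com\nYouTube: http://youtube.com";
$bad  = "Google: http://google.com\nwrong\nYouTube: http://youtube.com";

var_dump(offendingLines($good) === ''); // bool(true)
echo offendingLines($bad), "\n";         // wrong
```

Keep in mind that the pattern is not anchored at the start of the line, so it accepts any text before the first colon.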

I think you understood me wrong. I want only to check if the structure has been complied. In [this comment](http://stackoverflow.com/questions/34364561/php-regex-check-the-format-of-an-input/34364913#comment56472326_34364561) I said more to my problem. — Sven Eberth, Dec 18 '15 at 22:42
Your 'Edit' solved my Problem. Sorry for all obscurities. Thanks! — Sven Eberth, Dec 19 '15 at 12:44

score 0 · Answer 2 · Jan · 2015-12-18T22:27:35.587

Your question is somewhat vague. To match a URL, you could simply do something like:

^[^:]+:\s*https?:\/\/[^\s]+$
# match everything except a colon, then followed by a colon
# followed by whitespaces or not
# match http/https, a colon, two forward slashes literally
# afterwards, match everything except a whitespace one or unlimited times
# anchor it to start(^) and end($) (as wanted in the comment)

See a working demo here.
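One possible way to use this pattern for a yes/no check in PHP is to test each line on its own, which avoids any matching across line breaks. This is only a sketch: the sample input, the choice of ~ as delimiter, and the assumption that blank lines should be ignored are mine, not part of the question:

```php
<?php
// Pattern from above; with ~ as delimiter the slashes need no escaping.
$pattern = '~^[^:]+:\s*https?://\S+$~';

$input = "Google: http://google.com\nwrong\nStackoverflow: http://stackoverflow.com/";

// Split on any kind of line break and drop empty lines
// (assumption: blank lines are allowed and simply skipped).
$lines = preg_split('/\R/', $input, -1, PREG_SPLIT_NO_EMPTY);

// PREG_GREP_INVERT returns the lines that do NOT match the pattern.
$bad = preg_grep($pattern, $lines, PREG_GREP_INVERT);

echo $bad ? 'no' : 'ok', "\n"; // prints "no"
print_r(array_values($bad));    // lists the offending line "wrong"
```

Because preg_grep returns the offending lines, they can also be shown to the user in an error message.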

edited Dec 18 '15 at 22:27

answered Dec 18 '15 at 22:06

Jan


I don't want to extract the URL or anything else from the string. I only want to check whether the structure has been complied with. – Sven Eberth Dec 18 '15 at 22:11
@Xübecks: You need to assure anchor points then, see my updated answer. – Jan Dec 18 '15 at 22:16

score 0 · Answer 3 · answered Dec 18 '15 at 23:06

You can achieve that by finding a line that does not meet your requirement.

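Use '~^(?!.*?:\s?.*$).*$~m' with a !preg_match. The negative lookahead makes the pattern match only lines that do not have the expected structure, so any match means the input is invalid. See this demo printing "no":

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
// A match means at least one line lacks the "name: value" structure.
if (!preg_match('~^(?!.*?:\s?.*$).*$~m', $input)) {
    echo 'ok';
}
else {
    echo 'no';
}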

Note that you do not need to escape the : symbol. Also, I suggest using greedy dot matching at the end: the engine then grabs the rest of the line in one go and only checks for the end of the line afterwards, which makes the regex more efficient. You could also try replacing the first .*? with [^:]* for efficiency's sake.
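A small sketch comparing the two variants of the name part follows. The sample input and the variable names are illustrative, and excluding \r and \n from the negated class is my addition: unlike the dot, a negated character class would otherwise match across line breaks.

```php
<?php
$input = "Google: http://google.com\nYouTube: http://youtube.com";

// Lazy dot: expands one character at a time until a colon follows.
preg_match_all('~^(.*?):\s?(.*)$~m', $input, $lazy);

// Negated class: consumes everything up to the first colon in one pass.
// \r and \n are excluded so the name cannot span several lines.
preg_match_all('~^([^:\r\n]*):\s?(.*)$~m', $input, $negated);

var_dump($lazy[1] === $negated[1]); // bool(true)
print_r($negated[1]);               // Google, YouTube
```

Both variants capture the same names here; the difference is only in how much work the engine does to find the colon.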
