There are a few things to correct:
- The asterisk operator is greedy. In your case you want it to be lazy, so add a question mark after it in both instances;
- You probably are not interested in retaining the separating space in the middle, so don't put brackets around it;
- if you want all lines to be processed, you need to use preg_match_all instead of preg_match;
- unless you are certain that your last line ends with a new line, you need to test for the end of the string with the dollar sign;
- as that last test will need brackets, use ?: to make it non-capturing as you are not interested in retaining that new line character;
- some systems have \r before every \n, so you should add that, otherwise it gets into one of your capture groups. Alternatively, use the m modifier in combination with $ (end-of-line) and forget about newlines;
- as a colon also appears in the URL, you should at least test for that second colon explicitly. Otherwise, when the first colon (the one after the site name) is missing, the "http" becomes part of the site name.
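As a minimal sketch of the greedy/lazy point and the \r point, the following compares the variants side by side. The sample strings and variable names are made up for illustration, and the \r behaviour assumes PCRE's usual default of treating only \n as a line break:

```php
<?php
$line = "Google: http://google.com";

// Greedy: (.*) takes as much as possible, so the split falls on the LAST colon.
preg_match('/(.*):(.*)/', $line, $greedy);
// $greedy[1] is "Google: http", $greedy[2] is "//google.com"

// Lazy: (.*?) takes as little as possible, so the split falls on the FIRST colon.
preg_match('/(.*?):\s?(.*)/', $line, $lazy);
// $lazy[1] is "Google", $lazy[2] is "http://google.com"

// Windows-style line endings (assumed sample input).
$crlf = "Google: http://google.com\r\nYouTube: http://youtube.com";

// Without \r? the carriage return is swallowed by the last capture group.
preg_match_all('/(.*?):\s?(\w*?:.*?)$/m', $crlf, $withoutCr);

// With \r? in front of $ the carriage return stays outside the group.
preg_match_all('/(.*?):\s?(\w*?:.*?)\r?$/m', $crlf, $withCr);

var_dump($withoutCr[2][0] === "http://google.com\r");
var_dump($withCr[2][0] === "http://google.com");
```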
This leads to the following:
$input =
"Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: https://stackoverflow.com/";
$result = preg_match_all("/(.*?)\:\s?(\w*?\:.*?)$/m", $input, $matches);
echo $result ? "matched!\n" : "no match\n";
print_r($matches);
The print_r call outputs:
Array
(
    [0] => Array
        (
            [0] => Google: http://google.com
            [1] => YouTube: http://youtube.com
            [2] => Stackoverflow: https://stackoverflow.com/
        )

    [1] => Array
        (
            [0] => Google
            [1] => YouTube
            [2] => Stackoverflow
        )

    [2] => Array
        (
            [0] => http://google.com
            [1] => http://youtube.com
            [2] => https://stackoverflow.com/
        )

)
The first element has the complete matches (the lines), the second element the matches of the first capturing group, and the last element the contents of the second capturing group.
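Because the two capture-group arrays are in the same order, they can be paired up into a name-to-URL map. This is only a sketch: the $sites variable is an illustrative name, and it assumes site names are unique (a repeated name would overwrite the earlier entry):

```php
<?php
$input = "Google: http://google.com\n"
       . "YouTube: http://youtube.com\n"
       . "Stackoverflow: https://stackoverflow.com/";

preg_match_all('/(.*?):\s?(\w*?:.*?)$/m', $input, $matches);

// $matches[1] holds the site names, $matches[2] the URLs, index for index.
$sites = array_combine($matches[1], $matches[2]);

foreach ($sites as $name => $url) {
    echo "$name => $url\n";
}
```

If you would rather get one array per line, passing the PREG_SET_ORDER flag as the fourth argument of preg_match_all groups the results by match instead of by capture group.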
Note that the above does not validate URLs. That is a subject of its own.
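If a rough check is enough, one possible approach (not the only one, and not a full validation) is PHP's built-in filter_var with FILTER_VALIDATE_URL. The sample URLs and the $valid variable below are assumptions for illustration:

```php
<?php
// Sample values: two well-formed URLs and one with the colon missing.
$urls = array("http://google.com", "https://stackoverflow.com/", "http//broken");

// filter_var returns the URL itself when it passes and false when it does not.
$valid = array_values(array_filter($urls, function ($url) {
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}));

print_r($valid);
```

This only checks the general shape of the URL; it does not tell you whether the site exists or responds.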
EDIT
If you are interested in deciding whether the whole input is correctly formatted or not, then you can use the above expression, but with preg_replace. You replace all the good lines with empty strings, trim newlines from the end result, and test whether anything is left over:
$result = trim(preg_replace("/(.*?)\:\s?(\w*?):(.*?)$/m", "", $input));
if ($result == "") {
    echo "It matches the pattern";
} else {
    echo "It does not match the pattern. Offending lines:\n" . $result;
}
The above would allow empty lines to occur in your input.
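If empty lines should be rejected as well, a line-by-line check is one alternative. The helper name findOffendingLines is hypothetical, and the sketch assumes the input has no trailing newline (one would be reported as an empty last line):

```php
<?php
// Hypothetical helper: returns the lines that do not match, keyed by line number.
function findOffendingLines($input)
{
    $bad = array();
    // Split on \n, tolerating an optional \r in front of it.
    foreach (preg_split('/\r?\n/', $input) as $index => $line) {
        if (!preg_match('/^(.*?):\s?(\w*?:.*?)$/', $line)) {
            $bad[$index + 1] = $line;
        }
    }
    return $bad;
}

$sample = "Google: http://google.com\n\nbroken line";
$offending = findOffendingLines($sample);
print_r($offending);
```

Here the empty second line and the third line without any colon are both reported, while the first line passes.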