I have a doubt related to a regular expression in php.
$content = "START FIRST AAA SECOND AAA"
$content_first = preg_replace('/START(.*)AAA/', 'REPLACED_STRING', $content);
//$content_first == "REPLACED_STRING"
$content_second = preg_replace('/START(.*?)AAA/', 'REPLACED_STRING', $content);
//$content_second == "REPLACED_STRING SECOND AAA"
Why exactly does $content_second differ from $content_first? What is the purpose of '?' in a regex? I have the following regex (really extensive) and I want to modify it so it replaces ALL urls in a string instead of stopping by the first one, but I'm not being able to (it just finds the first URL in the string):
$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
# Alternative 1: URL delimited by (parentheses).
(\() # $1 "(" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $2: URL.
(\)) # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
(\[) # $4: "[" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $5: URL.
(\]) # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
(\{) # $7: "{" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $8: URL.
(\}) # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
(<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity).
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $11: URL.
(>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
( # $13: Prefix proving URL not already linked.
(?: ^ # Can be a beginning of line or string, or
| [^=\s\'"\]] # a non-"=", non-quote, non-"]", followed by
) \s*[\'"]? # optional whitespace and optional quote;
| [^=\s]\s+ # or... a non-equals sign followed by whitespace.
) # End $13. Non-prelinkified-proof prefix.
( \b # $14: Other non-delimited URL.
(?:ht|f)tps?:\/\/ # Required literal http, https, ftp or ftps prefix.
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*).
(?: # Either on a "&" or at the end of URI.
(?! # Allow a "&" char only if not start of an...
&(?:gt|\#0*62|\#x0*3e); # HTML ">" entity, or
| &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
[.!&\',:?;]? # followed by optional punctuation then
(?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$) # a non-URI char or EOS.
) & # If neg-assertion true, match "&" (special).
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*).
)* # Unroll-the-loop (special normal*)*.
[a-z0-9\-_~$()*+=\/#[\]@%] # Last char can\'t be [.!&\',;:?]
) # End $14. Other non-delimited URL.
/imx';
Can anyone help me out or head me towards the right direction? Thank you so much!
Ok, I think I understood all your explanations (ty for that!), is there any reason that only my first url is to be put between 'a' tags? rest of the code:
$url_replace = '$1$4$7$10$13<a>$2$5$8$11$14</a>$3$6$9$12';
return preg_replace($url_pattern, $url_replace, $text);
if
$text =
http://www.youtube.com/watch?v=Cy8duEIHEig http://www.youtube.com/watch?v=Cy8duEIHEig
Only the first URL appears as an URL. Has this something to do with the *? ?