Is there a way to find all URLs inside a string and save each chunk of the original message in an array?

My goal is to intercept each URL, pass it through a function that changes some of its parameters, and then rebuild the original string.

Example:

$original_string = "Hi, this is a list of urls: http://www.google.it, www.amazon.it, https://www.amzn.to/XXXXX and at the end we have www.example.it";

Expected result:

$result = array(
0 => "Hi, this is a list of urls: ",
1 => "http://www.google.it",
2 => ", ",
3 => "www.amazon.it",
4 => ", ",
5 => "https://www.amzn.to/XXXXX",
6 => " and at the end we have ",
7 => "www.example.it"
);

With this result, I can edit each link with a function I've already written and rebuild the string.

I can find all URLs inside a string with: preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $original_string, $urls);

but I lose all the other text...

UPDATE: I tried this code as suggested, but I get a strange result:

$x = preg_split('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $original_string, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump($x);



  array(9) {
  [0]=>
  string(28) "Hi, this is a list of urls: "
  [1]=>
  string(1) "t"
  [2]=>
  string(2) ", "
  [3]=>
  string(1) "t"
  [4]=>
  string(2) ", "
  [5]=>
  string(1) "X"
  [6]=>
  string(24) " and at the end we have "
  [7]=>
  string(1) "t"
  [8]=>
  string(0) ""
}

1 Answer

Your best bet is regular expressions. Given your original problem description, you will very likely want the preg_replace_callback function rather than splitting the string into an array, processing it, and reassembling it.

I can't say it's a reliable source, but if you need help creating a regular expression, start from "PHP: Regular Expression to get a URL from a string". Or just use a web search :)

This online tool can help you understand regexps better: https://regex101.com/

Here is an example with a regular expression taken from "Extract URLs from text in PHP":

$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';

var_export(preg_replace_callback("#$pattern#i", function ($matches) {
    $url = $matches[0]; // the full text of the matched URL
    // Put your code here, or call your existing function/method with $url.
    return '->' . $url . '<-';
}, $original_string));
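
If the array from the question is still wanted, preg_split with PREG_SPLIT_DELIM_CAPTURE can produce it, as long as the whole URL pattern sits inside a single capturing group. With that flag, preg_split returns only the text captured by groups, so a pattern whose only group captures the last character (as in the question's update) gives back single letters such as "t". The sketch below is a minimal illustration, not a complete URL matcher: the simplified pattern is an assumption chosen for the example string, and PREG_SPLIT_NO_EMPTY is added only to drop the empty trailing element.

```php
<?php
// Example input from the question.
$original_string = "Hi, this is a list of urls: http://www.google.it, www.amazon.it, https://www.amzn.to/XXXXX and at the end we have www.example.it";

// Simplified pattern (an assumption for illustration, not a full URL grammar):
// - starts with http://, https:// or www.
// - runs until whitespace, a comma, angle brackets or parentheses
// - must not end in sentence punctuation (. ; : ! ?)
// The outer (...) is the single capturing group; the inner group is non-capturing.
$pattern = '~(\b(?:https?://|www\.)[^\s,<>()]+(?<![.;:!?]))~i';

// DELIM_CAPTURE keeps the captured URLs in the result;
// NO_EMPTY drops empty pieces (for example after a URL at the very end).
$parts = preg_split(
    $pattern,
    $original_string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

var_export($parts);

// Joining the pieces again gives back the original string.
$rebuilt = implode('', $parts);
```

Note that with PREG_SPLIT_NO_EMPTY the URLs are not guaranteed to sit at odd indexes (a string that starts with a URL shifts them). Drop that flag if the rebuild step relies on the position of each element.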
  • Hi, yes, I already have a function like this: preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $msg, $urls); but it loses all my other text... – itajackass Mar 15 '20 at 22:17
  • @itajackass yes, the function you used **finds** all links, but doesn't help with changing them. Although it's possible to use it with the help of some flags (see PREG_OFFSET_CAPTURE), that's pretty advanced and unnecessarily complex. Check https://www.php.net/manual/en/function.preg-replace-callback.php - I think it's a perfect fit for your problem – astax Mar 15 '20 at 22:21
  • Actually, changing the links is not my problem (I've already written that function). My first goal is to get the expected result array from the first post – itajackass Mar 15 '20 at 22:23
  • I fail to see why you don't want to use the function designed to do what you need. But if you insist - use preg_split with a flag PREG_SPLIT_DELIM_CAPTURE. This will ensure you get both the links (they will be delimiters for this function) and the text between them. However, you'll likely need to do some post-processing of the result of this function. – astax Mar 15 '20 at 22:29
  • You can use preg_match_all and preg_split together: the first gives you the URLs, and the second gives you the array split by those URLs – Razin Abid Mar 15 '20 at 22:29
  • @astax I tried preg_split, but I don't get what I'd like to see. First post updated with the (wrong) result – itajackass Mar 15 '20 at 22:51
  • @itajackass check the documentation for this function - it has some good examples in the comments. Your example has a number of problems: 1) wrong number of parameters (the option is not the 3rd, but the 4th); 2) your regular expression doesn't find links without https://; 3) there are no parentheses around the regular expression. I'm sure you now have enough information to solve the problem yourself. Just giving you a working piece of code is not really in the spirit of Stack Overflow. – astax Mar 15 '20 at 23:08
  • I've updated the answer with example code using a better regexp and the preg_replace_callback function – astax Mar 15 '20 at 23:24
  • @astax wow! for me regexp is a pain:(!! thank you for the help!!!! – itajackass Mar 15 '20 at 23:37
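
For completeness, the PREG_OFFSET_CAPTURE route mentioned in the comments might look roughly like the sketch below: preg_match_all reports each URL together with its byte offset, and the text between matches is cut out with substr. The simplified pattern and the variable names ($chunks, $cursor) are assumptions made for this illustration, and the array destructuring in foreach assumes PHP 7.1 or later.

```php
<?php
// Example input from the question.
$original_string = "Hi, this is a list of urls: http://www.google.it, www.amazon.it, https://www.amzn.to/XXXXX and at the end we have www.example.it";

// Simplified URL pattern (assumption for illustration, not a full URL grammar).
$pattern = '~\b(?:https?://|www\.)[^\s,<>()]+(?<![.;:!?])~i';

// With PREG_OFFSET_CAPTURE every match is a pair: [matched text, byte offset].
preg_match_all($pattern, $original_string, $matches, PREG_OFFSET_CAPTURE);

$chunks = [];
$cursor = 0; // byte offset of the first character not yet copied

foreach ($matches[0] as [$url, $offset]) {
    // Copy the plain text that sits between the previous URL and this one.
    if ($offset > $cursor) {
        $chunks[] = substr($original_string, $cursor, $offset - $cursor);
    }
    $chunks[] = $url;
    $cursor = $offset + strlen($url);
}

// Copy whatever text follows the last URL.
if ($cursor < strlen($original_string)) {
    $chunks[] = substr($original_string, $cursor);
}

var_export($chunks);
```

This is more code than preg_replace_callback or preg_split need for the same job, which matches the remark above that this route is unnecessarily complex for this problem.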