I would like to find all URLs in a string (cURL results) and then encode any query strings in those results. For example:

URLs found:

http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma

I want to replace all the URLs found with their encoded strings (I can only do one of them):

http%3A%2F%2Fwww.example.com%2Findex.php%3Ffavoritecolor%3Dblue%26favoritefood%3Dsharwarma

I need to do this for an HTML cURL response, i.e. find all URLs in the HTML page. Thank you in advance; I have searched for hours.

  • Use `preg_replace_callback()` to call `urlencode` on every URL that you find in the string. – Barmar Feb 03 '14 at 08:35
  • How come we are not seeing your code in your question? – anubhava Feb 03 '14 at 08:38
  • Will you have more than one URL in a string? P.S. Can you clarify a bit more what you want to do? – akki Feb 03 '14 at 08:38
  • Thank you, everyone, the PHP code (DOM) worked well. How can I now find all of the URLs again (including image src, CSS url, etc.) and then replace each one, e.g. change http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma to http://www.url.com/getpage.php?get=http%3A%2F%2Fwww.example.com%2Findex.php%3Ffavoritecolor%3Dblue%26favoritefood%3Dsharwarma ? Thank you! – DOMDocumentVideoSource Feb 05 '14 at 07:19
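
Barmar's `preg_replace_callback()` suggestion could look roughly like the sketch below. It is meant for plain text rather than HTML markup (in HTML source, `&` usually appears as `&amp;`, which a regex would encode literally). The function name `encodeUrlsInText` and the URL pattern are assumptions made for illustration; the pattern is deliberately simple and will not match every valid URL.

```php
<?php
// Hypothetical helper: URL-encodes every http(s) URL found in a plain-text string.
function encodeUrlsInText(string $text): string
{
    // Simplified pattern (an assumption): a URL runs until whitespace,
    // a quote, or an angle bracket.
    $pattern = '~https?://[^\s"\'<>]+~i';

    return preg_replace_callback(
        $pattern,
        function (array $match): string {
            // $match[0] is the full URL that was matched
            return urlencode($match[0]);
        },
        $text
    );
}

$text = 'See http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma for details';
echo encodeUrlsInText($text), "\n";
```

Text outside the matched URLs is left untouched, so the surrounding string survives the replacement.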

2 Answers

This will do what you want if your cURL result is an HTML page and you only want `a` links (and not images or other clickable elements).

$xml = new DOMDocument();

// $html should be your CURL result
$xml->loadHTML($html);

// or you can do that directly by providing the requested page's URL to loadHTMLFile
// $xml->loadHTMLFile("http://...");

// this array will contain all links
$links = array();

// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
    // URL-encodes the link's URL and adds it to the $links array
    $links[] = urlencode($link->getAttribute("href"));
}

// now do whatever you want with that array

The $links array will contain all the links found in the page in URL-encoded format.

Edit: if you instead want to replace all links in the page while keeping everything else, it's better to use DOMDocument than regular expressions (related: why you shouldn't use regex to handle HTML). Here's an edited version of my code that replaces every link with its URL-encoded equivalent and then saves the page into a variable:

$xml = new DOMDocument();

// $html should be your CURL result
$xml->loadHTML($html);

// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
    // gets the original (non-URL-encoded) link
    $original = $link->getAttribute("href");

    // sets new link to URL-encoded format
    $link->setAttribute("href", urlencode($original));
}

// save modified page to a variable
$page = $xml->saveHTML();

// now do whatever you want with that modified page, for example you can "echo" it
echo $page;

Code based on this.
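
For the follow-up asked in the comments (rewriting every URL, including image `src`, so that it goes through a page such as `http://www.url.com/getpage.php?get=`), the same DOMDocument approach could be extended along these lines. This is only a sketch: the function name `rewriteUrls` and the tag/attribute list are assumptions, relative URLs are skipped, and CSS `url(...)` references inside stylesheets or `style` attributes are not handled.

```php
<?php
// Hypothetical helper: rewrites href/src attributes so each absolute URL is
// passed, URL-encoded, to a proxy page. $proxy is taken from the comment's example.
function rewriteUrls(string $html, string $proxy): string
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute holding a URL (an illustrative, incomplete list).
    $targets = ['a' => 'href', 'img' => 'src', 'link' => 'href', 'script' => 'src'];

    foreach ($targets as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            $original = $element->getAttribute($attribute);
            // Only rewrite absolute http(s) URLs; skip empty, relative, and other schemes.
            if (preg_match('~^https?://~i', $original)) {
                $element->setAttribute($attribute, $proxy . urlencode($original));
            }
        }
    }

    return $doc->saveHTML();
}

$html = '<html><body>'
      . '<a href="http://www.example.com/index.php?favoritecolor=blue&amp;favoritefood=sharwarma">link</a>'
      . '<img src="http://www.example.com/logo.png">'
      . '</body></html>';
echo rewriteUrls($html, 'http://www.url.com/getpage.php?get=');
```

Because the original URL is encoded before being appended, its own `?` and `&` cannot be confused with the query string of the proxy page.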

  • Don't use htmlDOM to trace HTML. This question asks 'how to find URLs in text, `not html`'. –  Feb 03 '14 at 09:04
  • @MahmoudEskandari `find all URLs in a string (curl results)` and `find all URLS from html page` clearly say that the OP wants to work with an HTML page, and in my opinion this is the cleanest way to do this. –  Feb 03 '14 at 09:06
  • Hi, thank you both. However, when I add the exact PHP below, with $xxx being my HTML result, I get a blank screen... – DOMDocumentVideoSource Feb 03 '14 at 17:15
  • thank you, that seems to work... just had to do yum install php-xml – DOMDocumentVideoSource Feb 03 '14 at 17:29
  • How can I now find all of the URLs again (including image src, CSS url, etc.) and then replace each one, e.g. change http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma to http://www.url.com/getpage.php?get=http%3A%2F%2Fwww.example.com%2Findex.php%3Ffavoritecolor%3Dblue%26favoritefood%3Dsharwarma ? – DOMDocumentVideoSource Feb 04 '14 at 08:57

Do not use PHP's DOM directly; it will slow down your execution time. Use simplehtmldom instead, it's easy:

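// Assumes simple_html_dom.php has been included and that $data
// is the object returned by str_get_html($html)
function decodes($data) {
    // URL-encode the href of every "a" element
    foreach ($data->find('a') as $hres) {
        $hres->href = urlencode($hres->href);
    }
    return $data;
}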