
I created a PHP program using cURL that fetches the content of any site and displays it in the browser. Another part of the program saves this data to a file using file handling; after saving it, I want to find all the http links within the body tag of the saved file. My code displays the fetched site in the browser, but I cannot find the http links, and some unwanted code also appears (as shown in this image) that I don't want.

https://www.screencast.com/t/Nwaz93oU

PHP Code:

<!DOCTYPE html>
<html>
    <?php
        // Print every href found inside <body> of the given HTML source
        // (a URL or a local file path such as the saved mohit.txt).
        function get_all_links($source){
            $html = file_get_contents($source);
            $dom = new DOMDocument();
            @$dom->loadHTML($html); // @ hides warnings from malformed HTML
            $xpath = new DOMXPath($dom);
            $hrefs = $xpath->evaluate("/html/body//a");
            for ($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                $url = $href->getAttribute('href');
                echo htmlspecialchars($url).'<br />';
            }
        }
        // Fetch a page with cURL, save it to mohit.txt and return it.
        function get_site_data($uc_url){
            $get_uc = curl_init();
            curl_setopt($get_uc,CURLOPT_URL,$uc_url);
            curl_setopt($get_uc,CURLOPT_RETURNTRANSFER,true);
            $output=curl_exec($get_uc);
            curl_close($get_uc);
            $fp=fopen("mohit.txt","w");
            fputs($fp,$output);
            fclose($fp); // flush and release the file handle
            return $output;
        }
    ?>
    <body>
        <div>
            <?php
            $site_content = get_site_data("http://www.ucertify.com");
            echo $site_content;
            ?>
        </div>
        <div>
            <?php
            // get_all_links() echoes the links itself and returns nothing,
            // so it is called without echo, on the saved file.
            get_all_links("mohit.txt");
            ?>
        </div>
    </body>
</html>
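A minimal sketch of the "find the links inside the body of the saved file" step, assuming the page has already been written to disk (as mohit.txt is above). The helper name extract_body_links() is hypothetical. Unlike the code above, it returns the hrefs as an array instead of echoing them, skips `<a>` elements that have no href, and collects libxml parser warnings instead of silencing them with @.

```php
<?php
// Hypothetical helper: return every href of an <a> element inside <body>.
function extract_body_links(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; buffer the parser warnings
    // internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // Only anchors that actually carry an href attribute are matched.
    foreach ($xpath->query('//body//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Usage with the file saved by get_site_data():
// $links = extract_body_links(file_get_contents('mohit.txt'));
```

Note that this returns every href, including relative paths and `javascript:` links, so a separate filtering step is still needed to keep only the http/https ones.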
  • What do you mean by "I can find all the http links within the body tag" versus "but I can not find the http links"? There are http links in your file. It seems you need to use a filter, probably with preg_match(). – Bernhard Sep 25 '17 at 11:57
  • I am not able to find all the https links, though I'm able to find some. –  Sep 25 '17 at 12:05

1 Answer


In the get_all_links method, validate whether the $url variable is a valid URL, because some pages may attach an onclick handler that runs JavaScript instead of linking to a real address. To check whether a string is a URL, you can use a regex with PHP's preg_match(). See also "What is a good regular expression to match a URL?" for a suitable regex.
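A minimal sketch of that validation step, assuming the goal is to keep only absolute http:// or https:// URLs and drop `javascript:`, `mailto:`, `#` and relative links. The function name filter_http_links() is hypothetical, and the regex is deliberately simple: it checks the scheme and the absence of whitespace, not full URL syntax.

```php
<?php
// Hypothetical helper: keep only absolute http/https URLs.
function filter_http_links(array $hrefs): array
{
    $valid = [];
    foreach ($hrefs as $href) {
        $href = trim($href);
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        // "i" makes the scheme check case-insensitive (HTTP://, Https://).
        if (preg_match('~^https?://[^\s/$.?#][^\s]*$~i', $href) === 1) {
            $valid[] = $href;
        }
    }
    return $valid;
}

// Usage: inside get_all_links(), only echo $url when
// filter_http_links([$url]) is non-empty.
```

If a regex is not required, PHP's built-in filter_var($url, FILTER_VALIDATE_URL) is an alternative, though it accepts any scheme, so the http/https check would still have to be done separately (for example with parse_url()).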

Dimitrios Desyllas