
I need help downloading webpages from the internet using a PHP script. I already have a script that downloads a webpage, but it always saves the page under the same name, index.html.

I want to save each page under the name it has in the URL; for example, the about-us page should be saved as aboutus.html.

<!doctype html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Document</title>
    </head>
    <body>
    <form method="post">
        <input name="url" size="50" placeholder="URL" />
        <input name="submit" type="submit" />
    </form>
    </body>
    </html>
    <?php
    // maximum execution time in seconds
    set_time_limit (24 * 60 * 60);

    if (isset($_POST['submit'])) {

        $url = parse_url($_POST['url']);
        $folder = $url['host'];
        if (array_key_exists('path', $url)) {
            // explode() returns an array; keep only the name before the extension
            $file = explode('.', str_replace('/', '', $url['path']))[0];
            $file .= '.html';
        } else {
            $file = 'index.html';
        }
        if (!is_dir($folder)) {
            mkdir($folder);
        }
        file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
    }
    ?>
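For the filename derivation the question asks about, one approach (a sketch; the sample URL below is hypothetical) is to take the last segment of the URL path:

```php
<?php
// Derive a filename from the last segment of the URL path.
// Falls back to "index" when the URL has no path (e.g. a bare domain).
$url  = 'http://example.com/aboutus.html';   // hypothetical sample URL
$path = parse_url($url, PHP_URL_PATH);       // "/aboutus.html", or null if absent
$name = $path !== null ? pathinfo($path, PATHINFO_FILENAME) : '';
$file = ($name !== '' ? $name : 'index') . '.html';
echo $file . "\n";                           // aboutus.html
```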
rahul
kamal jot

2 Answers


Try this:

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
<?php
// maximum execution time in seconds
set_time_limit (24 * 60 * 60);

function get_title($url){       
  $str = file_get_contents($url);

  if(strlen($str)>0){
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($str);
    $title = $dom->getElementsByTagName( "title" );
    $titleText = 'index';
    if($title && $title->length){
        // item() works on every PHP version; $title[0] requires PHP 5.6.3+
        $titleText = $title->item(0)->textContent;
    }

    libxml_use_internal_errors(false);
    return  $titleText;
  }
  return 'index'; // fallback when the page could not be fetched
}

if (isset($_POST['submit'])) {

    $url = parse_url($_POST['url']);
    $folder = $url['host'];
    if (array_key_exists('path', $url)) {
        $file = get_title($_POST['url']);
        $file .= '.html';
    } else {
        $file = 'index.html';
    }
    if (!is_dir($folder)) {
        mkdir($folder);
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}
?>
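One caveat with naming files after the page title (not covered in the answer): titles may contain slashes or other characters that are invalid in filenames. A small sanitizing helper is a sensible addition; the function below is a hypothetical sketch:

```php
<?php
// Hypothetical helper: reduce an arbitrary page title to a filesystem-safe name.
// Runs of anything other than letters, digits, dash and underscore become a dash.
function safe_filename($title) {
    $name = preg_replace('/[^A-Za-z0-9_-]+/', '-', trim($title));
    $name = trim($name, '-');
    return $name !== '' ? $name : 'index';
}

echo safe_filename('About Us | Example, Inc.') . "\n"; // About-Us-Example-Inc
```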
Adolfo Garza
  • @kamaljot Although it is working, it is better not to use it – Peyman Mohamadpour Feb 29 '16 at 07:54
  • yes it giving me problem to download https: related url – kamal jot Feb 29 '16 at 09:31
  • Good catch Trix. I edited my response to use DOMDocument to parse the html and avoid using an external library. If you decide to use PHP Simple HTML DOM Parser just download the php file and put it next to this php file you are working with and add this line at the top of your php: require 'simple_html_dom.php'; – Adolfo Garza Feb 29 '16 at 17:30
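On the https problem mentioned in the comments: `fopen()` and `file_get_contents()` can only fetch https URLs when PHP's `openssl` extension is loaded, since that is what registers the `https` stream wrapper. A quick runtime check (a sketch; no download is performed):

```php
<?php
// The https:// stream wrapper only exists when ext/openssl is loaded;
// without it, fopen('https://...', 'r') fails with a "no wrapper" error.
$hasHttps = in_array('https', stream_get_wrappers(), true);
echo $hasHttps
    ? "https downloads are supported.\n"
    : "Enable the openssl extension to download https URLs.\n";
```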

Note

Needs PHP Simple HTML DOM Parser

Contrary to the original version of the answer provided by Adolfo Garza, using a regex is not a good idea for parsing HTML; use a DOM parser instead:

<?php
function get_title( $url ){
    $html = new simple_html_dom();
    $html->load_file( $url );
    // find() with an index returns the first match directly instead of an array
    $title = $html->find( 'title', 0 );
    return $title ? $title->plaintext : 'index';
}
if( isset( $_POST['submit'] ) ){
    $url = parse_url( $_POST['url'] );
    $folder = $url['host'];
    if( array_key_exists( 'path', $url ) ){
        $file = get_title( $_POST['url'] );
        $file .= '.html';
    }else{
        $file = 'index.html';
    }
    if( !is_dir( $folder ) ){
        mkdir( $folder );
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}?>
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
Peyman Mohamadpour