
I need help downloading webpages from the internet using a PHP script. I already have a script that downloads a webpage, but it always saves the page under the same name, index.html.

I want to save each page under the name it has in the URL; for example, the about-us page should be saved as aboutus.html.

<!doctype html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Document</title>
    </head>
    <body>
    <form method="post">
        <input name="url" size="50" placeholder="URL" />
        <input name="submit" type="submit" />
    </form>
    </body>
    </html>
    <?php
    // maximum execution time in seconds
    set_time_limit (24 * 60 * 60);

    if (isset($_POST['submit'])) {

        $url = parse_url($_POST['url']);
        $folder = $url['host'];
        if (array_key_exists('path', $url)) {
            // explode() returns an array; keep only the name before the extension
            $file = explode('.', str_replace('/', '', $url['path']))[0];
            $file .= '.html';
        } else {
            $file = 'index.html';
        }
        if (!is_dir($folder)) {
            mkdir($folder);
        }
        file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
    }
    ?>
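For the filename derivation the question asks about, one approach (a sketch; the sample URL below is hypothetical) is to take the last segment of the URL path:

```php
<?php
// Derive a filename from the last segment of the URL path.
// Falls back to "index" when the URL has no path (e.g. a bare domain).
$url  = 'http://example.com/aboutus.html';   // hypothetical sample URL
$path = parse_url($url, PHP_URL_PATH);       // "/aboutus.html", or null if absent
$name = $path !== null ? pathinfo($path, PATHINFO_FILENAME) : '';
$file = ($name !== '' ? $name : 'index') . '.html';
echo $file . "\n";                           // aboutus.html
```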
rahul
kamal jot

2 Answers


Try this:

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
<?php
// maximum execution time in seconds
set_time_limit (24 * 60 * 60);

function get_title($url){       
  $str = file_get_contents($url);

  if(strlen($str)>0){
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($str);
    $title = $dom->getElementsByTagName( "title" );
    $titleText = 'index';
    if($title && $title->length){
        // item() works on every PHP version; $title[0] requires PHP 5.6.3+
        $titleText = $title->item(0)->textContent;
    }

    libxml_use_internal_errors(false);
    return  $titleText;
  }
  return 'index'; // fallback when the page could not be fetched
}

if (isset($_POST['submit'])) {

    $url = parse_url($_POST['url']);
    $folder = $url['host'];
    if (array_key_exists('path', $url)) {
        $file = get_title($_POST['url']);
        $file .= '.html';
    } else {
        $file = 'index.html';
    }
    if (!is_dir($folder)) {
        mkdir($folder);
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}
?>
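One caveat with naming files after the page title (not covered in the answer): titles may contain slashes or other characters that are invalid in filenames. A small sanitizing helper is a sensible addition; the function below is a hypothetical sketch:

```php
<?php
// Hypothetical helper: reduce an arbitrary page title to a filesystem-safe name.
// Runs of anything other than letters, digits, dash and underscore become a dash.
function safe_filename($title) {
    $name = preg_replace('/[^A-Za-z0-9_-]+/', '-', trim($title));
    $name = trim($name, '-');
    return $name !== '' ? $name : 'index';
}

echo safe_filename('About Us | Example, Inc.') . "\n"; // About-Us-Example-Inc
```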
Adolfo Garza
  • @kamaljot Although it is working, it is better not to use it – Peyman Mohamadpour Feb 29 '16 at 07:54
  • yes it giving me problem to download https: related url – kamal jot Feb 29 '16 at 09:31
  • Good catch Trix. I edited my response to use DOMDocument to parse the html and avoid using an external library. If you decide to use PHP Simple HTML DOM Parser just download the php file and put it next to this php file you are working with and add this line at the top of your php: require 'simple_html_dom.php'; – Adolfo Garza Feb 29 '16 at 17:30
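On the https problem mentioned in the comments: `fopen()` and `file_get_contents()` can only fetch https URLs when PHP's `openssl` extension is loaded, since that is what registers the `https` stream wrapper. A quick runtime check (a sketch; no download is performed):

```php
<?php
// The https:// stream wrapper only exists when ext/openssl is loaded;
// without it, fopen('https://...', 'r') fails with a "no wrapper" error.
$hasHttps = in_array('https', stream_get_wrappers(), true);
echo $hasHttps
    ? "https downloads are supported.\n"
    : "Enable the openssl extension to download https URLs.\n";
```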

Note

Needs PHP Simple HTML DOM Parser

Contrary to the original version of the answer provided by Adolfo Garza, using a regex is not a good idea for parsing HTML; use a DOM parser instead:

<?php
function get_title( $url ){
    $html = new simple_html_dom();
    $html->load_file( $url );
    // find() with an index returns the first match directly instead of an array
    $title = $html->find( 'title', 0 );
    return $title ? $title->plaintext : 'index';
}
if( isset( $_POST['submit'] ) ){
    $url = parse_url( $_POST['url'] );
    $folder = $url['host'];
    if( array_key_exists( 'path', $url ) ){
        $file = get_title( $_POST['url'] );
        $file .= '.html';
    }else{
        $file = 'index.html';
    }
    if( !is_dir( $folder ) ){
        mkdir( $folder );
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}?>
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
Peyman Mohamadpour