How can I GET content of a HTTPS webpage?

Question

I want to get the content of a webpage by running JavaScript code on Node.js. I want the content to be exactly the same as what I see in the browser.

This is the URL: https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9

I use the following code, but I get a 405 in response.

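var fs = require('fs');
var request = require('request');
var link = 'https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9';

request(link, function (error, response, body) {
    // Stop on network errors instead of writing an undefined body.
    if (error) {
        return console.log('request failed:', error);
    }
    // Log the status code to see what the server answered.
    console.log('status code:', response.statusCode);
    fs.writeFile('realestatedata.html', body, function (err) {
        if (err) {
            console.log('error in saving the file');
            return console.log(err);
        }
        console.log('The file was saved!');
    });
});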

The file that is saved is unrelated to what I see in the browser.

It seems the request you send is not supported by the server. Have you tried request('https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9').pipe(fs.createWriteStream('realestatedata.html'))? Note that the page will not render the same way anyway when you only open the HTML, since it also requires many other resources (110 requests are made when the page is displayed). — Nicolas Henneaux, Aug 02 '16 at 07:20
I tried the URL starting with `www` and with `realtor.ca`, and neither worked. How can I make it work? I mean, how can I run all 110 requests? — Arian, Aug 02 '16 at 07:27

score 0 · Answer 1 · edited May 23 '17 at 12:32

I think a real answer will be easier to understand since my comment was truncated.

It seems the HTTP method of the request you send is not supported by the server. Status 405 Method Not Allowed is defined as: "The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response MUST include an Allow header containing a list of valid methods for the requested resource." Do you have more information about the HTTP response? Have you tried the following code instead of yours?

request('https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9').pipe(fs.createWriteStream('realestatedata.html'))
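To get more information about the HTTP response, something like the following could help. It is only a minimal sketch: it uses Node's built-in `http` and `https` modules rather than the `request` package, and the names `fetchPage` and `summarizeResponse`, as well as the optional `extraHeaders` parameter, are made up for illustration.

```javascript
const http = require('http');
const https = require('https');

// Build a short, human-readable summary of a response.
// Node lowercases header names, so the Allow header is read as 'allow'.
function summarizeResponse(statusCode, headers) {
  const allow = headers['allow'] || '(no Allow header)';
  return 'status ' + statusCode + ', allowed methods: ' + allow;
}

// Hypothetical helper: GET a URL and resolve with status, headers and body.
// extraHeaders is an optional object of request headers (an assumption,
// added so that different headers can be tried against the server).
function fetchPage(url, extraHeaders) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise(function (resolve, reject) {
    const req = client.get(url, { headers: extraHeaders || {} }, function (res) {
      const chunks = [];
      res.on('data', function (chunk) { chunks.push(chunk); });
      res.on('end', function () {
        resolve({
          statusCode: res.statusCode,
          headers: res.headers,
          body: Buffer.concat(chunks).toString('utf8')
        });
      });
    });
    req.on('error', reject);
  });
}
```

Calling `fetchPage(link).then(function (res) { console.log(summarizeResponse(res.statusCode, res.headers)); })` would print the status code together with the Allow header, if the server sends one.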

You could also have a look at the question 'In Node.js / Express, how do I "download" a page and gets its HTML?'.

Note that the page will not render the same way anyway when you only open the HTML, since it also requires many other resources (110 requests are made when the page is displayed). I think the following answer can help you download the whole page: https://stackoverflow.com/a/34935427/1630604
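To get a rough idea of which other resources a page asks for, the downloaded HTML can be scanned for `script`, `img` and `link` tags. This is only a sketch under loud assumptions: `listResourceUrls` is a hypothetical name, a regular expression is not a real HTML parser, and resources that the page loads from its own scripts at runtime will not show up at all.

```javascript
// Hypothetical helper: collect the URLs that <script>, <img> and <link> tags
// reference through src="..." or href="...", resolved against baseUrl.
function listResourceUrls(html, baseUrl) {
  const pattern = /<(?:script|img|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const found = new Set();
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      // Resolve relative references such as "/style.css" or "app.js".
      found.add(new URL(match[1], baseUrl).href);
    } catch (e) {
      // Ignore values that cannot be resolved to a URL.
    }
  }
  return Array.from(found);
}
```

Each URL in the returned array could then be requested and saved in turn, which is roughly what a browser does when it displays the page.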

I understand that it doesn't show as it looks in the browser, and I just want to have the HTML content (to be able to crawl it). I used what was suggested in the first link, and it doesn't work. It brings up a page from the same website, but it says the page you are looking for does not exist. The same goes for what you suggested. — Arian, Aug 02 '16 at 17:13
