
I’m trying to get the content of an HTML page with a Node.js app. I found this code: In Node.js / Express, how do I "download" a page and gets its HTML? (yojimbo's answer), which seems to work well. When I run the code, I get the HTML body of a 301 Moved Permanently response, but the redirect link is the same as the one I requested!

var util = require("util"),
    http = require("http");

var options = {
    host: "www.mylink.com",
    port: 80,
    path: "/folder/content.xml"
};

var content = "";   

var req = http.request(options, function(res) {
    res.setEncoding("utf8");
    res.on("data", function (chunk) {
        content += chunk;
    });

    res.on("end", function () {
        util.log(content);
    });
});

req.end();

And the return is:

30 Jul 13:08:52 - <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<p>The document has moved <a href="http://mylink.com/folder/content.xml">here</a>.</p>
<hr>
<address>Apache/2.2.22 (Ubuntu) Server at www.mylink.com Port 80</address>
</body></html>

Is it moved permanently to the same place, or is it just some kind of security on the server? Or did I make a mistake in the code? (It works on Google and all the other sites I tested.)

I doubt it's the ".xml" that causes the problem, since I even tested with a PDF page without problems (just a bunch of unreadable characters).

Following a discussion with the client, I’ll get the page another way (downloading it directly), which works OK. I still accept the answer of c.P.u1, but I’m still wondering why the redirect link is the same as the link the app follows.

DrakaSAN
  • Notice that you request `/folder/content.xml` and redirect to `/folder.content.xml`. It's not the same url. – Nefreo Jul 30 '13 at 12:15
  • Sorry, it's a typo; I had to recopy the message from another computer – DrakaSAN Jul 30 '13 at 12:49
  • Not the same URL: request should be "www.mylink.com/folder/content.xml", but there's no "www" in the response. I had the same issue, but in my case, the server was requiring a trailing "/" on the URL. – Roger Dueck May 04 '20 at 22:14

2 Answers


A 301 status code indicates that the requested resource has moved permanently and that the client should repeat the request at the URI given in the response's Location header. The http module doesn't follow redirects (3xx status codes) by default.

You can use the request module, which follows redirects by default according to its documentation:

Request is designed to be the simplest way possible to make http calls. It supports HTTPS and follows redirects by default.

To do it manually, read the Location header from the response and initiate a new request to that URI.

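var http = require("http");

// Fetch a page, following 301/302 redirects via the Location header.
function fetch(target) {
    http.get(target, function (res) {
        if (res.statusCode === 301 || res.statusCode === 302) {
            res.resume(); // discard the body of the redirect response
            // Assumes Location holds an absolute http:// URL.
            return fetch(res.headers.location);
        }
        var content = "";
        res.setEncoding("utf8");
        res.on("data", function (chunk) {
            content += chunk;
        });
        res.on("end", function () {
            console.log(content);
        });
    });
}

fetch(options); // `options` as defined in the question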
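
The manual approach can loop forever if a server keeps redirecting to the same URL, and the Location header may also be relative. The following is only a sketch of one way to guard against both, assuming a Node.js version that provides the global URL class. The names redirectTarget, fetchFollowing and MAX_REDIRECTS, and the cap of 5 hops, are made up for this example and are not part of the http module.

```javascript
const http = require("http");
const https = require("https");

const MAX_REDIRECTS = 5; // arbitrary cap chosen for this sketch

// Return the absolute URL to follow, or null if the response is not a redirect.
function redirectTarget(statusCode, headers, currentUrl) {
    const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
    if (!isRedirect || !headers.location) {
        return null;
    }
    // Location may be relative, so resolve it against the URL just requested.
    return new URL(headers.location, currentUrl).href;
}

// GET currentUrl, following redirects until a non-redirect response arrives
// or the hop budget runs out. Calls callback(err, body).
function fetchFollowing(currentUrl, callback, hopsLeft = MAX_REDIRECTS) {
    // Pick the client module that matches the URL scheme.
    const client = new URL(currentUrl).protocol === "https:" ? https : http;
    client.get(currentUrl, (res) => {
        const next = redirectTarget(res.statusCode, res.headers, currentUrl);
        if (next !== null) {
            res.resume(); // drain the redirect body so the socket is released
            if (hopsLeft === 0) {
                return callback(new Error("Too many redirects"));
            }
            return fetchFollowing(next, callback, hopsLeft - 1);
        }
        let content = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => { content += chunk; });
        res.on("end", () => callback(null, content));
    }).on("error", callback);
}
```

Calling fetchFollowing("http://www.mylink.com/folder/content.xml", callback) would then hand either an error or the final body to the callback. A server that really does redirect a URL to itself ends in the "Too many redirects" error instead of an endless loop.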
c.P.u1
  • I could do that, but what intrigues me is that I get the exact same link in the redirect page. – DrakaSAN Jul 30 '13 at 12:51

If the redirect link in the "Location:" header is the same as the originally requested link, then the server is either misconfigured or broken.

Note that the link in the response body is only there as a convenience to humans and should not be considered authoritative. Only the "Location:" field in the HTTP Response header should be used to locate a redirected resource.
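
To check whether a redirect really points back at the requested URL, it can help to compare the two component by component rather than by eye. Below is a minimal sketch, assuming the global URL class is available. compareRedirect is a hypothetical helper name, and the second argument in the example call assumes the Location header carried the same link as the response body shown in the question.

```javascript
// Compare the URL that was requested with the Location header of a redirect
// and list which URL components differ.
function compareRedirect(requestedUrl, locationHeader) {
    const from = new URL(requestedUrl);
    // Resolve against the requested URL in case Location is relative.
    const to = new URL(locationHeader, requestedUrl);
    const parts = ["protocol", "hostname", "port", "pathname", "search"];
    const differs = parts.filter((part) => from[part] !== to[part]);
    return { target: to.href, sameUrl: differs.length === 0, differs: differs };
}

console.log(compareRedirect(
    "http://www.mylink.com/folder/content.xml",
    "http://mylink.com/folder/content.xml"
));
```

For the URLs from the question, this reports that only the hostname differs (www.mylink.com versus mylink.com), so the redirect target is not the same URL. A missing or extra trailing slash would show up as a difference in pathname.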

Rob Raisch