I’m trying to get the content of an HTML page with a Node.js app. I found this code (yojimbo's answer to In Node.js / Express, how do I "download" a page and gets its HTML?), which seems to work well. When I run the code, I get the HTML of a 301 Moved Permanently page, but the redirect link is the same as the one I requested!
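    var util = require("util"),
        http = require("http");
    var options = {
        host: "www.mylink.com",
        port: 80,
        path: "/folder/content.xml"
    };
    var content = "";
    var req = http.request(options, function(res) {
        res.setEncoding("utf8");
        // accumulate the body as it arrives
        res.on("data", function (chunk) {
            content += chunk;
        });
        // log the complete body once the response has ended
        res.on("end", function () {
            util.log(content);
        });
    });
    // without an error listener, a network failure would crash the process
    req.on("error", function (err) {
        util.log("request failed: " + err.message);
    });
    req.end();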
And the response is:
    30 Jul 13:08:52 - <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <p>The document has moved <a href="http://mylink.com/folder/content.xml">here</a>.</p>
    <hr>
    <address>Apache/2.2.22 (Ubuntu) Server at www.mylink.com Port 80</address>
    </body></html>
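Looking closely, the link in the response body is `http://mylink.com/folder/content.xml`, without the `www.` prefix, while the request was sent to host `www.mylink.com`. A 301 response normally carries its target in the `Location` header, so comparing that header with the requested URL is more reliable than reading the HTML body. A minimal sketch, assuming the server sets `Location` and the URL is plain `http`; the helper names `describeRedirect` and `checkRedirect` are hypothetical:

```javascript
// Sketch: compare a redirect's Location header with the URL that was requested.
// Assumptions: plain http (not https), and the server sends a Location header.
const http = require("http");

// Resolve the Location header against the requested URL (it may be relative)
// and report which parts of the URL actually changed.
function describeRedirect(requestedUrl, locationHeader) {
  const from = new URL(requestedUrl);
  const to = new URL(locationHeader, from);
  const changed = [];
  for (const part of ["protocol", "hostname", "port", "pathname", "search"]) {
    if (from[part] !== to[part]) {
      changed.push(part);
    }
  }
  return { target: to.href, changed: changed };
}

// Request the page once (without following redirects) and log the status
// code, the redirect target, and which URL parts differ.
function checkRedirect(requestedUrl) {
  const req = http.get(requestedUrl, function (res) {
    res.resume(); // the body is not needed here, so discard it
    console.log("status:", res.statusCode);
    if (res.headers.location) {
      const info = describeRedirect(requestedUrl, res.headers.location);
      console.log("redirects to:", info.target);
      console.log("changed parts:", info.changed.join(", ") || "none");
    }
  });
  req.on("error", function (err) {
    console.error("request failed:", err.message);
  });
}
```

Calling `checkRedirect("http://www.mylink.com/folder/content.xml")` should print the 301 status and, if the `Location` header matches the link in the body above, list `hostname` as the only changed part.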
Is it really moved permanently to the same place, or is it some kind of security measure on the server? Or did I make a mistake in the code? (It works on Google and all the other sites I tested.)
I doubt the ".xml" extension is causing the problem, since I also tested a PDF file without any problem (the output was just a bunch of unreadable characters).
Following a discussion with the client, I’ll get the page another way (by downloading it directly), which works fine. I’ve accepted c.Pu.1's answer, but I’m still wondering why the redirect link is the same as the link the app requests.
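Node's `http.request` and `http.get` do not follow redirects automatically, so the code in the question only ever sees the 301 page itself. If the script should follow the redirect on its own, a minimal sketch could resolve the `Location` header against the current URL and request it again. The function name `getFollowingRedirects` and the redirect limit are hypothetical choices, not an existing API; only GET over `http`/`https` with a UTF-8 body is assumed:

```javascript
// Sketch: GET a URL and follow 3xx redirects up to a limit.
// Assumptions: GET only, http or https, UTF-8 text body.
const http = require("http");
const https = require("https");

function getFollowingRedirects(url, maxRedirects, callback) {
  const client = new URL(url).protocol === "https:" ? https : http;
  const req = client.get(url, function (res) {
    const status = res.statusCode;
    const location = res.headers.location;
    if (status >= 300 && status < 400 && location) {
      res.resume(); // discard the body of the redirect page
      if (maxRedirects <= 0) {
        callback(new Error("too many redirects, last URL: " + url));
        return;
      }
      // Location may be relative, so resolve it against the current URL
      const next = new URL(location, url).href;
      // A Location identical to the current URL would loop forever
      if (next === url) {
        callback(new Error("redirect loop at " + url));
        return;
      }
      getFollowingRedirects(next, maxRedirects - 1, callback);
      return;
    }
    // Final (non-redirect) response: collect the body
    let content = "";
    res.setEncoding("utf8");
    res.on("data", function (chunk) {
      content += chunk;
    });
    res.on("end", function () {
      callback(null, { url: url, status: status, body: content });
    });
  });
  req.on("error", callback);
}
```

Usage would look like `getFollowingRedirects("http://www.mylink.com/folder/content.xml", 5, function (err, result) { ... })`, where `result.url` shows where the content was finally fetched from. On Node.js 18 and later, the built-in `fetch` follows redirects by default, which may be a simpler option.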