Objective

Download the HTML of a Wiki Page.

Background

I am trying to download the HTML of a Wiki page (http://warframe.wikia.com/wiki/Mods_2.0) so that I can parse it for information. To do this I am using Node.js and the request methods of its http module.

Code

I have a very simple code file which merely accesses the website and tries to print its contents:

"use strict";

var http = require("http");

var options = {
  host: "http://warframe.wikia.com",
  port: 80,
  path: 'wiki/Mods_2.0',
  method: "GET"
};

var req = http.request(options, function(res) {

  console.log("STATUS: " + res.statusCode);
  console.log("HEADERS: " + JSON.stringify(res.headers));
  res.setEncoding('utf8');

  res.on("data", function (chunk) {
    console.log("BODY: " + chunk);
  });
});

req.end();

Problem

The problem is that no matter what I do, nor what I try, I always get the following error output:

Debugger listening on port 15454
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: getaddrinfo ENOTFOUND http://warframe.wikia.com http://warframe.wikia.com:80
    at errnoException (dns.js:27:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:78:26)


Process exited with code: 1

I am fairly sure that I am building the URL incorrectly, but somehow I can't understand how to fix this!

What I tried

My approach is based on this discussion: In Node.js / Express, how do I "download" a page and gets its HTML?

I tried several combinations of the URL path in the options variable, only to get different versions of the same error.

I also read In Node.js / Express, how do I "download" a page and gets its HTML?, but that discussion addresses a different problem (it focuses on streaming, which is not my objective).

Questions

1 - I am fairly sure this is a simple error but I cannot see it. What am I missing?

Flame_Phoenix

2 Answers

Remove the http:// prefix from the host and add a leading / to the path:

"use strict";

var http = require("http");

var options = {
  host: "warframe.wikia.com",
  port: 80,
  path: '/wiki/Mods_2.0',
  method: "GET"
};

var req = http.request(options, function(res) {

  console.log("STATUS: " + res.statusCode);
  console.log("HEADERS: " + JSON.stringify(res.headers));
  //res.setEncoding('utf8');

  res.on("data", function (chunk) {
    console.log("BODY: " + chunk);
  });
});

req.end();
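
If you would rather start from the full URL than split it by hand, one option is to let Node's url module do the splitting. This is only a sketch: optionsFromUrl is a hypothetical helper name, not part of Node, and it assumes a Node version that provides the WHATWG URL class.

```javascript
"use strict";

var URL = require("url").URL;

// Hypothetical helper: turns a full URL string into an options object
// for http.request, so the scheme never ends up in `host`.
function optionsFromUrl(pageUrl) {
  var parsed = new URL(pageUrl);
  return {
    host: parsed.hostname,                  // host name only, no "http://"
    port: Number(parsed.port) || 80,        // parsed.port is "" for the default port
    path: parsed.pathname + parsed.search,  // pathname always starts with "/"
    method: "GET"
  };
}

var options = optionsFromUrl("http://warframe.wikia.com/wiki/Mods_2.0");
console.log(options);
```

The resulting object has the same host, port, path and method as the options written out by hand above, so it can be passed to http.request in the same way.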
rpadovani

Just remove the http:// from the host

host: "warframe.wikia.com",

And add a leading / to the path, so that it starts at the root:

path: '/wiki/Mods_2.0'

Hopefully that works; the same fix is described in a previous question.
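
The stack trace in the question also shows "Unhandled 'error' event", which means the process exits because nothing listens for errors on the request. A rough sketch of a version that reports the error and collects the whole page before using it might look like this. fetchPage and its callback signature are my own hypothetical names, not part of Node.

```javascript
"use strict";

var http = require("http");

// Hypothetical helper: requests a page and hands the whole body to `callback`.
function fetchPage(options, callback) {
  var req = http.request(options, function (res) {
    var body = "";
    res.setEncoding("utf8");

    // The body arrives in chunks; append each one until the response ends.
    res.on("data", function (chunk) { body += chunk; });
    res.on("end", function () { callback(null, res.statusCode, body); });
  });

  // Without this listener, a failed DNS lookup is thrown as an
  // unhandled 'error' event and the process exits.
  req.on("error", function (err) { callback(err); });
  req.end();
}

// Usage (performs a real network request, so it is left commented out):
// fetchPage(
//   { host: "warframe.wikia.com", port: 80, path: "/wiki/Mods_2.0", method: "GET" },
//   function (err, status, body) {
//     if (err) { return console.error("Request failed: " + err.message); }
//     console.log("STATUS: " + status);
//     console.log("BODY length: " + body.length);
//   });
```

With the listener in place, a wrong host shows up as an error passed to the callback instead of a crash, which makes this kind of mistake easier to spot.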

Ananda G