I am new to JavaScript and am pretty sure I am missing something fundamental about using JS from an HTML page (to be viewed in a web browser).

My goal is to scrape photo links from a dynamic website using cheerio and display them in a JS gadget (e.g., using lightslider). Following this tutorial, I obtained the script below, and it runs successfully with nodejs scrapt.js in a bash terminal:

var request = require('request');
var cheerio = require('cheerio');
request('https://outbox.eait.uq.edu.au/uqczhan2/Photos/', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    console.log(html);
  }
});

But I am not able to run this script in a regular web browser (by pressing F12 -> Console). An error appears right after the first statement:

>var request = require('request');
VM85:1 Uncaught ReferenceError: require is not defined
    at <anonymous>:1:15

I understand that some JavaScript modules must be loaded before they can be used. For example, for d3.js I need to include:

<script src="https://d3js.org/d3.v4.min.js"></script>

to use all the d3 functions. How can I achieve the same thing so that I can use cheerio in a web browser?

xlm
Chenming Zhang
  • Cheerio is just a nodejs implementation of jQuery. Just use jQuery when you are scripting for web browser. – Patrick Evans Mar 05 '17 at 14:00
  • You can't do that in a browser because of same origin policy. But yes, if you could, jQuery would be the thing to use. – pguardiario Jun 11 '18 at 14:02
  • You don't even need jQuery. Vanilla JS has tools for making requests and working with the DOM, so you can simply `fetch` your HTML page, create a root element, parse the HTML with `innerHTML` and `document.querySelector` away. – ggorlen Aug 01 '20 at 18:35

4 Answers

2

You cannot run Node.js code directly in the browser. Look into browserify, a tool that bundles Node.js-style modules so that they can run in the browser.

Lorenz Meyer
0

Cheerio uses a library that requires process, i.e. the Node process object, not available in the browser.

browserify works, however.

Source: Endless headaches trying to get cheerio to work with Webpack.

0

This is an XY problem. You may assume that to parse HTML in the browser, you should use Cheerio, a Node.js HTML parser. The problem is that you can't run Node.js code in the browser without a build tool like browserify to supply require and make it possible.

However, before embarking on adding a build process, it's worth taking a step back and realizing that the browser already has a native HTML parser that requires no packages, plus jQuery, which is an easy <script> tag include away and requires no build process or workarounds. In fact, Cheerio was invented purely to port jQuery syntax to an environment that doesn't have a DOM, Node.js.

So instead of essentially porting jQuery to Node, then back to the browser in a Rube Goldbergian manner, just use jQuery or the native DOM directly. These are the original native browser tools that preceded Cheerio.

request isn't necessary in the browser, either. It's another Node package not intended for browser environments. As above, you can use jQuery or a native fetch call to make your HTTP request.
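
As a rough illustration of the native approach, here is a minimal sketch. It assumes the target page exposes its photos as plain <a href> links and that the server permits the request from your page's origin. The names extractPhotoLinks and loadPhotoLinks and the list of image extensions are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: collect absolute URLs of image links from a parsed document.
// `doc` is anything that offers querySelectorAll, e.g. the Document from DOMParser.
function extractPhotoLinks(doc, baseUrl) {
  const imagePattern = /\.(jpe?g|png|gif|webp)$/i; // assumed set of photo extensions
  return Array.from(doc.querySelectorAll('a[href]'))
    // Resolve relative hrefs against the page URL.
    .map((a) => new URL(a.getAttribute('href'), baseUrl))
    .filter((url) => imagePattern.test(url.pathname))
    .map((url) => url.href);
}

// Browser-only: fetch the page and parse it with the built-in HTML parser.
async function loadPhotoLinks(pageUrl) {
  const response = await fetch(pageUrl);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const html = await response.text();
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return extractPhotoLinks(doc, pageUrl);
}
```

In a page script, loadPhotoLinks(url).then(console.log) would log the collected links, which could then be handed to a gallery widget. No require, Cheerio, or build step is involved.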

Taking another step back, though: browsers enforce the same-origin policy, and most servers don't send the CORS headers that would let browser clients on other origins read their resources. You may need your own server, for example one running Node and Express, to make the request on the browser's behalf. In that case, Cheerio may come in handy again: you can pull the relevant data from the third-party site's response on the backend and prepare it as a response to your frontend.
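
A minimal sketch of that backend idea follows. The factory name createPhotoLinksHandler and its options are hypothetical. The fetching function and Cheerio's load function are passed in as parameters so the sketch has no hard dependencies; it only assumes an Express-style (req, res) handler and Cheerio's jQuery-like API.

```javascript
// Hypothetical factory: builds an Express-style handler that scrapes `targetUrl`
// on the server and responds with the list of link URLs as JSON.
// `fetchHtml(url)` must resolve to the page's HTML string;
// `load` is expected to behave like cheerio.load.
function createPhotoLinksHandler({ targetUrl, fetchHtml, load }) {
  return async function photoLinksHandler(req, res) {
    try {
      const html = await fetchHtml(targetUrl);
      const $ = load(html);
      const links = $('a[href]')
        // Resolve each href against the scraped page's URL.
        .map((i, el) => new URL($(el).attr('href'), targetUrl).href)
        .get();
      res.json(links);
    } catch (err) {
      // The upstream request or parsing failed; report it to the frontend.
      res.status(502).json({ error: err.message });
    }
  };
}
```

Assuming Express and Cheerio are installed and Node provides a global fetch, the wiring might look like app.get('/photos', createPhotoLinksHandler({ targetUrl, fetchHtml: (url) => fetch(url).then((r) => r.text()), load: cheerio.load })). The frontend then requests /photos from your own origin, so the cross-origin restriction no longer applies.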

Without writing and hosting your own server, you may be able to use a proxy like cors-anywhere to access resources cross-origin.

See also Client on Node.js: Uncaught ReferenceError: require is not defined.

ggorlen
-2

The short answer is: the same way you included the d3.js library.

require() is defined in RequireJS. To use the require function to load request and cheerio, you need to include RequireJS first, the same way you included d3. See the RequireJS site.

Node.js is server-side JavaScript, so you need to be very careful when trying to run Node.js code client-side in the browser. Some tasks, such as creating REST endpoints, are server-side and cannot be done in the browser.

As the answer above suggests, you can also use a build system such as webpack, or a loader such as SystemJS, to load the script.

Aniruddha Das