I am new to Node and I am trying to get a website's search results as JSON. I have tried reading the response in chunks with the http module and also an Express GET route, but neither gave me JSON. The URL: https://www.cyccomputer.pe/buscar?search_query=mouse

ashish
  • Can you elaborate on what you are trying to do and share your code, please? –  Apr 05 '19 at 20:59
  • I am trying to make an API that returns the products as JSON – ashish Apr 05 '19 at 21:02
  • So first step, fetch the page; second step, parse the page; third step, build and return the JSON. – epascarello Apr 05 '19 at 21:04
  • Ok and your API is going to list products on the website you shared the link? –  Apr 05 '19 at 21:05
  • Remisa Yousefvand No. Api is going to list products from website in JSON – ashish Apr 05 '19 at 21:08
  • Do you own the website or you just want to use it as your data source? –  Apr 05 '19 at 21:22
  • I dont own that website – ashish Apr 05 '19 at 21:23
  • You need to narrow this down to much smaller, more specific issues and treat them all separately. Then be specific about where you are having issues within each smaller subtask – charlietfl Apr 05 '19 at 21:31
  • Use an HTTP client like axios or request to request the search results page, then use cheerio to parse and locate the nodes that contain the information you need. Just google "web scraping with Node JS" and you'll find a ton of examples. – djfdev Apr 05 '19 at 21:36

1 Answer

The URL https://www.cyccomputer.pe/buscar?search_query=mouse does not return JSON. The site renders an HTML page; it does not serve JSON.

You can get what you want by scraping the page. Packages such as request, request-promise, or axios can fetch the HTML, for example:

const rp = require('request-promise');

rp('https://www.cyccomputer.pe/buscar?search_query=mouse')
  .then(html => console.log(html)) // html contains the returned HTML
  .catch(e => console.log('error', e.message));

// outputs something like:
<!DOCTYPE HTML>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="es-es"><![endif]-->
<!--[if IE 7]><html class="no-js lt-ie9 lt-ie8 ie7" lang="es-es"><![endif]-->
<!--[if IE 8]><html class="no-js lt-ie9 ie8" lang="es-es"><![endif]-->
<!--[if gt IE 8]> <html class="no-js ie9" lang="es-es"><![endif]-->
<html lang="es-es">
    <head>
...
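
If you would rather not add a dependency, the same fetch can be sketched with Node's built-in http and https modules, collecting the response chunk by chunk. This is only a minimal sketch: fetchHtml is a hypothetical helper name, it treats anything other than a 200 status as an error, and it does not follow redirects.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: resolves with the response body as a string.
function fetchHtml(url) {
  // Pick the client that matches the URL scheme.
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    client.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the body so the socket is released
        reject(new Error(`Unexpected status code: ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      // The body arrives in chunks; append each one as it comes in.
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
      res.on('error', reject);
    }).on('error', reject);
  });
}

// Example call (commented out so nothing is requested on load):
// fetchHtml('https://www.cyccomputer.pe/buscar?search_query=mouse')
//   .then(html => console.log(html))
//   .catch(e => console.log('error', e.message));
```

The returned promise can be used in place of the rp(...) call, so the later .then(...) steps stay the same.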

Then you can use packages such as html2json or html-to-json to convert the HTML to JSON, for example:

const html2json = require('html2json').html2json;

rp('https://www.cyccomputer.pe/buscar?search_query=mouse')
  .then((html) => {
    const jsonData = html2json(html);
    console.log(jsonData)
  })

// sample from docs
// html to parse
<div id="1" class="foo">
<h2>sample text with <code>inline tag</code></h2>
<pre id="demo" class="foo bar">foo</pre>
<pre id="output" class="goo">goo</pre>
<input id="execute" type="button" value="execute"/>
</div>

// outputs
{
  node: 'root',
  child: [
    {
      node: 'element',
      tag: 'div',
      attr: { id: '1', class: 'foo' },
      child: [
        {
          node: 'element',
          tag: 'h2',
          child: [
            { node: 'text', text: 'sample text with ' },
            { node: 'element', tag: 'code', child: [{ node: 'text', text: 'inline tag' }] }
          ]
        },
...
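
Because the result is a plain tree of node / tag / attr / child objects, you can keep only the parts you care about with a small recursive walk. The sketch below assumes the tree shape shown above. The class name product-name and the sample tree are hypothetical, so inspect the real page to find the element that actually holds each product. The class attribute is handled both as a string and as an array, since it may hold several values.

```javascript
// Collect the concatenated text of a node and all of its descendants.
function textOf(node) {
  if (node.node === 'text') return node.text;
  return (node.child || []).map(textOf).join('');
}

// True when an element's class attribute contains `className`.
// The attribute may be a single string or an array of strings.
function hasClass(node, className) {
  const cls = node.attr && node.attr.class;
  if (!cls) return false;
  return [].concat(cls).some(c => c.split(/\s+/).includes(className));
}

// Walk the tree depth-first and return every element carrying `className`.
function findByClass(node, className, found = []) {
  if (node.node === 'element' && hasClass(node, className)) found.push(node);
  (node.child || []).forEach(child => findByClass(child, className, found));
  return found;
}

// Hypothetical tree, shaped like the html2json output above.
const tree = {
  node: 'root',
  child: [{
    node: 'element', tag: 'div', attr: { class: 'product-list' },
    child: [
      { node: 'element', tag: 'h5', attr: { class: ['product-name', 'bold'] },
        child: [{ node: 'text', text: 'Mouse A' }] },
      { node: 'element', tag: 'h5', attr: { class: 'product-name' },
        child: [{ node: 'text', text: 'Mouse B' }] }
    ]
  }]
};

const products = findByClass(tree, 'product-name')
  .map(n => ({ name: textOf(n).trim() }));
console.log(JSON.stringify(products));
// prints [{"name":"Mouse A"},{"name":"Mouse B"}]
```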

Update: (to OP's issue)

You may additionally want to use the cheerio package to grab only the body of the HTML and convert that to JSON:

const cheerio = require('cheerio');

rp('https://www.cyccomputer.pe/buscar?search_query=mouse')
  .then(html => {
    var data = cheerio.load(html);
    var body = data('body').html();
    var result = html2json(body);
    console.log(result);
  })
  .catch(e => console.log('error', e.message))

Note: If you're simply console logging, nested objects are only printed down to a limited depth. Check out this SO Question to log the entire object.
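
As a minimal sketch of the depth limit, using a small made-up object: Node's util.inspect (which console.log uses for objects) accepts a depth option, and passing null removes the limit.

```javascript
const util = require('util');

// Made-up object that is nested deeper than the default limit.
const nested = { a: { b: { c: { d: 1 } } } };

// Default inspection collapses the deepest level to [Object].
const shallow = util.inspect(nested);
// depth: null removes the limit so every level is printed.
const full = util.inspect(nested, { depth: null });

console.log(shallow);
console.log(full);

// Alternatives: console.dir(nested, { depth: null }),
// or console.log(JSON.stringify(nested, null, 2)) for JSON-safe data.
```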

1565986223
  • The code works fine for some websites, but for the URL I provided it shows "Cannot read property 'child' of undefined" – ashish Apr 06 '19 at 12:25
  • Is there a way to parse only the product results? There are 200 lines of JSON output even for an empty search (product not found) – ashish Apr 06 '19 at 15:18
  • First download the HTML, go through the `DOM` and find which `div` or other element holds the product; then you can grab it with something like `var product = data('the-product-div').html()` – 1565986223 Apr 06 '19 at 15:22