
I'm new to JS and am trying to execute a function on a site to pull all the data in a table in JSON format.

I am using Parse Cloud Code to send my http requests, and the requests themselves are working, but I can't seem to get just the data itself.

It seems I am only able to get it in HTML and even then the objects do not display the same way that they do in the webpage's elements.

Any help/advice would be greatly appreciated!

Thank you in advance.

This is the link:

http://www.capetown.gov.za/Media-and-news#k=thinkwater

Here is the code:

Parse.Cloud.define('hello', function(req, res) {
  res.success('Hi');
});

Parse.Cloud.define('htmlTest', function(req, res) {
  Parse.Cloud.httpRequest({
    method: 'POST',
    url: 'http://www.capetown.gov.za/Media-and-news#k=thinkwater',
    params: {
      action: '/Media-and-news',
      id: 'aspnetForm',
      onsubmit: 'javascript:return WebForm_OnSubmit();'
    },
    headers: {
      'Content-Type': 'application/json;charset=utf-8'
    }
  }).then(function(httpResponse) {
    // success
    res.success(httpResponse.text);
  }, function(httpResponse) {
    // error
    res.error('Request failed with response code ' + httpResponse.status);
  });
});
laggingreflex
BLE
  • why are you using Parse Cloud? use fetch it's really nice and simple https://davidwalsh.name/fetch – Pixelomo Mar 16 '18 at 05:18
  • Possible duplicate of [How can I scrape pages with dynamic content using node.js?](https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js) – laggingreflex Mar 16 '18 at 06:58

1 Answer


You can't execute a client-side JavaScript function with an HTTP request.

Here's what happens when you load that page:

  1. The server (the site you're trying to fetch) receives an HTTP request from you.

  2. The server generates the initial HTML and sends it to whoever made the request, whether that is a browser or your Node.js code. This "initial" HTML is what you get with a simple HTTP request, and in your case it doesn't contain the results you need.

  3. If the HTML was loaded in a browser, additional client-side JavaScript is executed (i.e. the "JavaScript function" you're trying to execute). This can only happen in a browser (or browser-like) environment. That code modifies the HTML through the DOM, and the result is the final rendered HTML. You can't get these results with a simple HTTP request*, as that only gets you as far as step 2.
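
To make the difference between steps 2 and 3 concrete, here is a minimal sketch. The HTML string is made up for illustration (it is not the real markup of the site), and `extractResultsText` is a hypothetical helper. It shows that the text an HTTP client receives still has an empty container, because nothing ever runs the `<script>`:

```javascript
// Hypothetical "initial" HTML, standing in for what the server sends in step 2.
const initialHtml = `
<div id="results"></div>
<script>
  document.getElementById('results').textContent = 'Loaded by client-side JS';
</script>`;

// An HTTP client only ever sees the text above; it never executes the <script>.
function extractResultsText(html) {
  const match = html.match(/<div id="results">([\s\S]*?)<\/div>/);
  return match ? match[1] : null;
}

// What a plain HTTP request would find inside the results container.
const resultsSeenByHttpClient = extractResultsText(initialHtml);
console.log(JSON.stringify(resultsSeenByHttpClient));
```

Only a browser (or browser-like) environment would run the script and fill the container.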

*You can find out which URL the client-side JavaScript itself uses to fetch those results. The Network tab in the browser's developer tools can help with this: when you click the button that triggers the fetch, keep an eye on which requests are made.


In your case it seems to fetch JSON with a POST request to http://www.capetown.gov.za/_vti_bin/client.svc/ProcessQuery. It doesn't seem straightforward, though: at first glance it makes a series of requests, each depending on the previous one. Feel free to explore this route yourself.
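
If you do go down that route, a rough sketch of replaying a captured request from Parse Cloud Code could look like the following. This is an assumption-heavy outline, not a working scraper: `buildReplayRequest` and the `fetchResults` function name are hypothetical, the header and body values are placeholders you would copy from the Network tab, and since the site appears to chain several requests, a single replayed request may not be enough.

```javascript
// Build options for Parse.Cloud.httpRequest that replay a request copied
// from the browser's Network tab. Nothing here is specific to the site:
// the caller supplies the headers and body exactly as the browser sent them.
function buildReplayRequest(url, capturedHeaders, capturedBody) {
  return {
    method: 'POST',
    url: url,
    headers: Object.assign({}, capturedHeaders),
    body: capturedBody
  };
}

// Placeholders (assumptions): paste the real values from the Network tab.
const options = buildReplayRequest(
  'http://www.capetown.gov.za/_vti_bin/client.svc/ProcessQuery',
  { 'Content-Type': 'PASTE_CONTENT_TYPE_FROM_NETWORK_TAB' },
  'PASTE_REQUEST_PAYLOAD_FROM_NETWORK_TAB'
);

// Only runs inside Parse Cloud Code, where the Parse global exists.
if (typeof Parse !== 'undefined') {
  Parse.Cloud.define('fetchResults', function (req, res) {
    Parse.Cloud.httpRequest(options).then(function (httpResponse) {
      // httpResponse.text is the raw response body.
      res.success(httpResponse.text);
    }, function (httpResponse) {
      res.error('Request failed with response code ' + httpResponse.status);
    });
  });
}
```

Note that the body goes in `body`, not `params`, and that the `#k=thinkwater` part of the original URL is a fragment, which browsers do not send to the server.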

So in order to get the final HTML you will need to either:

  1. Find the direct URL that serves those results. This is usually the quickest option, but it requires understanding the site's API and how it fetches results via AJAX (i.e. from client-side JavaScript).

  2. Use a fetcher with a browser or browser-like environment, e.g. PhantomJS (deprecated), Puppeteer, Selenium, or Zombie.
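
For the second option, a minimal Puppeteer sketch might look like this. It assumes a Node.js environment where the `puppeteer` package is installed (which may not be possible in hosted Parse Cloud Code), and `fetchRenderedHtml` is a hypothetical helper name. It loads the page in a headless browser, lets the client-side JavaScript run, and returns the final HTML:

```javascript
// Load a page in headless Chrome and return the HTML after scripts have run.
async function fetchRenderedHtml(url) {
  // Required lazily so that defining this function does not need puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle0' waits until there are no in-flight network requests,
    // a rough signal that client-side fetching has finished.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Usage (not executed here):
// fetchRenderedHtml('http://www.capetown.gov.za/Media-and-news#k=thinkwater')
//   .then(function (html) { console.log(html.length); });
```

You would still have to pick the results out of the returned HTML; this only gets you past step 2.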

laggingreflex
  • Hey thank you so much for that brilliant response! I want to scrape it without a browser, so call the function from : http://www.capetown.gov.za/_vti_bin/client.svc/ProcessQuery and then retrieve the json response. I have found a payload for the XHR request, but I'm still fairly novice and don't know what to do with it. – BLE Mar 16 '18 at 16:55
  • Hey, sorry to bump this question, but could you give me any more advice? – BLE Mar 19 '18 at 23:42