This is my first phone app. I am using Ionic for the cross-platform work, which uses Angular. I have a separate program that scrapes a web page using puppeteer and cheerio and builds an array of values from the page. This works.

I'm not sure how to get the array from my web scraping program into my Ionic/Angular program.

I have a basic Ionic setup and am just trying the most basic task: seeing the array from the Ionic/Angular side. After trying to put it in several places, I realized I really didn't know where to import the code that returns the array into Ionic/Angular, or whether to put the web scraper code directly in one of the .ts files, or something else.

This is my web scraping program:

const puppeteer = require('puppeteer'); // live webscraping

let scrape = async () => {
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();

  await page.goto('--page url here --'); // link to page 

  const result = await page.evaluate(() => {
    let data = []; // Create an empty array that will store our data
    let elements = document.querySelectorAll('.list-myinfo-block'); // Select all Products
    let photo_elements = document.getElementsByTagName('img'); //

    var photo_count = 0;

    for (var element of elements) { // Loop through each product getting photos
      let picture_link = photo_elements[photo_count].src;
      let name = element.childNodes[1].innerText;
      let itype = element.childNodes[9].innerText
      data.push({
        picture_link,
        name,
        itype
      }); // Push an object with the data onto our array
      photo_count = photo_count + 1;
    }
    return data;
  });

  await browser.close(); // wait for the browser to shut down before returning
  return result; // Return the data
};

scrape().then((value) => {
  console.log(value); // Success!
});

When I run the web scraping program I see the array with the correct values in it. The problem is getting it into the Ionic part. Sometimes the Ionic phone page shows up with nothing in it, sometimes it says it cannot find "/". I've tried so many different places and looked all over the web that I now have quite a combination of errors. I know I'm putting it in the wrong places, or maybe not everywhere I should. Thank you!

Md. Abu Taher
  • 17,395
  • 5
  • 49
  • 73

1 Answer

You need a server that runs the scraper on demand.

Any scraper that uses a real browser (i.e. Chromium) has to run on an OS that supports it. There is no other way.

Think about this:

  • Does your mobile device support Chromium and Node.js? It does not. There is no Chromium build for mobile that supports automation with Node.js (yet).
  • Can you run a browser inside another browser? You cannot.

Way 1: Remote wsEndpoint

There are some services that offer a wsEndpoint, but I will not mention them here. I will describe how you can create your own wsEndpoint and use it.

Run browser and Get wsEndpoint

The following code launches a puppeteer instance whenever a client connects to it. You have to run it on a server.

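const http = require('http');
const httpProxy = require('http-proxy');
const puppeteer = require('puppeteer');

const proxy = httpProxy.createProxyServer();

http
  .createServer()
  .on('upgrade', async (req, socket, head) => {
    // Launch a fresh browser for each incoming WebSocket connection
    const browser = await puppeteer.launch();
    const target = browser.wsEndpoint();

    // Forward the WebSocket traffic to the browser's own endpoint
    proxy.ws(req, socket, head, { target });
  })
  .listen(8080);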

When you run this on the server or in a terminal, you can use the IP of the server to connect. In my case it's ws://127.0.0.1:8080.

Use puppeteer-web

Now you will need to install puppeteer-web in your mobile/web app. To bundle Puppeteer using Browserify, follow the instructions below.

Clone Puppeteer repository:

git clone https://github.com/GoogleChrome/puppeteer && cd puppeteer
npm install
npm run bundle

This will create a ./utils/browser/puppeteer-web.js file that contains the Puppeteer bundle.

You can use it later on in your web page to drive another browser instance through its WS Endpoint:

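<script src='./puppeteer-web.js'></script>
<script>
  // await is not allowed at the top level of a classic script,
  // so the automation runs inside an async function.
  (async () => {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.connect({
      browserWSEndpoint: '<another-browser-ws-endpoint>'
    });
    // ... drive automation ...
  })();
</script>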

Way 2: Use an API

I will use Express for a minimal setup. Assume your scrape function is exported from a file called scrape.js and you have the following index.js file.

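const express = require('express');
const scrape = require('./scrape'); // scrape.js must export the scrape function
const app = express();

// Run the scraper on each request and return the array as JSON
app.get('/', function (req, res) {
  scrape()
    .then((data) => res.send({ data }))
    .catch((err) => res.status(500).send({ error: err.message }));
});

app.listen(8080);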

This will launch an Express API on port 8080.

Now if you run it with node index.js on a server, you can call it from any mobile/web app.
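On the app side, calling the API is an ordinary HTTP request. The sketch below is one possible way to do it with fetch. The names fetchScrapedData, baseUrl and fetchImpl are made up for this example, and in an Angular project you might prefer HttpClient instead. If the app runs in a browser or webview, the server may also need to send CORS headers.

```javascript
// Hypothetical helper: ask the Express API for the scraped array.
// baseUrl is the address of your server, e.g. 'http://<server-ip>:8080'.
// fetchImpl defaults to the global fetch and can be replaced by a stub in tests.
async function fetchScrapedData(baseUrl, fetchImpl = globalThis.fetch) {
  const url = new URL('/', baseUrl).toString(); // the API above listens on '/'
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Scraper API responded with status ' + response.status);
  }
  const body = await response.json(); // the API sends { data: [...] }
  return body.data;
}

// Example usage (not executed here):
// fetchScrapedData('http://127.0.0.1:8080').then((items) => console.log(items));
```

The function returns the same array that the scraper logged on the server, so a page component can assign it to a property and render it in its template.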

Helpful Resources

I had some fun with puppeteer and webpack,

To keep the API running, you will need to learn a bit about backend development and how to keep the server alive. See these links for a full understanding of creating the server and more,
