I'm trying to code something which tracks the Ontario Immigrant Nominee Program Updates page for updates and then sends an email alert if there's a new article. I've done this in PHP but I wanted to try and recreate it in JS because I've been learning JS for the last few weeks.

The OINP has a public API, but the entire body of the webpage is stored in the JSON response (you can see this here: https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body)

Looking through the safe_value - the common trend is that the Date / Title is always between <h3> tags. What I did with PHP was create a function that stored the text between <h3> into a variable called Date / Title. Then - to store the article body text I just grabbed all the text between </h3> and </p><h3> (basically everything after the title, until the beginning of the next title), stored it in a 'bodytext' variable and then iterated through all occurrences.

I'm stumped figuring out how to do this in JS.

So far - trying to keep it simple, I literally have:

const fetch = require("node-fetch");

fetch(
  "https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body"
)
  .then((result) => {
    return result.json();
  })
  .then((data) => {
    let websiteData = data.body.und[0].safe_value;
    console.log(websiteData);
  });

This outputs the whole body. Can anyone point me to a library, or give me some tips, that would help me read through the entire safe_value response and break each article (Date / Title + Article body) down into an array?

After that I'll probably just upload each article to MongoDB and check the page twice daily; if there's a new article, I'll send an email notification.

Any advice is appreciated!!

Thanks,

Eb Heravi
Yanfly
  • maybe it's better to do it in two steps. 1) html to JSON : https://www.npmjs.com/package/html-to-json OR https://www.npmjs.com/package/himalaya 2) JSON to JSArray (a normal parse) – malarres Aug 27 '20 at 07:23

3 Answers

You can use a regex to get the content of a tag, e.g.

/<h3>(.*?)<\/h3>/g.exec(data.body.und[0].safe_value)[1]

returns August 26, 2020
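
To get every heading together with the text that follows it, rather than only the first match, something along these lines might work. It is a minimal sketch, assuming each article starts with an <h3> and runs until the next <h3> as described in the question; splitArticles and the sample HTML are made up for illustration and are not taken from the API response.

```javascript
// Hypothetical helper: split an HTML string into { heading, body } pairs.
// Assumes every article starts with an <h3> and ends right before the next one.
function splitArticles(html) {
  // Group 1: text inside the <h3>. Group 2: everything up to the next <h3> or the end of input.
  const pattern = /<h3[^>]*>([\s\S]*?)<\/h3>([\s\S]*?)(?=<h3[\s>]|$)/g;
  return Array.from(html.matchAll(pattern), (match) => ({
    heading: match[1].trim(),
    body: match[2].trim(),
  }));
}

// Made-up sample input standing in for data.body.und[0].safe_value.
const sample =
  "<h3>August 26, 2020</h3><p>First update.</p>" +
  "<h3>August 20, 2020</h3><p>Second update.</p><p>More text.</p>";

const articles = splitArticles(sample);
console.log(articles);
```

The body is kept as raw HTML here; stripping the remaining tags is a separate step, and a real HTML parser is likely to be more robust than a regex if the markup varies.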

  • nice first answer (upvoted) , but please bear in mind that https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – malarres Aug 27 '20 at 07:25

With the use of some regex you can get this done pretty easily.

I wasn't sure which parts were the date, the title and the content, but this shows how to parse the HTML.

I also changed the code to "async / await". This is more of a personal preference. The code should work the same with "then / catch".

(async () => {
  try {
    // Make the request (fetch is global in Node 18+; on older versions, require "node-fetch" as in the question)
    const response = await fetch("https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body");
    // Parse the response as JSON
    const data = await response.json();
    
    // Get the HTML string we need
    const websiteData = data.body.und[0].safe_value;
    
    // Split the HTML into separate articles (assumes every <h2> is the start of a new article)
    const articles = websiteData.split(/(?=<h2)/g);
    
    // Get the data for each article
    const articleInfo = articles.map((article) => {
      // Capture group [1] is the text inside the tags; the "s" flag lets "." match newlines.
      // "?.[1] ?? null" gives null instead of throwing when a tag is missing.
      // The text inside the first <h3> is the date
      const date = /<h3>(.*?)<\/h3>/s.exec(article)?.[1] ?? null;
      // The text inside the first <h4> is the title
      const title = /<h4>(.*?)<\/h4>/s.exec(article)?.[1] ?? null;
      // Everything between the first <p> and the last </p> is the content of the article
      const content = /<p>(.*)<\/p>/s.exec(article)?.[1] ?? null;
      
      return {date, title, content};
    });
    
    // Show the results
    console.log(articleInfo);
  } catch(error) {
    // Show the error, if there is one
    console.error(error);
  }
})();

Without comments:

(async () => {
  try {
    const response = await fetch("https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body");
    const data = await response.json();
    
    const websiteData = data.body.und[0].safe_value;
    const articles = websiteData.split(/(?=<h2)/g);

    const articleInfo = articles.map((article) => {
      const date = /<h3>(.*?)<\/h3>/s.exec(article)?.[1] ?? null;
      const title = /<h4>(.*?)<\/h4>/s.exec(article)?.[1] ?? null;
      const content = /<p>(.*)<\/p>/s.exec(article)?.[1] ?? null;
      
      return {date, title, content};
    });
    
    console.log(articleInfo);
  } catch(error) {
    console.error(error);
  }
})();
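
Once the articles are in an array of {date, title, content} objects, finding the new ones could look roughly like this. It is only a sketch: findNewArticles, the date-plus-title key and the sample data are assumptions for illustration, and the Set stands in for whatever is already stored in the database.

```javascript
// Hypothetical helper: return the articles whose date + title key has not been seen before.
function findNewArticles(articles, seenKeys) {
  return articles.filter((article) => !seenKeys.has(`${article.date}|${article.title}`));
}

// Made-up data: one article was stored on an earlier run, one is new.
const seen = new Set(["August 20, 2020|Older notice"]);
const parsed = [
  { date: "August 26, 2020", title: "Newer notice", content: "..." },
  { date: "August 20, 2020", title: "Older notice", content: "..." },
];

const fresh = findNewArticles(parsed, seen);
console.log(fresh.map((article) => article.title));
```

Each entry in fresh would then be saved and trigger the email.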
Reyno

I just finished building a .NET Core worker service for this.

The value you are looking for is "metatags.description.og:updated_time.#attached.drupal_add_html_head..#value"

The idea is that if the last-updated value changes, you send an email notification!

Try this in your JavaScript:

  fetch(`https://api.ontario.ca/api/drupal/page%2F2021-ontario-immigrant-nominee-program-updates`)
    .then((result) => {
      return result.json();
    })
    .then((data) => {
      let lastUpdated = data.metatags["og:updated_time"]["#attached"].drupal_add_html_head[0][0]["#value"];
      console.log(lastUpdated);
    });
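
The comparison step itself could be sketched as follows. This assumes the last value is kept somewhere between runs (a variable here, a file or database in practice); detectUpdate and the timestamp strings are made up for illustration and do not reflect the real format of the API value.

```javascript
// Hypothetical helper: compare the stored value with the one just fetched.
function detectUpdate(previous, current) {
  // On the first run there is nothing to compare against, so only record the value.
  const changed = previous !== null && previous !== current;
  return { changed, stored: current };
}

// Made-up values standing in for three consecutive checks of the API.
const checks = ["2020-08-26T10:00", "2020-08-26T10:00", "2020-08-27T09:30"];
const notifications = [];
let stored = null;

for (const value of checks) {
  const result = detectUpdate(stored, value);
  if (result.changed) notifications.push(value); // send the email here instead
  stored = result.stored;
}

console.log(notifications);
```

Only the third check differs from the stored value, so it is the only one that ends up in notifications.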

I will be happy to add you to the email list for the app I just created!