Let's say I want to retrieve the total views of a YouTube channel and then update a paragraph tag in an HTML file with that number. How do I do this? I know Python can do this easily, but is it possible with JavaScript? Thank you in advance.

Ali

1 Answer

This is known as web scraping.

The first step in web scraping is checking whether there's an API that already serves what you need. APIs, or Application Programming Interfaces, are endpoints where you can request data, for example through HTTP requests with parameters.

Since YouTube is a popular website, there may be an official API, and definitely many unofficial ones, that return the views of a YouTube video, using the video ID (the identifier in the URL) as a parameter.
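As one concrete case, Google does publish an official YouTube Data API (v3). The sketch below is only a rough illustration: it assumes the `channels` endpoint with `part=statistics`, and a JSON response in which `items[0].statistics.viewCount` holds the total as a string. Treat the exact URL and field names as assumptions to check against the official reference. The channel ID, the API key, and `SAMPLE_RESPONSE` are made-up placeholders, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the YouTube Data API v3; verify against the official docs.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels"


def build_channel_stats_url(channel_id, api_key):
    """Build the request URL asking for a channel's statistics."""
    query = urlencode({"part": "statistics", "id": channel_id, "key": api_key})
    return f"{API_ENDPOINT}?{query}"


def extract_view_count(response_body):
    """Pull the total view count out of a JSON response body (assumed shape)."""
    data = json.loads(response_body)
    items = data.get("items", [])
    if not items:
        raise ValueError("no channel found in the response")
    # The count is assumed to arrive as a string, so convert it to an int.
    return int(items[0]["statistics"]["viewCount"])


# Hypothetical response body, trimmed to the one field used here.
SAMPLE_RESPONSE = '{"items": [{"statistics": {"viewCount": "12345"}}]}'

print(build_channel_stats_url("CHANNEL_ID", "API_KEY"))
print(extract_view_count(SAMPLE_RESPONSE))  # 12345
```

The same URL can be requested from JavaScript; only the HTTP client changes.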

As for consuming an API through JavaScript, it depends on whether you're using client-side JavaScript (in a browser) or server-side JavaScript (Node.js). You'll have to deal with asynchronous code either way.

If you want to make HTTP requests to consume an API from browser JavaScript, search for fetch.

If you're on the server side, using Node, take a look at the https module.

However, if you can't find an API for the website you want to scrape, you'll have to do it on your own. This usually involves:

  1. obtaining the raw HTML
  2. parsing the raw HTML into a navigable tree
  3. building functions to consume your tree and retrieve specific fields that match specific criteria. For example, views are styled differently from other information on the page because they have different CSS properties. To have different CSS properties, the element probably has a unique id or some classes that can help you identify and select it.
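The three steps above can be sketched with nothing but Python's standard library. Note the simplification: instead of building a navigable tree, this uses the event-based `html.parser.HTMLParser` and collects text as it streams by. The HTML string stands in for step 1 (a real script would download it), and the `view-count` class name is purely hypothetical, not YouTube's actual markup.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element with a given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._capturing = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Assumes matching elements do not contain nested elements of the same tag.
        if self._capturing and tag == self.tag:
            self.matches.append("".join(self._buffer).strip())
            self._capturing = False


def find_text_by_class(raw_html, tag, css_class):
    """Steps 2 and 3: parse the raw HTML and return the matching text."""
    parser = ClassTextCollector(tag, css_class)
    parser.feed(raw_html)
    parser.close()
    return parser.matches


# Step 1 stand-in: hard-coded HTML so the sketch runs offline.
RAW_HTML = '<div><span class="view-count">1,234 views</span><span class="likes">56</span></div>'

views = find_text_by_class(RAW_HTML, "span", "view-count")
print(views)  # ['1,234 views']
```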

As you said, that's easily done on the server side with Python. The requests or scrapy modules can make GET requests and obtain plain HTML. Then BeautifulSoup can parse the HTML into a navigable tree. It also offers functions to navigate and manipulate that tree.

For example:

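import requests
from bs4 import BeautifulSoup

# Download the page (the URL is a placeholder)
response = requests.get('https://url.to.your.website')
# Parse the raw HTML into a navigable tree (requires the lxml parser)
soup = BeautifulSoup(response.text, 'lxml')
# Collect every <p> element whose class is "shine"
shine_paragraphs = soup.find_all("p", attrs={"class": "shine"})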

The code above (in Python) requests a page, passes the raw HTML to BeautifulSoup, and creates a reference to all <p> paragraphs with a specific class, shine.

<body>
    <p class="shine">paragraph 1 content</p>
    <p class="shine">paragraph 2 content</p>
    <p>paragraph 3 content</p>
</body>

For example, if the requested page had this HTML, the code snippet would create a list of references to the first two paragraphs, which match the specified conditions. Then you would be able to extract their content and classes, navigate to their children and parents, etc.
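To make that concrete, here is a small offline sketch that runs the same `find_all` call against the sample HTML and then reads each match's text, classes, and parent. It assumes `beautifulsoup4` is installed and uses the built-in `html.parser` backend rather than `lxml`, only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

HTML = """
<body>
    <p class="shine">paragraph 1 content</p>
    <p class="shine">paragraph 2 content</p>
    <p>paragraph 3 content</p>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same selection as in the snippet above: every <p> with class "shine".
shine_paragraphs = soup.find_all("p", attrs={"class": "shine"})

# Text content of each matched element.
texts = [p.get_text() for p in shine_paragraphs]
# The class attribute comes back as a list, since an element can have several.
classes = [p.get("class") for p in shine_paragraphs]
# Navigate upward: the tag name of each match's parent.
parent_names = [p.parent.name for p in shine_paragraphs]

print(texts)
print(classes)
print(parent_names)
```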

Be aware that Python with Scrapy or BeautifulSoup is a very common choice for web scraping when it comes to parsing HTML. You could have a backend server responsible for the scraping and just consume it from your client-side JavaScript via fetch (i.e., make your own API). If you do the scraping itself in JavaScript, you're going against the current and you may have a hard time with specific tasks.
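The "make your own API" idea could look roughly like the sketch below, which uses only Python's standard `http.server`. Everything specific in it is an assumption for illustration: the `/views` path, the `totalViews` field name, and `get_total_views`, a hypothetical placeholder where the real scraping or API call would go. The `Access-Control-Allow-Origin: *` header is there so a page served from another origin is allowed to read the response.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def get_total_views():
    """Hypothetical placeholder: a real version would scrape or call an API."""
    return 12345


class ViewsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/views":
            self.send_error(404, "Not found")
            return
        body = json.dumps({"totalViews": get_total_views()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Let browser pages from other origins read this response.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the sketch quiet; remove to see request logs.


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), ViewsHandler)
```

Calling `make_server().serve_forever()` starts it. The page would then call `fetch("http://127.0.0.1:8000/views")`, read the JSON, and assign `totalViews` to the paragraph's `textContent`.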

However, it's possible.

Again, it depends on what you mean by "JavaScript". If you're using Node (server-side JavaScript), you can do this with jsdom.

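Client-side web scraping is a lot harder. Browsers are built to prevent it: the same-origin policy stops a page's scripts from reading responses from another site unless that site explicitly allows it. You can read more in the discussion "Browser-based client-side scraping".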

There may be a solution that I'm just not aware of. I hope this has contributed to your knowledge; with the proper terms ("client-side web scraping javascript") you may find something with a deeper search.

nluizsoliveira