I'm working on a personal project that scrapes text and numeric data from Wikipedia, stores it in a database, and then compares the gathered values to build a visual representation.
But I'm having trouble selecting the values I want to gather, specifically from HTML tables. I'd like to select every row that contains data, but only specific columns.
For example I have a table like this:
Column1 Column2 Column3
rowdata1 rowdata1 rowdata1
rowdata2 rowdata2 rowdata2
And I want to make it look like this:
Column1 Column3
rowdata1 rowdata1
rowdata2 rowdata2
That is, without the second column and its cells. Is there a simple, straightforward way to do this? Picking out every name and number manually with XPath is going to take ages. Here is my current code:
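const puppeteer = require('puppeteer');

async function scrapewiki(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  // Country name: link text in the first cell of the table's second row.
  const [el] = await page.$x('/html/body/div[3]/div[3]/div[5]/div[1]/table/tbody/tr[2]/td[1]/a');
  const txt = await el.getProperty('textContent');
  const country = await txt.jsonValue();
  // Population: text of the third cell in the same row.
  const [el2] = await page.$x('/html/body/div[3]/div[3]/div[5]/div[1]/table/tbody/tr[2]/td[3]');
  const txt2 = await el2.getProperty('textContent');
  const population = await txt2.jsonValue();
  // Assumed ending (the snippet is cut off here): close the browser and return the values.
  await browser.close();
  return { country, population };
}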
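
For reference, this is roughly the kind of thing I'm hoping for instead of one XPath per cell. It is only a sketch: `pickColumns` and `scrapeTable` are names I made up, the table selector is whatever matches the table on the page, and picking cells by index assumes the table has no merged cells (`rowspan`/`colspan`), which many Wikipedia tables do have.

```javascript
// Keep only the wanted columns (0-based indexes) from rows of cell text.
// Rows without any cells are dropped; missing cells become empty strings.
function pickColumns(rows, wantedColumns) {
  return rows
    .filter((cells) => cells.length > 0)
    .map((cells) => wantedColumns.map((i) => (cells[i] ?? '').trim()));
}

// Hypothetical helper: read every row of one table in a single call,
// then filter the columns on the Node side.
// `page` is a Puppeteer Page; `tableSelector` is a CSS selector for the table.
async function scrapeTable(page, tableSelector, wantedColumns) {
  const rows = await page.$$eval(`${tableSelector} tr`, (trs) =>
    trs.map((tr) =>
      Array.from(tr.querySelectorAll('th, td'), (cell) => cell.textContent)
    )
  );
  return pickColumns(rows, wantedColumns);
}
```

With the example table above, `pickColumns(rows, [0, 2])` would keep the first and third columns, header row included.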