I would like to scrape the following data table from a website:
<body style="background-color:grey;">
<div class="table" id="myTable" style="display: table;">
<div class="tr" style="background-color: #4CAF50; color: white;">
<div class="td tnic">Nickname</div>
<div class="td tsrv">Server IP</div>
<div class="td tip">IP</div>
<div class="td treg">Region</div>
<div class="td tcou">Country</div>
<div class="td tcit">City</div>
<div class="td tscr">Score <input type="checkbox" onchange="mysrt(this)" id="chkscr"></div>
<div class="td tupd">Update Time <input type="checkbox" onchange="mysrt(this)" id="chkupd" checked="" disabled="">
</div>
<div class="td taut">Auth Key</div>
<div class="td town">Key Owner</div>
<div class="td tver">Version</div>
<div class="td tdet">Details</div>
</div>
<div class="tr mytarget ">
<div class="td tnic">Player 1</div>
<div class="td tsrv">_GAME_MENU_</div>
<div class="td tip">x.x.226.35</div>
<div class="td treg">North America</div>
<div class="td tcou">United States</div>
<div class="td tcit">Cleveland</div>
<div class="td tscr">21</div>
<div class="td tupd">2022-12-29 10:17:01 (GMT-8)</div>
<div class="td taut">SecretauthK3y</div>
<div class="td town">CoolName</div>
<div class="td tver">7.11</div>
<div class="td tdet">FPS: 93 @ 0(0) ms @ 0 K/m</div>
</div>
<div class="tr mytarget ">
<div class="td tnic">PlayerB</div>
<div class="td tsrv">_GAME_MENU_</div>
<div class="td tip">x.x.90.221</div>
<div class="td treg">North America</div>
<div class="td tcou">United States</div>
<div class="td tcit">Mechanicsville</div>
<div class="td tscr">67991</div>
<div class="td tupd">2022-12-29 10:16:56 (GMT-8)</div>
<div class="td taut">SecretauthK3y2</div>
<div class="td town">PlayerB</div>
<div class="td tver">7.12</div>
<div class="td tdet">FPS: 50 @ 175(243) ms @ 0 K/m</div>
</div>
<div class="tr mytarget ">
<div class="td tnic">McChicken</div>
<div class="td tsrv">_GAME_MENU_</div>
<div class="td tip">x.x.39.80</div>
<div class="td treg">North America</div>
<div class="td tcou">United States</div>
<div class="td tcit"></div>
<div class="td tscr">0</div>
<div class="td tupd">2022-12-29 09:41:44 (GMT-8)</div>
<div class="td taut">SecretauthK3y3</div>
<div class="td town">SOLO KEY</div>
<div class="td tver">7.12</div>
<div class="td tdet">FPS: 63 @ 0(0) ms @ 0 K/m</div>
</div>
</div>
It has a header row under .tr, and each row of data is a div with the classes .tr and .mytarget. Normally there are hundreds more .tr.mytarget rows, all in the same format as the three shown. My goal is to scrape this data in a form that makes it easy to run calculations and filtering on it. It will eventually be reused in a new data table.
I have a small amount of experience with JS, so my idea was to use Puppeteer. My question is twofold: what format should I scrape the data into so that it's convenient to work with, and how do I write the Puppeteer statements to do this?
This is what I have so far:
import puppeteer from 'puppeteer';
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  // URL redacted; page.goto needs a full URL including the protocol.
  await page.goto('redactedurl.com');
  await page.waitForSelector('#myTable');
  // Classes on one element are chained with dots: .tr.mytarget matches class="tr mytarget".
  const nicks = await page.$$eval(
    '.table .tr.mytarget .td.tnic',
    allNicks => allNicks.map(tdNick => tdNick.textContent.trim())
  );
  console.log(nicks);
  await browser.close();
})();
I don't fully understand how to write the $$eval statement. I'm thinking I will want one array for the header and one for the data, but I'm not sure. What's recommended?
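One possible shape, as a minimal sketch rather than a definitive answer: read the header cells once, read each .tr.mytarget row as an array of cell texts, and zip the two into an array of plain objects keyed by header name. The helper names `toRecords` and `scrapeTable` are hypothetical, and the selectors assume the markup shown above (a single header .tr without the mytarget class inside #myTable). `scrapeTable` expects a Puppeteer page that has already navigated to the table.

```javascript
// Pair each row's cell texts with the header names.
// Missing cells become empty strings.
function toRecords(headers, rows) {
  return rows.map(cells =>
    Object.fromEntries(headers.map((name, i) => [name, cells[i] ?? '']))
  );
}

// `page` is assumed to be a Puppeteer Page already showing #myTable.
async function scrapeTable(page) {
  // Header row: the .tr directly under #myTable that lacks .mytarget.
  const headers = await page.$eval('#myTable > .tr:not(.mytarget)', tr =>
    [...tr.querySelectorAll('.td')].map(td => td.textContent.trim())
  );
  // Data rows: one array of trimmed cell texts per .tr.mytarget.
  const rows = await page.$$eval('#myTable .tr.mytarget', trs =>
    trs.map(tr =>
      [...tr.querySelectorAll('.td')].map(td => td.textContent.trim())
    )
  );
  return toRecords(headers, rows);
}
```

With records in this form, filtering and calculations can use ordinary array methods, for example `records.filter(r => Number(r.Score) > 100)`. All values arrive as strings, so numeric fields such as Score need converting before any arithmetic.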