Unless you want to use or write a JavaScript parser, which is only fun for a very limited set of individuals, I suggest taking
advantage of the thriving headless Chrome community. Grabbing JS variables with
puppeteer is straightforward after some boilerplate node code.
It's also shockingly (but not "blazingly") fast.
Before running the code:
- Have Node.js and npm working on your machine
- Install jq for parsing JSON in the shell. It is available in most package managers, so brew install jq or sudo apt install jq etc. should work.
- Install Puppeteer in whichever directory these scripts are going to live in with npm i puppeteer (the full setup is collected in the snippet after this list)
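For reference, the whole setup boils down to a few commands (the directory name here is just a placeholder):
mkdir js-shell-data && cd js-shell-data  # any directory you like
npm init -y                              # create a package.json
npm i puppeteer                          # also downloads a bundled Chromium
jq --version                             # confirm jq is on your PATH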
A file like this is all you need to get started with Puppeteer. I added comments to the key areas.
#!/usr/bin/env node
const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch()
  // Replace the line above with this statement for a fun show
  // const browser = await puppeteer.launch({
  //   headless: false,
  //   devtools: true,
  // })
  const page = await browser.newPage()
  // Arbitrarily choosing SO for the demo, replace with your website
  await page.goto('https://stackoverflow.com/')
  // Or use an argument:
  // const uri = process.argv[2]
  // await page.goto(uri)
  const retrievedData = await page.evaluate(() => {
    // This block runs in the page context, which is almost identical to being in the console
    // except for some of the console's supplementary APIs.
    // Get the URL host name and path separately
    const { origin, pathname } = window.location
    // Get the title in a silly way, for demonstration purposes only
    const title = document.querySelector('title').textContent
    // More interesting - save data from the `StackExchange` object from `window`
    const { options: soOptions } = window.StackExchange
    // Return an object literal with data for the shell script
    return {
      origin,
      pathname,
      title,
      soOptions,
    }
  })
  // Convert the object from the browser eval to JSON to parse with jq later
  const retrievedJSON = JSON.stringify(retrievedData, null, 4)
  // console.log writes to stdout in node
  console.log(retrievedJSON)
  await browser.close()
})()
Note the shebang at the top, which tells the shell to run the file with node.
If we make this file executable and run it:
chmod +x get-so-data.js
./get-so-data.js
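The output is a pretty-printed JSON object. Heavily trimmed, it looks roughly like this (the soOptions keys come from Stack Overflow and will vary; the values shown are illustrative):
{
    "origin": "https://stackoverflow.com",
    "pathname": "/",
    "title": "Stack Overflow - ...",
    "soOptions": {
        "serverTime": 1599867422,
        ...
    }
}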
We have a CLI utility that will provide a JSON string of data from the JavaScript global execution context of the website. Here are some small generic shell examples.
#!/bin/sh
# Confirm that jq understands the result (should pretty print with ANSI colors):
./get-so-data.js | jq
# {
# Many data...
# }
# Check if user is logged in (the user is our node script in a sandboxed browser, so no):
./get-so-data.js | jq '.soOptions.user.isRegistered'
# null
# Tell the time, according to StackExchange's server clock (linux only):
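# (The $(...) substitution reads the timestamp from the pipe via cat and prefixes '@',
#  and GNU date parses '@<seconds>' as Unix epoch time, hence the linux-only note.)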
./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -n '@' && cat --)
# Fri 11 Sep 2020 04:37:02 PM PDT
# Open a subset of the JSON payload returned by Puppeteer in the default editor:
./get-so-data.js | jq '.soOptions' | $EDITOR -
# or VS Code specifically
./get-so-data.js | jq '.soOptions' | code -
# ...
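If you switch the script over to the commented-out argument variant (reading the URL from process.argv[2]), the same utility works for any Stack Overflow page; a hypothetical invocation:
./get-so-data.js 'https://stackoverflow.com/questions' | jq -r '.pathname'
# /questions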
As long as the JavaScript side of the equation returns enough info to construct a file path, you can open files in your editor based on JavaScript values pulled out of a browser.
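As a rough sketch (the docs/ directory and .md extension are invented mappings, not something the script above provides):
$EDITOR "docs$(./get-so-data.js | jq -r '.pathname').md"
# e.g. a pathname of "/questions" opens docs/questions.md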
The shell date example takes about 1.5 seconds on a three-year-old Chromebook from within a Linux (Beta) container using 25 Mbps public Wi-Fi. Your mileage will vary depending on the performance of the site you're debugging and the steps in the script.
$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:24 PM PDT
real 0m1.515s
user 0m0.945s
sys 0m0.383s
$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:30 PM PDT
real 0m1.755s
user 0m0.999s
sys 0m0.433s
$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:33 PM PDT
real 0m1.422s
user 0m0.802s
sys 0m0.361s
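If you need the round trip to be faster, one common Puppeteer technique (not used in the timings above, just an option) is to abort requests the script doesn't care about before calling page.goto:
// Optional speed-up, placed before page.goto(...)
await page.setRequestInterception(true)
page.on('request', request => {
  if (['image', 'font', 'stylesheet'].includes(request.resourceType())) {
    request.abort()
  } else {
    request.continue()
  }
})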
Resources