I am trying to scrape a website using Cheerio. However, since the app is dynamic, the content is not present in the HTML markup but in a JS object that I am not sure how to access (I have tried window, document, etc.).
My code:
const axios = require('axios') // HTTP client
const cheerio = require('cheerio') // HTML parsing package

const url = 'https://www.foo.com'

const getWebsiteContent = async (url) => {
  try {
    const response = await axios.get(url)
    const $ = cheerio.load(response.data)
    console.log(response.data)
  } catch (error) {
    console.error(error)
  }
}

getWebsiteContent(url)
The result of the console.log (I am only pasting the part that I need to access):
<!DOCTYPE html>
<html lang='en' ng-app='Test'>
  <head>
  </head>
  <body class='' data-allow-utf8='false'>
    <h1>HEADER</h1>
    <script>
      var matchData = function () {
        Live.load.main({
          version: "1.2",
          sports: [
            {
              title: 'matchone',
              subtitle: 'foo'
            },
            {
              title: 'matchtwo',
              subtitle: 'aaa'
            }
          ],
        })
      }
    </script>
    <!-- More stuff -->
  </body>
</html>
The data I want to access is the sports array, contained in that Live.load.main method inside the matchData function.
I am not even sure if Cheerio is the correct tool, since I was expecting the data to be in a piece of HTML, but apparently it is loaded in some way such that I can only see it in a JS object when firing the GET request.
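One possible direction, sketched here under assumptions rather than as a verified solution: the sports array is an argument to Live.load.main in an inline script, and the object literal is not valid JSON (unquoted keys, single quotes), so JSON.parse will not read it directly. A rough idea is to take the script text (for example from cheerio via $('script') and .html(), which is assumed usage and not run here) and evaluate it with Node's built-in vm module, replacing Live.load.main with a stub that records its argument. The names extractSports and sampleScript are hypothetical, and the sample script is copied from the response above.

```javascript
const vm = require('vm') // Node.js built-in module

// Hypothetical helper: evaluates an inline script in a sandbox where
// Live.load.main is a stub that records the object it receives.
// Caution: this runs code taken from the scraped page, and vm is not a
// security boundary, so only use it on pages you trust.
const extractSports = (scriptText) => {
  let captured = null
  const sandbox = {
    Live: { load: { main: (config) => { captured = config } } }
  }
  vm.createContext(sandbox)
  try {
    // A top-level `var matchData = ...` becomes a property of the sandbox
    vm.runInContext(scriptText, sandbox, { timeout: 1000 })
    if (typeof sandbox.matchData === 'function') {
      vm.runInContext('matchData()', sandbox, { timeout: 1000 })
    }
  } catch (error) {
    // Scripts that rely on other browser globals will throw here
    return null
  }
  if (!captured || !Array.isArray(captured.sports)) return null
  // Round-trip through JSON to get plain objects owned by the main context
  return JSON.parse(JSON.stringify(captured.sports))
}

// Script body copied from the response shown above
const sampleScript = `
var matchData = function () {
  Live.load.main({
    version: "1.2",
    sports: [
      { title: 'matchone', subtitle: 'foo' },
      { title: 'matchtwo', subtitle: 'aaa' }
    ],
  })
}
`

console.log(extractSports(sampleScript))

// Assumed cheerio usage (not executed here):
//   $('script').each((_, el) => {
//     const sports = extractSports($(el).html() || '')
//     if (sports) console.log(sports)
//   })
```

The stub approach avoids parsing the JavaScript object literal with a regex, at the cost of executing page code. If the real page builds the data through other browser-only globals, a headless browser may be a better fit than Cheerio alone.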