I am trying to print just the information held in the flex portion of this table of data, but I am not sure why it is not being read. Every time I print it, I get:

<div id="app">
 <div class="flex loader">
  loading
 </div>
</div>

I am not sure why the data within the "loading" section is not being read.

This is my code:

import requests
from bs4 import BeautifulSoup
import selenium

URL = "https://thestatepress.github.io/SalaryList/dist/index.html?embed=true"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="app")
print(results.prettify())
Daniel Walker
Nick
    It looks like you're trying to read data from a page that dynamically loads content using JavaScript. The `requests` module just fetches the raw HTML source; it doesn't have a JavaScript engine. You need to use something like `selenium` to drive a real browser that will correctly handle the dynamic content. You're importing `selenium` in your code, but you're not using it. Take a look at the examples here on Stack Overflow (or elsewhere online) to see how to get started using Selenium. – larsks May 25 '22 at 00:55

1 Answer

Your data is generated by JavaScript using XHR. You have to use Selenium to run the JavaScript and get the generated page.
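A minimal sketch of the Selenium route, in case it helps. It assumes Chrome and a matching driver are available on the machine, that the `--headless=new` flag is supported by the installed Chrome, and that the page hides or removes the `flex loader` div once the data has arrived. The function name `fetch_rendered_html` and the 20-second timeout are my own choices, not anything from the page.

```python
def fetch_rendered_html(url, timeout=20):
    """Load `url` in headless Chrome and return the HTML once the loader is gone."""
    # Imported inside the function so this file still loads where Selenium
    # is not installed; move these to the top of the file in a real script.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the "loader" div disappears when the data is rendered.
        WebDriverWait(driver, timeout).until(
            EC.invisibility_of_element_located((By.CSS_SELECTOR, "#app .loader"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_rendered_html(URL)` with the URL from the question and passing the result to `BeautifulSoup(html, "html.parser")` should then give you the rendered markup instead of the "loading" placeholder, provided the assumptions above hold.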

Alternatively, you can get the data directly without loading the entire page:

import requests
import json

urls = [
  "https://thestatepress.github.io/SalaryList/data-json/ASU-2018.json",
  "https://thestatepress.github.io/SalaryList/data-json/ASU-2021.json"
]

response_data = []

for URL in urls:
  page = requests.get(URL)
  if page.status_code == 200:
    response_data += json.loads(page.text)

# print example piece of data:
print(response_data[10:11])

The output will be:

[{'jobDescription': 'Program Manager', 'departmentDescription': '...', 'salary': 11111, 'key': '...', 'firstName': '...', 'lastName': '...'}]

Or you may use the requests_html (docs) Python library, which supports JavaScript execution (however, it looks like it uses Selenium-like libraries under the hood, so using Selenium directly may be more flexible).

rzlvmp
  • Thank you. Is there a way to upload the data to an excel file from there? The output seems to be printing as a list, but the list is written as a dictionary inside of a list, so I am not sure how to extract it. – Nick May 25 '22 at 23:34
  • Yes, it is [possible](https://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file) – rzlvmp May 26 '22 at 00:13
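
For the follow-up about getting the list of dictionaries into a spreadsheet, here is a rough sketch using only the standard library `csv` module, which produces a file Excel can open. The helper name `records_to_csv` and the two sample records are made up for illustration; the real records come from `response_data` in the answer above.

```python
import csv
import io


def records_to_csv(records, out):
    """Write a list of dicts to the file-like object `out`, one row per dict."""
    if not records:
        return
    # Collect every key that appears, keeping first-seen order for the header.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    # restval fills in an empty cell when a record lacks one of the keys.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample data shaped like the output shown in the answer.
sample = [
    {"jobDescription": "Program Manager", "salary": 11111,
     "firstName": "Example", "lastName": "Person"},
    {"jobDescription": "Analyst", "salary": 22222,
     "firstName": "Sample", "lastName": "Name"},
]

buffer = io.StringIO()
records_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open("salaries.csv", "w", newline="", encoding="utf-8")` (the filename is just an example) and pass that file object as `out`, with `response_data` as `records`.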