How do I print out the values in the "flex" portion of the html in a website?

Question

At the moment I am trying to print out just the information held within the flex portion of this table of data, but I am not sure why it is not being read. Every time I print it out, it returns:

<div id="app">
 <div class="flex loader">
  loading
 </div>
</div>

However, I am not sure why it is not reading the data within the "loading" section.

This is my code:

import requests
from bs4 import BeautifulSoup
import selenium

URL = "https://thestatepress.github.io/SalaryList/dist/index.html?embed=true"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="app")
print(results.prettify())

It looks like you're trying to read data from a page that dynamically loads content using javascript. The `requests` module just fetches the raw HTML source; it doesn't have a javascript engine. You need to use something like `selenium` to drive a real browser that will correctly handle the dynamic content. You're importing `selenium` in your code, but you're not using it. Take a look at the examples here on Stack Overflow (or elsewhere online) to see how to get started using Selenium. — larsks, May 25 '22 at 00:55

rzlvmp · Answer 1 · 2022-05-25T01:16:06.250

Your data is generated by JavaScript via XHR. You need something like Selenium to run the JavaScript and get the generated page.
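A rough sketch of the Selenium route, in case it helps. It assumes Chrome and the `selenium` package are installed, and that the page hides or removes the `.loader` element from the question's output once the data has arrived; that last part is a guess from the HTML shown above, not something confirmed. The function name `render_app_html` and the 30-second timeout are made up for illustration.

```python
def render_app_html(url, timeout=30):
    """Load `url` in headless Chrome and return the #app element's HTML
    once the "loading" placeholder is no longer visible."""
    # Imported inside the function so that merely defining it does not
    # require Selenium to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the loader div disappears when the data is rendered.
        WebDriverWait(driver, timeout).until(
            EC.invisibility_of_element_located((By.CSS_SELECTOR, "#app .loader"))
        )
        return driver.find_element(By.ID, "app").get_attribute("outerHTML")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `print(render_app_html(URL))` with the URL from the question should then print the rendered markup rather than the placeholder, and the returned string can be passed to BeautifulSoup as before.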

Alternatively, you can fetch the data directly, without loading the whole page:

import requests
import json

urls = [
  "https://thestatepress.github.io/SalaryList/data-json/ASU-2018.json",
  "https://thestatepress.github.io/SalaryList/data-json/ASU-2021.json"
]

response_data = []

for URL in urls:
  page = requests.get(URL)
  if page.status_code == 200:
    response_data += json.loads(page.text)

# print example piece of data:
print(response_data[10:11])

And output will be:

[{'jobDescription': 'Program Manager', 'departmentDescription': '...', 'salary': 11111, 'key': '...', 'firstName': '...', 'lastName': '...'}]

Alternatively, you can use the requests_html (docs) Python library, which supports JavaScript execution (however, it appears to use Selenium-like libraries under the hood, so using Selenium directly may be more flexible).

Thank you. Is there a way to upload the data to an excel file from there? The output seems to be printing as a list, but the list is written as a dictionary inside of a list, so I am not sure how to extract it. — Nick, May 25 '22 at 23:34
Yes, it is [possible](https://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file) — rzlvmp, May 26 '22 at 00:13
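For reference, a minimal sketch of that conversion using the standard library's `csv.DictWriter`; Excel opens CSV files directly. The sample rows below are hypothetical placeholders shaped like the records in the answer's output, and `rows_to_csv` is just an illustrative helper name. It assumes every dictionary has the same keys as the first one.

```python
import csv
import io

# Hypothetical rows with the same keys as the JSON records in the answer;
# in practice this would be the response_data list.
rows = [
    {"jobDescription": "Program Manager", "departmentDescription": "Dept A",
     "salary": 11111, "key": "k1", "firstName": "Ann", "lastName": "Lee"},
    {"jobDescription": "Analyst", "departmentDescription": "Dept B",
     "salary": 22222, "key": "k2", "firstName": "Bob", "lastName": "Ray"},
]


def rows_to_csv(rows, out):
    """Write a list of same-shaped dicts to the open text file `out` as CSV."""
    if not rows:
        return
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here so the result is easy to inspect.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To get a file on disk, pass a real file instead of the buffer, e.g. `with open("salaries.csv", "w", newline="", encoding="utf-8") as f: rows_to_csv(response_data, f)` (the file name is arbitrary). A single value can be read with ordinary indexing, such as `response_data[10]["salary"]`.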
