Can't get all span tags inside a div element with BeautifulSoup

Question

I am scraping this site and need to get the salary value from it, as shown in the image.

I have tried the following:

import requests
from bs4 import BeautifulSoup

# requests only downloads the HTML; it does not execute the page's JavaScript
result = requests.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
page = result.content
soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)

But I only get this span:

<span class="css-wn0avc">Salary<!-- -->:</span>

Why can't I get all the spans inside the div, and what should I do to get the salary value in this case?

Simply because it's rendered via JS. – αԋɱҽԃ αмєяιcαη, Aug 31 '21 at 13:24
How can I get them in this case? – Mhd O., Aug 31 '21 at 16:25

score 0 · Accepted Answer · answered Aug 31 '21 at 17:10

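Beautiful Soup is just a parser: it works only with the content you give it and has nothing to do with retrieving or rendering the page. requests downloads the raw HTML without executing any JavaScript, so anything the page's scripts add afterwards never reaches Beautiful Soup.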
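To make this concrete, here is a small sketch that feeds BeautifulSoup two hypothetical HTML strings modeled on the spans quoted in this thread: one standing in for what requests downloads, and one for the DOM after a browser has run the page's JavaScript. The markup and the helper name `spans_in_salary_div` are illustrative assumptions, not the real wuzzuf page; the point is only that the parser can return nothing beyond what is in the string it receives.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the spans quoted in this thread (not the real page).
# RAW_HTML stands in for what requests downloads: no JavaScript has run.
RAW_HTML = '<div class="css-rcl8e5"><span class="css-wn0avc">Salary<!-- -->:</span></div>'
# RENDERED_HTML stands in for the DOM after a browser has executed the page's scripts.
RENDERED_HTML = (
    '<div class="css-rcl8e5">'
    '<span class="css-wn0avc">Salary<!-- -->:</span>'
    '<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>'
    "</div>"
)


def spans_in_salary_div(html: str) -> List[str]:
    """Return the stripped text of every <span> inside the first div.css-rcl8e5."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"class": "css-rcl8e5"})
    if div is None:
        return []
    # select("span") matches all descendant spans, nested ones included
    return [span.get_text(strip=True) for span in div.select("span")]


print(spans_in_salary_div(RAW_HTML))       # ['Salary:']
print(spans_in_salary_div(RENDERED_HTML))  # ['Salary:', 'Confidential', 'Confidential']
```

"Confidential" appears twice for the rendered markup because `select("span")` returns both the outer `css-47jx3m` span and the inner `css-4xky9y` span.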

The solution I found was to use Selenium, which drives a real browser, to get the JavaScript-rendered page.

The working code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver_manager downloads a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")

# page_source is the DOM after the browser has run the page's JavaScript
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)
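The `salaries_div[3]` index depends on how many `css-rcl8e5` divs come before the salary block, and those class names look auto-generated, so they may change. A minimal sketch of an alternative, assuming the label/value layout printed in the second answer: look up the span whose text is "Salary:" and read the span right after it. `extract_salary` is a hypothetical helper, not part of BeautifulSoup; with the code above you would call it as `extract_salary(page)`.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_salary(html: str) -> Optional[str]:
    """Hypothetical helper: return the text that follows the "Salary:" label.

    Assumes a label <span> whose text is "Salary:" followed by a sibling
    <span> holding the value. Returns None if either span is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    # The empty <!-- --> comment inside the label does not show up in
    # get_text(), so "Salary<!-- -->:" reads as "Salary:"
    label = soup.find(
        lambda tag: tag.name == "span" and tag.get_text(strip=True) == "Salary:"
    )
    if label is None:
        return None
    value = label.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None


# Sample markup copied from the output shown in the second answer
SAMPLE = (
    '<div class="css-rcl8e5">'
    '<span class="css-wn0avc">Salary<!-- -->:</span>'
    '<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>'
    "</div>"
)
print(extract_salary(SAMPLE))  # Confidential
```

Matching on the visible label trades the fragile positional index for a dependency on the page's wording, which is usually the more stable of the two.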

Adam Jenča · Answer 2 · 2021-08-31T17:41:51.517

If the content on your page is generated by JavaScript, try Selenium. I think it has all the functionality you need. Your code will then look like this:


### Let's import Selenium!
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.common.by import By
### First, we tell Selenium not to open a graphical window, so we run Firefox in headless mode.
### We do so by creating an instance of FirefoxOptions and adding the "-headless" argument
opt = FirefoxOptions()
opt.add_argument("-headless")
### Now we create the actual Firefox instance and pass it our FirefoxOptions as the keyword argument 'options'
ffx = Firefox(options=opt)
### We visit your website with ffx.get()
ffx.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
### Now let's search for your spans with ffx.find_elements() and a CSS selector
elems = ffx.find_elements(By.CSS_SELECTOR, "div.css-rcl8e5:nth-child(5)>span")
### And print the elements
for elem in elems:
    print(elem.get_attribute('outerHTML'))

This outputs (at least in my case):

<span class="css-wn0avc">Salary<!-- -->:</span>
<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>

To access the second element (the salary value), use elems[-1]: elems[-1].get_attribute('outerHTML') returns its HTML source, and elems[-1].text returns its visible text.
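Since Beautiful Soup 4.7, `select()` is backed by the soupsieve package and accepts the same CSS selector used above, so once you have `driver.page_source` you could apply it with BeautifulSoup instead of Selenium's `find_elements`. The fragment below is a hypothetical layout (one heading followed by four `css-rcl8e5` divs) chosen so that the parent's fifth child is the salary div; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a heading, three placeholder divs, then the salary div,
# so the salary div is the 5th child of <section>. The real page may differ.
PAGE = (
    "<section>"
    "<h3>Heading</h3>"
    '<div class="css-rcl8e5"><span>Field A:</span></div>'
    '<div class="css-rcl8e5"><span>Field B:</span></div>'
    '<div class="css-rcl8e5"><span>Field C:</span></div>'
    '<div class="css-rcl8e5">'
    '<span class="css-wn0avc">Salary<!-- -->:</span>'
    '<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>'
    "</div>"
    "</section>"
)

soup = BeautifulSoup(PAGE, "html.parser")
# ">" keeps only direct children, so the nested css-4xky9y span is not listed again
spans = soup.select("div.css-rcl8e5:nth-child(5) > span")
print([span.get_text(strip=True) for span in spans])  # ['Salary:', 'Confidential']
# spans[-1] plays the role of elems[-1] in the Selenium code
print(spans[-1].get_text(strip=True))  # Confidential
```

In this made-up layout, `:nth-child(5)` and the question's `salaries_div[3]` select the same div; on a real page, check which index or position actually holds the salary.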

Don't forget to install Selenium:

pip install selenium

You also need Firefox and geckodriver installed.

Thank you. Try [webdriver_manager](https://pypi.org/project/webdriver-manager) so you don't need to install geckodriver yourself. – Mhd O., Sep 02 '21 at 06:34
