Python Beautiful Soup not pulling all the data

Question

I'm trying to pull the issuer name from a Luxembourg Stock Exchange page using Beautiful Soup, selecting the element by its class and ID.

The example page I'm using is https://www.bourse.lu/security/XS1338503920/234821 and the data I'm trying to pull is the name under 'Issuer', stored as text; in this case it's 'BNP Paribas Issuance BV'.

I've tried using the class vignette-description-content-text, but the search finds nothing. Looking through the soup, the element isn't there at all: only part of the page's HTML is being pulled.

I don't know how to get the rest of the HTML that my current code is missing.

import requests
from bs4 import BeautifulSoup

URL = "https://www.bourse.lu/security/XS1338503920/234821"

page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='ResultsContainer', class_="vignette-description-content-text")

I have found similar problems and followed the guides shown in link 1, link 2 and link 3, but the example HTML they use seems very different from the webpage I'm looking to scrape.

Is there something I'm missing to pull and scrape the data?

I think the issue may be that the data you want is produced by JavaScript on the web page and is not in the HTML that requests receives. You might have more luck using Selenium alongside Beautiful Soup. I'm not that familiar with Selenium, though; hopefully others can be of more help. Good luck. – jmaloney13, Apr 12 '21 at 11:17
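
A quick way to check that suspicion is to look for the target class in the HTML that requests actually returns, before any JavaScript has run. The sketch below is only illustrative: has_element_with_class is a hypothetical helper name, not part of Beautiful Soup, and it assumes the class name from the question is the right one to look for.

```python
from bs4 import BeautifulSoup


def has_element_with_class(html, css_class):
    """Return True if the HTML string contains at least one tag with css_class.

    Hypothetical helper for diagnosis only: pass it the raw response text to
    see whether the element exists before any JavaScript has run.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_=css_class) is not None
```

Calling has_element_with_class(requests.get(URL).text, "vignette-description-content-text") and getting False would suggest the element is added by JavaScript after the page loads, in which case no Beautiful Soup selector will find it in that response.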

score 0 · Answer 1 · answered Apr 12 '21 at 11:12

Based on your code, I suspect you are trying to get the element that has class=vignette-description-content-text and id=ResultsContainer. Using class_ is the correct way to match the class, but I would not combine it with the id like that.

Try this:

import requests
from bs4 import BeautifulSoup

URL = "https://www.bourse.lu/security/XS1338503920/234821"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

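def applyFilter(element):
    # Match only tags that carry both the target class and the target id.
    if element.has_attr('id') and element.has_attr('class'):
        if "vignette-description-content-text" in element['class'] and element['id'] == "ResultsContainer":
            return True
    return False

results = soup.find_all(applyFilter)
for result in results:
    # Each result is a matching element; print its text content.
    print(result.get_text(strip=True))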

answered Apr 12 '21 at 11:12

Shreyesh Desai

Hi @Shreyesh, thank you for your time on this. I've tried running the code with the filter function you defined above, but I'm still getting an empty results list. Could the issue still be that not all of the HTML is being pulled in before Beautiful Soup parses it? – AlwaysInTheDark Apr 12 '21 at 11:19

It's a JavaScript page, so the final rendered page isn't part of the data that the requests module returns. You might have to get the final rendered HTML by using Selenium, where you drive a real browser through its driver (e.g. [here](https://www.scrapingbee.com/blog/selenium-python/)) – Shreyesh Desai Apr 12 '21 at 11:26
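
For reference, a minimal sketch of the Selenium route could look like the code below. It makes several assumptions: that Selenium 4 and a Chrome browser are installed, that the rendered page really contains an element with the class from the question, and that waiting for that class is enough to know the page has finished loading. fetch_rendered_html and extract_texts are hypothetical helper names, and none of this has been checked against the live site.

```python
from bs4 import BeautifulSoup


def extract_texts(html, css_class="vignette-description-content-text"):
    """Return the stripped text of every tag in html that has css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(class_=css_class)]


def fetch_rendered_html(url, css_class="vignette-description-content-text", timeout=15):
    """Load url in a real browser and return the HTML after JavaScript has run.

    Selenium is imported here so extract_texts stays usable without it.
    Assumes Chrome is available; swap in webdriver.Firefox() if preferred.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one element with the target class exists,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, css_class))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With those helpers, extract_texts(fetch_rendered_html(URL)) would return the text of every matching element. Which entry holds the issuer name depends on the page layout, so you would still need to inspect the list and narrow the selector.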
