
I would like to scrape a webpage using BeautifulSoup and requests. The code below runs without errors, but the response does not contain the full contents of the div.

import requests

# `url` and `certs` are defined elsewhere.
# requests expects `cert` as a (certificate file, key file) tuple.
cert = (certs['cert'], certs['password'])
r = requests.get(url, cert=cert, verify=certs['CA_file'])

For r.text I get

...... <div id=\'App\'></div>\r\n  <script type="text/javascript" src="/bundle.js"></script></body>\r\n</html>\r\n'

I would like to get the HTML inside <div id=\'App\'></div>, but the div comes back empty. I tried a few different headers, but they did not help. Can this be done with BeautifulSoup? I would prefer not to use Selenium, as it gets far too complicated with the credentials. I need to use Microsoft Edge.

Is there anything I can do to get the full html code in the div?

baduker
corianne1234

1 Answer


Try loading your URL in a browser with the developer tools open (Chrome Developer Tools, for example) and watch what the page does. Sites commonly serve an empty div and then use JavaScript to populate it. Neither requests nor BeautifulSoup is a full browser, so that JavaScript is never executed.
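As a quick check of that diagnosis, the sketch below parses a response like the one in the question and tests whether the div exists but has nothing inside it. The helper name `is_empty_mount_point` is made up for this illustration, and the HTML string is a shortened stand-in for the real `r.text`.

```python
from bs4 import BeautifulSoup


def is_empty_mount_point(html: str, element_id: str = "App") -> bool:
    """Return True if the element exists but has no child tags and no text.

    Hypothetical helper, written only for this illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:
        return False  # the element is not in the static HTML at all
    # find(True) returns the first descendant tag, or None if there is none
    return tag.find(True) is None and not tag.get_text(strip=True)


# Shortened stand-in for r.text from the question
static_html = (
    "<html><body><div id='App'></div>\r\n"
    '  <script type="text/javascript" src="/bundle.js"></script></body>\r\n'
    "</html>\r\n"
)

print(is_empty_mount_point(static_html))  # True
```

If this prints True for the real response, the content is most likely filled in later by the script (here `/bundle.js`), which requests never runs.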

If that is the case, you'll need to use a real browser (it can often run in headless mode) to fully render the page, and then pull the data you want from the rendered result.
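A minimal sketch of that approach, assuming Selenium 4 with Microsoft Edge and its driver available on the machine. The function names `fetch_rendered_html` and `app_inner_html` are made up for this illustration. It does not deal with the client certificate from the question, which would have to be configured for the browser separately, and it assumes the script adds at least one child element to the div.

```python
from bs4 import BeautifulSoup


def app_inner_html(page_source: str, element_id: str = "App") -> str:
    """Return the markup inside the element, or "" if it is missing."""
    soup = BeautifulSoup(page_source, "html.parser")
    tag = soup.find(id=element_id)
    return "" if tag is None else tag.decode_contents()


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load `url` in headless Edge and return the HTML after JS has run."""
    # Imported here so the parsing helper above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.edge.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Edge(options=options)
    try:
        driver.get(url)
        # Assumption: the page is "ready" once #App has a child element.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.ID, "App").find_elements(By.XPATH, "./*")
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


# Usage (not executed here; `url` is whatever page you are scraping):
#     html = fetch_rendered_html(url)
#     print(app_inner_html(html))
```

Handing `page_source` to BeautifulSoup keeps the parsing code the same as it would be with requests; only the way the HTML is fetched changes.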

Cargo23
  • So that means I would have to use Selenium, for example? Could I still use requests.get() along with it? – corianne1234 Apr 21 '23 at 06:31
  • Selenium is a good choice, and I don't think you'll need requests.get(). Selenium loads the page, and then you can use its find_element() and find_elements() methods to locate the parts you want. – Cargo23 Apr 21 '23 at 16:12