requests does not return the full div

Question

I would like to scrape a webpage using BeautifulSoup and requests. The code below runs, but I do not get the full div back.

import requests

# url and certs are defined elsewhere (not shown)
cert = (certs['cert'], certs['password'])
r = requests.get(url, cert=cert, verify=certs['CA_file'])

For r.text I get

...... <div id=\'App\'></div>\r\n  <script type="text/javascript" src="/bundle.js"></script></body>\r\n</html>\r\n'

but I would like to have the HTML inside <div id='App'></div>, and it does not show up. I tried several different headers, but they did not work either. Can this be done with BeautifulSoup? (I would prefer not to use Selenium, as it gets way too complicated with the credentials.) I need to use Microsoft Edge.

Is there anything I can do to get the full html code in the div?

Can you add the example document or the URL you are trying to scrape? This would help in recreating the problem. — bg2094, Apr 20 '23 at 11:48
Fair enough. Take a look at this and let me know if it helps. https://stackoverflow.com/a/2136323/10127761 — bg2094, Apr 20 '23 at 12:05
In your case, the line which finds the required data will look something like this `soup.find("div", {"id": "App"})`. — bg2094, Apr 20 '23 at 12:08
Yes, the return for that is just this: `<div id="App"></div>`, but there should be some stuff in this div — corianne1234, Apr 20 '23 at 12:58

score 0 · Accepted Answer · answered Apr 20 '23 at 15:43

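Try loading your URL with the browser's developer tools open (for example, Chrome Developer Tools) and watch what it does. It is common for sites to define an empty div and then use JavaScript to populate it. requests only downloads the HTML the server sends, and BeautifulSoup only parses that text. Neither is a full browser, so the JavaScript never runs.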
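
One way to confirm this from the raw response is to check whether the target div is empty while the page still loads external scripts. The following is only a heuristic sketch: the names `MountPointProbe` and `looks_client_rendered` are made up for illustration, and it uses the standard library's `html.parser` so it runs without extra packages.

```python
from html.parser import HTMLParser


class MountPointProbe(HTMLParser):
    """Record whether the element with a given id has any content in the raw HTML."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0            # nesting depth inside the target (0 = outside)
        self.found = False        # target element was seen
        self.has_content = False  # target contains child tags or non-blank text
        self.script_srcs = []     # external scripts referenced by the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.script_srcs.append(attrs["src"])
        if self.depth:
            # Any tag opened inside the target counts as content.
            self.has_content = True
            self.depth += 1
        elif attrs.get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.has_content = True


def looks_client_rendered(html, element_id="App"):
    """Heuristic: the element exists, is empty, and the page loads external scripts."""
    probe = MountPointProbe(element_id)
    probe.feed(html)
    probe.close()
    return probe.found and not probe.has_content and bool(probe.script_srcs)


# The tail of r.text from the question, wrapped in minimal html/body tags.
sample = (
    "<html><body>\r\n  <div id='App'></div>\r\n"
    '  <script type="text/javascript" src="/bundle.js"></script></body>\r\n</html>\r\n'
)
print(looks_client_rendered(sample))
```

If this returns True for your page, the content of the div is almost certainly filled in by the script after the page loads, and no choice of request headers will change what requests receives.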

If that is the case, you'll need to use a real browser (it can often run in headless mode) to fully render the page, and then pull the data you want from the rendered result.
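
As a rough sketch of what that could look like, here is one option using Selenium 4 to drive Microsoft Edge in headless mode. The function names, the default id "App", and the timeout are assumptions for illustration. It also assumes a recent Edge with a matching driver available to Selenium, and it does not cover the client certificate from the question, which the browser would have to obtain some other way.

```python
def css_for_children(element_id):
    """CSS selector matching any direct child of the element with this id."""
    return f"#{element_id} > *"


def fetch_rendered_div(url, element_id="App", timeout=10):
    """Load url in headless Edge and return the inner HTML of the populated element."""
    # Selenium is imported here so this module still imports where it is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.EdgeOptions()
    options.add_argument("--headless=new")  # Chromium flag; assumes a recent Edge

    driver = webdriver.Edge(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has added at least one child to the div.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, css_for_children(element_id))
            )
        )
        return driver.find_element(By.ID, element_id).get_attribute("innerHTML")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned string is ordinary HTML, so it can still be handed to BeautifulSoup for parsing, for example `BeautifulSoup(fetch_rendered_div(url), "html.parser")`.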

Cargo23


So that means I would have to use Selenium, for example? Could I still use requests.get() along with it? – corianne1234 Apr 21 '23 at 06:31
Selenium is a good choice, and I don't think you'll need requests.get(). Selenium loads the page, and then you can use its `find_element` and `find_elements` methods to find the parts you want. – Cargo23 Apr 21 '23 at 16:12
