I understand that to get the contents of an iframe with BeautifulSoup, you have to make a separate request for the iframe's src URL.

However, when I do this, there is a div inside the iframe which I cannot seem to access.

import bs4
import requests

res = requests.get('[iframe src]')
soup = bs4.BeautifulSoup(res.text, "html.parser")
print(soup)

This gives:

<!DOCTYPE html>
<html><head>...</head>
<body>
<div id="widgetApp"></div>
<script type="text/javascript"></script>
<script type="text/javascript"></script>
<script type="text/javascript"></script>
<script type="text/javascript"></script>
</body>
</html>

Using the developer tools/inspect element in browser, I can see that the #widgetApp div has plenty of other divs etc inside it. How do I get access to these?

Edit: To clarify, I'm trying to get access to the div #foo which is contained inside #widgetApp.

When I do:

elems = soup.select('#foo')
print(len(elems))

I get 0, i.e. it's not picking up the #foo div inside #widgetApp.

Hope that makes sense.

Any help very much appreciated.

SeanW
  • the `$0` is because you used the inspect element, it won't help with the parsing. also, is the contents of the iframe rendered in javascript? if so, beautifulsoup won't be sufficient – c2huc2hu Jul 31 '17 at 20:16
  • Thanks for the response. It does look like the contents of the iframe is rendered in javascript. I've added to the OP to show more accurately what the iframe is comprised of. So I can't access the contents of the div, even though it's separate from the scripts? – SeanW Jul 31 '17 at 20:25
  • the problem is that beautiful soup only sees what's on the page before executing js, and it doesn't look like there's anything on the page. I've never actually used beautifulsoup, but maybe this answer would help you: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – c2huc2hu Jul 31 '17 at 20:40
  • I'm a bit confused about what issue you're having accessing the div. The code you show doesn't make any attempt at it. If you have other code that does, please show that (and show what incorrect output you get, or the traceback of the exception that gets raised). As it stands, I don't understand what you're asking. – Blckknght Jul 31 '17 at 20:41
  • Thanks for the response @Blckknght. Sorry if this wasn't clear, but I'm trying to access a div (let's call it #foo) contained within the #widgetApp div. I've made an edit to the OP, hopefully this is more clear. – SeanW Jul 31 '17 at 20:43
  • @user3080953 - cheers. Do you know what I could use to access the data in the iframe apart from beautifulsoup? – SeanW Jul 31 '17 at 20:58
  • See the other question I linked. Haven't used any of those either sorry – c2huc2hu Aug 01 '17 at 02:07
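
The point made in the comments, that BeautifulSoup only sees the markup the server sends before any script runs, can be checked with a small sketch. Here `STATIC_HTML` is a hypothetical stand-in for `res.text`, modelled on the output shown in the question, and `is_empty_placeholder` is an illustrative helper name, not part of bs4. If the helper returns True for #widgetApp, the children seen in the browser's inspector were most likely added by JavaScript, and a tool that executes scripts would be needed to reach #foo.

```python
import bs4

# Hypothetical stand-in for res.text: the static HTML the server returns,
# before any script has run (modelled on the output in the question).
STATIC_HTML = """<!DOCTYPE html>
<html><head></head>
<body>
<div id="widgetApp"></div>
<script type="text/javascript"></script>
</body>
</html>"""


def is_empty_placeholder(html, selector):
    """Return True if `selector` matches an element with no child tags and no text."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    elem = soup.select_one(selector)
    if elem is None:
        # Nothing matched at all, so there is no placeholder to inspect.
        return False
    # find(True) returns the first descendant tag, or None if there are none.
    return elem.find(True) is None and not elem.get_text(strip=True)


soup = bs4.BeautifulSoup(STATIC_HTML, "html.parser")
print(len(soup.select("#foo")))                          # 0
print(is_empty_placeholder(STATIC_HTML, "#widgetApp"))   # True
```

An empty placeholder div followed by script tags is a common sign of client-side rendering, though it is not proof on its own; the scripts may also fetch their data from a separate URL that could be requested directly.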

0 Answers