I want to make a list of users who have commented, but I can't find the iframe when I pull the page with BeautifulSoup. The comments are inside an iframe, yet the HTML that BeautifulSoup gets doesn't seem to contain one. I know the iframe exists because I inspected the page's HTML in the browser while trying to work out how to drill down to what I needed.

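from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; Python 2 used "from urllib import urlopen"

url = urlopen("http://www.datpiff.com/Curreny-Alchemist-Carrollton-Heist-mixtape.766213.html")
bsObj = BeautifulSoup(url, "html.parser")

# Finds only <iframe> tags present in the HTML as served (no JavaScript is run)
frame_list = bsObj.find_all("iframe")

for frame in frame_list:
    print(frame)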
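A quick way to see what BeautifulSoup actually received is to count the iframes in the static markup alongside the scripts the page loads. A minimal sketch, assuming a small hard-coded HTML string as a hypothetical stand-in for the downloaded page (so it runs offline); `summarize_page` is a hypothetical helper name, and with the real page you would pass the `urlopen()` response instead.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the markup urlopen() returns; swap in the real
# response to inspect the actual page.
raw_html = """
<html><head>
<script src="//cdn.example.com/app.js"></script>
</head><body>
<div id="comments"></div>
<script>var disqus_shortname = 'example';</script>
</body></html>
"""

def summarize_page(html):
    """Count iframes in the static markup and list the scripts that could add them."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Only tags literally present in the fetched HTML are found here.
        "iframes": len(soup.find_all("iframe")),
        # src=True matches <script> tags that have a src attribute.
        "external_scripts": [s["src"] for s in soup.find_all("script", src=True)],
        "inline_scripts": sum(1 for s in soup.find_all("script") if not s.has_attr("src")),
    }

print(summarize_page(raw_html))
```

If `iframes` is 0 while scripts are present, the iframe is most likely created by one of those scripts at runtime.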

However, I did find this JavaScript, which may be the answer to what I need. Am I supposed to run it somehow so that the server holding this page believes I am a user and the iframe shows up?

<script language="javascript">
    var disqus_shortname = 'datpiff4';
    /* * * DON'T EDIT BELOW THIS LINE * * */
    (function () {
        var s = document.createElement('script'); s.async = true;
        s.type = 'text/javascript';
        s.src = '//' + disqus_shortname + '.disqus.com/count.js';
        (document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
    }());
</script>
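None of that script needs to run for you to read it: the shortname it assigns is plain text in the HTML, so it can be pulled out with a regular expression. A minimal sketch, assuming the variable is always assigned as a quoted string literal inside an inline `<script>`, as in the snippet above; `find_disqus_shortname` is a hypothetical helper name. Note that the URL the script builds points at Disqus's `count.js`, which, as far as I know, is the comment-count script rather than the one that embeds the comment thread, so this only tells you which Disqus forum the page uses.

```python
import re
from bs4 import BeautifulSoup

# Inline script copied from the question; in practice this would be the
# page HTML fetched with urlopen().
page_html = """
<script language="javascript">
    var disqus_shortname = 'datpiff4';
    (function () {
        var s = document.createElement('script'); s.async = true;
        s.type = 'text/javascript';
        s.src = '//' + disqus_shortname + '.disqus.com/count.js';
    }());
</script>
"""

# Assumes the shortname is a single- or double-quoted string literal.
SHORTNAME_RE = re.compile(r"""var\s+disqus_shortname\s*=\s*['"]([^'"]+)['"]""")

def find_disqus_shortname(html):
    """Return the disqus_shortname assigned in an inline <script>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""  # the script body, stored as text (never executed)
        match = SHORTNAME_RE.search(text)
        if match:
            return match.group(1)
    return None

shortname = find_disqus_shortname(page_html)
# Rebuild the protocol-relative URL exactly as the script concatenates it.
count_js_url = "//" + shortname + ".disqus.com/count.js"
print(shortname)
print(count_js_url)
```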

I want to get this iframe without opening a browser, as Selenium does. Is this possible? If not, what can I use besides BeautifulSoup?

ImNotBot

1 Answer

The iframe is appended by JavaScript, which only runs after the page loads in an environment that can execute it, i.e. a browser. BeautifulSoup doesn't execute JS in any way; it just takes the string fetched from the given URL and parses it as HTML.
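This is easy to demonstrate: give BeautifulSoup markup whose only iframe would be created by a script, and the iframe never appears in the parse tree. A minimal sketch; the HTML, the `comments` id, and the `example.com` URL are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page: an <iframe> would exist only after this script runs.
html = """
<html><body>
<div id="comments"></div>
<script>
  var f = document.createElement('iframe');
  f.src = 'https://example.com/comments';
  document.getElementById('comments').appendChild(f);
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The parser builds a tree from the markup as-is: the script body is kept
# as a text node and never executed, so no iframe is ever inserted.
print(soup.find_all("iframe"))
print("createElement('iframe')" in soup.script.string)
```

The first line prints an empty list and the second prints `True`: the code that would create the iframe is there as text, but nothing runs it.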

La Faulx
  • Thanks for the reply! So how can I get to the iframe? Should I not use BeautifulSoup? If so, what should I be using? – ImNotBot Mar 03 '16 at 23:31
  • To get the iframe, you need the rendered page, i.e. the HTML after the JS has executed. These questions cover the same problem: http://stackoverflow.com/questions/7064109/how-to-parse-html-that-includes-javascript-code http://stackoverflow.com/questions/11047348/is-this-possible-to-load-the-page-after-the-javascript-execute-using-python – La Faulx Mar 03 '16 at 23:36
  • Thank you, but I have already seen those pages. Maybe I should rephrase my question: can I get to an iframe without using Selenium to open a web browser? I really want to move away from Selenium if possible. – ImNotBot Mar 03 '16 at 23:52
  • BeautifulSoup won't do that. – La Faulx Mar 04 '16 at 08:21
  • Do you know of anything else I could use, other than Selenium? Thanks for all your replies! – ImNotBot Mar 04 '16 at 09:25
  • PhantomJS is the only other option I know of. – La Faulx Mar 04 '16 at 09:27
  • Thanks! This is something I will definitely look into. – ImNotBot Mar 05 '16 at 07:38
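The PhantomJS route suggested in the comments can be driven from Python with `subprocess`: PhantomJS loads the page, runs its JavaScript, and prints the rendered DOM, which BeautifulSoup can then parse. A minimal sketch, assuming a `phantomjs` binary is on your PATH and that a fixed wait (the hypothetical `wait_ms` parameter, 5 seconds by default) is long enough for an asynchronously loaded comments widget. PhantomJS is no longer actively developed, so treat this as illustrative. The names `render_with_phantomjs`, `iframe_srcs`, and `PHANTOM_SCRIPT` are hypothetical.

```python
import os
import subprocess
import tempfile

from bs4 import BeautifulSoup

# PhantomJS script: open the URL given as argv[1], wait argv[2] ms for
# async JS to finish, then print the rendered top-level HTML to stdout.
PHANTOM_SCRIPT = r"""
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];
var waitMs = parseInt(system.args[2], 10);
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('LOAD_FAILED');
        phantom.exit(1);
        return;
    }
    setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
    }, waitMs);
});
"""

def render_with_phantomjs(url, wait_ms=5000, phantomjs="phantomjs"):
    """Return the page HTML after its JavaScript has run (assumes phantomjs on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(PHANTOM_SCRIPT)
        script_path = f.name
    try:
        result = subprocess.run(
            [phantomjs, script_path, url, str(wait_ms)],
            capture_output=True,
            text=True,
            timeout=wait_ms / 1000 + 60,  # generous margin over the in-page wait
            check=True,  # raises CalledProcessError if the page failed to load
        )
    finally:
        os.remove(script_path)
    return result.stdout

def iframe_srcs(html):
    """List the src attribute of every <iframe> in already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get("src") for tag in soup.find_all("iframe")]
```

Usage would be along the lines of `iframe_srcs(render_with_phantomjs(url))`. As far as I know, `page.content` returns only the top-level document, so the rendered markup contains the `<iframe>` tag and its `src` but not the comments inside it; the frame's own page would have to be loaded separately.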