Unable to extract #document content from an HTML page with Python web scraping

Question

When I inspect the elements in my browser, I can see the full page content. But when I run the script below, some of the page details are missing. In the inspector I see "#document" elements, and their content is missing from the script's output. How can I view or extract the contents of the #document elements with the script?

from bs4 import BeautifulSoup
import requests

response = requests.get('http://123.123.123.123/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())

Post a snippet of what you did and a snippet of the HTML code so that we can help you further. — Zroq, Mar 22 '17 at 13:06
http://stackoverflow.com/questions/21474605/what-does-document-mean. Actually I cannot see the content under the #document through my script. — Tanay Suthar, Mar 22 '17 at 13:08

score 3 · Accepted Answer · answered Mar 22 '17 at 13:27

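In the browser's inspector, "#document" marks a separate page loaded inside a frame. requests only fetches the outer page, so you need to make additional requests to get the frame page contents as well: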

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://123.123.123.123/'

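with requests.Session() as session:
    # Fetch the outer page, which only contains the frameset definition
    response = session.get(BASE_URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    for frame in soup.select("frameset frame"):
        # Frame sources are often relative, so resolve them against the base URL
        frame_url = urljoin(BASE_URL, frame["src"])

        # Each frame is a separate document and needs its own request
        response = session.get(frame_url)
        frame_soup = BeautifulSoup(response.content, 'html.parser')
        print(frame_soup.prettify())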
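To check which frame URLs a page would yield before making any further requests, the extraction step can be tried offline. The following is a minimal sketch for Python 3, using only the standard library rather than BeautifulSoup. The names FrameSrcCollector, frame_urls and SAMPLE_HTML are hypothetical, and the sample markup is a made-up stand-in for the real page, which may be structured differently. It also picks up iframe tags, which the selector above does not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:  # skip frames that have no src attribute
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs for all frames found in the given HTML text."""
    collector = FrameSrcCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


# Hypothetical frameset page, standing in for the real response body.
SAMPLE_HTML = """
<html>
  <frameset cols="25%,75%">
    <frame src="menu.html">
    <frame src="/content/main.html">
  </frameset>
</html>
"""

for url in frame_urls(SAMPLE_HTML, "http://123.123.123.123/"):
    print(url)
```

Each URL returned this way can then be fetched with session.get(), as in the loop above. If a frame page itself contains frames, the same step would have to be repeated on its content.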
