
When I inspect the elements in my browser, I can see the full web page content. But when I run the script below, some of the page details are missing. In the inspector I see "#document" elements, and their content does not appear in the script's output. How can I see the contents of the #document elements, or extract them with the script?

from bs4 import BeautifulSoup
import requests

response = requests.get('http://123.123.123.123/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())


Tanay Suthar
  • Post a snippet of what you did and a snippet of the HTML code so that we can help you further. – Zroq Mar 22 '17 at 13:06
  • See http://stackoverflow.com/questions/21474605/what-does-document-mean for what #document means. The problem is that I cannot see the content under #document through my script. – Tanay Suthar Mar 22 '17 at 13:08
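In the browser's inspector, a #document node is the separate document loaded into a `<frame>` or `<iframe>`. The HTML that `requests` downloads contains only the frame tag and its `src` URL, not that nested document. A minimal sketch of this, using the standard library's `html.parser` (not BeautifulSoup) on a made-up frameset page; the markup and file names below are hypothetical:

```python
from html.parser import HTMLParser

# Hypothetical top-level page, similar to what a frame-based admin UI might return.
PAGE = """
<html>
  <frameset cols="20%,80%">
    <frame src="menu.html" name="menu">
    <frame src="main.html" name="main">
  </frameset>
</html>
"""


class FrameSourceCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


collector = FrameSourceCollector()
collector.feed(PAGE)
print(collector.sources)  # ['menu.html', 'main.html']
```

The downloaded page holds only these URLs; whatever the inspector shows under #document lives at those URLs and has to be requested separately.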

1 Answer


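The #document nodes in the inspector are the documents loaded inside frames. `requests` only downloads the top-level page, which contains the `<frame>` tags and their `src` URLs but not the frame contents, so you need to make additional requests to get each frame page's contents as well: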

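# Python 3; on Python 2 use: from urlparse import urljoin
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://123.123.123.123/'

with requests.Session() as session:
    # Fetch the top-level page, which only contains the <frameset>/<frame> tags.
    response = session.get(BASE_URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each <frame> loads its own document (shown as #document in the inspector),
    # so request every frame's src URL separately.
    for frame in soup.select("frameset frame"):
        # src is usually relative, so resolve it against the base URL.
        frame_url = urljoin(BASE_URL, frame["src"])

        response = session.get(frame_url)
        frame_soup = BeautifulSoup(response.content, 'html.parser')
        print(frame_soup.prettify())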
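The code above follows `<frame>` elements one level deep. A #document node can also come from an `<iframe>`, and a frame page can itself contain further frames. Below is a hedged sketch of a recursive variant. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without third-party packages, and it takes a `fetch` callable (a hypothetical parameter; with requests it could be `lambda u: session.get(u).text`) so it can be tried offline against fake pages. The `max_depth` limit and the `documents` dict guard against frames that point back at each other.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_frame_documents(url, fetch, max_depth=3, documents=None):
    """Return {url: html} for the page at url and every frame/iframe it loads.

    fetch is any callable that maps a URL to that page's HTML text.
    """
    if documents is None:
        documents = {}
    if url in documents or max_depth < 0:
        return documents
    html = fetch(url)
    documents[url] = html
    collector = FrameSourceCollector()
    collector.feed(html)
    for src in collector.sources:
        # Resolve relative src values against the page that contains the frame.
        collect_frame_documents(urljoin(url, src), fetch, max_depth - 1, documents)
    return documents


# Hypothetical pages standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": '<frameset><frame src="menu.html"><frame src="main.html"></frameset>',
    "http://example.test/menu.html": "<ul><li>Status</li></ul>",
    "http://example.test/main.html": '<p>Main</p><iframe src="stats/info.html"></iframe>',
    "http://example.test/stats/info.html": "<table><tr><td>uptime</td></tr></table>",
}

docs = collect_frame_documents("http://example.test/", FAKE_SITE.__getitem__)
for url in docs:
    print(url)
```

Each value in the returned dict is raw HTML, so it can still be handed to `BeautifulSoup(html, 'html.parser')` for the actual extraction.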
alecxe