I am working on a web scraping task using Beautiful Soup and urllib. When I run the code, I get only the first part of the website's source; the part that loads later (the non-buffered part) is missing. Does anyone know how to get the source code of the fully buffered website? This is the code I am trying:

import bs4 as bs
import urllib.request

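# Download the raw HTML that the server sends (no JavaScript is executed)
source = urllib.request.urlopen('https://play.google.com/store/apps?hl=en').read()
# Parse the downloaded HTML with the lxml parser
soup = bs.BeautifulSoup(source, 'lxml')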

Please help if anyone has an idea about it.

  • When you mention the buffered website code, do you mean the content that only loads once you scroll down the page? – Joseph Rajchwald Feb 03 '20 at 23:28
  • I think you understood correctly. I want the code that is shown when you right-click on the website and choose "Inspect", but I am getting the code shown by "View page source". When you scroll down, new data loads after buffering, as on Facebook and other social sites; that data shows up in "Inspect" but not in "View page source". So my question is basically how to get the "Inspect" code, not the "View page source" code, using Python. – chaitanya sonagara Feb 04 '20 at 11:07
  • You can use Selenium to scroll down the web page a number of times. Once you've scrolled down as far as you want, you can then use it to get the page source. Not exactly the most ideal method, but it should work to a certain extent. You can start off with [this](https://stackoverflow.com/questions/12293158/page-scroll-up-or-down-in-selenium-webdriver-selenium-2-using-java) page. – Joseph Rajchwald Feb 07 '20 at 00:59
  • I will check it out, thanks. – chaitanya sonagara Feb 22 '20 at 09:39
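
The Selenium suggestion in the comments could look roughly like the sketch below. It is only an illustration, not a tested solution for this particular site: the helper names `scroll_and_get_source` and `fetch_rendered_soup`, the scroll count, and the pause length are all made up for the example, and it assumes Selenium plus a Chrome installation are available. The idea is to scroll to the bottom repeatedly, wait for new content to load, stop once the page height no longer grows, and then hand the rendered `page_source` to Beautiful Soup.

```python
import time


def scroll_and_get_source(driver, max_scrolls=5, pause=2.0):
    """Scroll to the bottom up to max_scrolls times, then return the rendered HTML.

    `driver` is expected to behave like a Selenium WebDriver, i.e. to offer
    execute_script() and page_source. The defaults are arbitrary guesses.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the extra content to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page stopped growing, so nothing more was loaded
        last_height = new_height
    return driver.page_source


def fetch_rendered_soup(url, max_scrolls=5, pause=2.0):
    """Hypothetical wrapper: open url in Chrome, scroll, and parse the result."""
    # Imported here so the scrolling helper above can be used without them.
    import bs4 as bs
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome is installed locally
    try:
        driver.get(url)
        source = scroll_and_get_source(driver, max_scrolls, pause)
    finally:
        driver.quit()  # always close the browser, even on errors
    return bs.BeautifulSoup(source, "lxml")
```

Calling `fetch_rendered_soup('https://play.google.com/store/apps?hl=en')` would then give a `soup` object built from the HTML as the browser sees it after scrolling, which is closer to what "Inspect" shows than to "View page source". A fixed `time.sleep` is the simplest way to wait; pages that load slowly may need a longer pause or an explicit wait.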
