
I wrote a crawler in Python, and I am trying to download the image from this article: http://www.bbc.com/news/business-34958154. The problem is that the site resizes the image automatically, so when I try to download the article's image I get the 320-pixel version, which is too small. This happens because the crawler reads the page's source (view-source:http://www.bbc.com/news/business-34958154), where the image is 320 pixels. Is there a way to get the image at its maximum size, or at the size I see in the browser? This is the code that collects the images:

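import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.bbc.com/news/business-34958154")
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly
soupAllImgs = soup.find_all('img', src=True)  # every <img> tag that has a src attribute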
jaldk

1 Answer


That image tag has src="http://ichef.bbci.co.uk/news/320/media/images/78532000/jpg/_78532434_hs2ii.jpg". You can get the image almost any size you want by changing the 320 in the URL — *** here:

http://ichef.bbci.co.uk/news/***/media/images/78532000/jpg/_78532434_hs2ii.jpg

Looks like they use JavaScript to replace it dynamically, probably depending on the bandwidth and device. I found I could get anything from 10 to 999 px, then 1024 and 2048... I didn't test many more.
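As a rough sketch of that URL trick, the width segment can be swapped with a regular expression. The helper name `with_width` is made up for illustration, and the pattern assumes the `/news/<width>/` layout seen in the URL above; other sites (and possibly other BBC image URLs) will differ.

```python
import re

# Matches the numeric width segment that follows /news/ in the image URL above.
# Assumption: the URL has the form .../news/<width>/media/...
_WIDTH_SEGMENT = re.compile(r"(?<=/news/)\d+(?=/)")


def with_width(src, width):
    """Return src with its width segment replaced by the requested width."""
    return _WIDTH_SEGMENT.sub(str(width), src, count=1)


src = "http://ichef.bbci.co.uk/news/320/media/images/78532000/jpg/_78532434_hs2ii.jpg"
print(with_width(src, 1024))
```

A URL that does not match the pattern is returned unchanged, so the helper is safe to apply to every `src` the crawler finds.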

Updates after clarification

If you want to get what's on the screen — that is, after any JavaScript has finished executing — then you need something that can execute JS like Selenium, see this question for example.

There are even ways to do this without opening a browser — be sure to read all the answers and comments for the full range of what's possible.
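One way to avoid a visible browser window is to run Firefox headless through Selenium. The sketch below assumes Selenium and a Firefox driver are installed; the function names `rendered_html` and `img_srcs` are made up for illustration, and the standard-library `HTMLParser` stands in for BeautifulSoup only to keep the example dependency-free. Note that `page_source` is read as soon as the page load returns, so scripts that swap images later may need an explicit wait.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def img_srcs(html):
    """Return the src of every <img> in an HTML string, in document order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return collector.srcs


def rendered_html(url):
    """Load url in headless Firefox and return the DOM after the page has loaded."""
    # Imported here so the parsing helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even on errors


# Usage (starts a real headless browser, so it is left commented out):
# print(img_srcs(rendered_html("http://www.bbc.com/news/business-34958154")))
```

Headless mode still starts a full browser engine, so it saves the window rather than most of the load time.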

Matt Hall
    I am aware that I can change the URL of the image, but this is just an example. I want something more general, not only for the BBC. – jaldk Nov 30 '15 at 13:15
    Response to the update: I tried Selenium, and it worked. But to do so it opens Firefox, loads the whole page, and then retrieves the source code. It took a few seconds to load (way too long). Does the program have to open the browser in order to run the JS? Or is there a completely different way to do it? This is my code: browser = webdriver.Firefox(); browser.get("http://www.bbc.com/news/business-34958154"); soup = BeautifulSoup(browser.page_source); print(soup); – jaldk Nov 30 '15 at 13:55