
I want to extract the original photos that Google shows on a Shopping search results page like this one:

https://www.google.com/search?biw=1046&bih=720&tbm=shop&ei=sznVWvq5OcbrzgKPgKLoDA&q=red+dress&oq=red+dress&gs_l=psy-ab.3..0l10.1256.2298.0.2485.9.7.0.0.0.0.238.408.0j1j1.2.0....0...1c.1.64.psy-ab..7.2.407....0.WHO8-4Nhfj0

After inspecting the page in the browser, I saw that the original photos are assigned to the name `_image_src`, but I'm not sure how to extract them with BeautifulSoup.
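One way to pull those values out of the raw page text is a regular expression. This is a minimal sketch that assumes the value sits in inline JavaScript as `_image_src='...'` with single quotes, as it appears in the inspector. The pattern and the `find_image_srcs` helper are illustrative and not anything Google documents. The sample string is also made up.

```python
import re

# Assumption: the value is written as _image_src='...' (single-quoted) inside
# inline JavaScript; the markup Google actually serves may differ.
IMAGE_SRC_RE = re.compile(r"_image_src\s*=\s*'([^']*)'")


def find_image_srcs(html_text):
    """Return every single-quoted value assigned to _image_src in html_text."""
    return IMAGE_SRC_RE.findall(html_text)


# Made-up sample for illustration only.
sample = "<script>var _image_src='data:image/jpeg;base64,AAAA';</script>"
print(find_image_srcs(sample))  # ['data:image/jpeg;base64,AAAA']
```

Pass it the raw response text (for example `r.text`) rather than the prettified soup.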

For example, one of the images is:

`_image_src='.......`

I tried:

```python
from bs4 import BeautifulSoup
import requests
import time
from random import randint
from urllib.parse import urljoin
import urllib.request

#reference for scraping google search https://stackoverflow.com/questions/39354587/scraping-google-news-with-beautifulsoup-returns-empty-results
s="red dress"
time.sleep(randint(0, 2))  # relax and don't let google be angry
r = requests.get("https://www.google.com/search?q="+s+"&tbm=shop")
#print(r)
html=r.content
#print(r.content)
#finding image tags reference here https://www.youtube.com/watch?v=tmgfCJv7dW0
html_text=r.text
soup=BeautifulSoup(html,"html.parser")
print(soup.prettify())

print(soup.find_all('_image_src'))
```
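`find_all('_image_src')` searches for elements whose tag name is `_image_src`, so it returns an empty list even when that text is in the page. The following sketch searches `<script>` contents instead, on the assumption that the name appears inside inline JavaScript. `scripts_mentioning` is a hypothetical helper name, and the sample markup is made up.

```python
import re

from bs4 import BeautifulSoup


def scripts_mentioning(html, marker="_image_src"):
    """Return <script> tags whose text contains marker (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Combining a tag name with string= matches tags whose .string matches the regex.
    return soup.find_all("script", string=re.compile(re.escape(marker)))


# Made-up sample markup for illustration only.
sample = (
    "<html><body>"
    "<script>var _image_src='data:image/jpeg;base64,QUJD';</script>"
    "<script>var other = 1;</script>"
    "</body></html>"
)
soup = BeautifulSoup(sample, "html.parser")
print(soup.find_all("_image_src"))      # [] : no element is named _image_src
print(len(scripts_mentioning(sample)))  # 1
```

Each returned tag's `.string` holds the script source, which still has to be parsed (for example with a regular expression) to get the value itself.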

I noticed, though, that when I print the soup it doesn't show everything I see in the browser's inspector; for example, `_image_src` is missing from the output. Why isn't this giving me everything on that page?
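`requests` only downloads the HTML that the server sends. It does not run JavaScript. The browser's inspector, by contrast, shows the page after its scripts have run. Google may also return different markup depending on the `User-Agent` header. The following sketch builds the same search request with a browser-like `User-Agent` and a properly URL-encoded query, using only the standard library. The header string is an illustrative assumption, and there is no guarantee that Google's response will then contain `_image_src`.

```python
import urllib.parse
import urllib.request

# Assumption: an illustrative browser-like User-Agent string.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0 Safari/537.36"
)


def build_search_request(query, tbm="shop"):
    """Build (but do not send) a Google search request with an encoded query."""
    # urlencode escapes the query, turning "red dress" into "red+dress".
    params = urllib.parse.urlencode({"q": query, "tbm": tbm})
    url = "https://www.google.com/search?" + params
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = build_search_request("red dress")
print(req.full_url)  # https://www.google.com/search?q=red+dress&tbm=shop

# To fetch and check the raw HTML (network access required):
# html = urllib.request.urlopen(req).read().decode("utf-8", "replace")
# print("_image_src" in html)
```

With `requests`, the equivalent is `requests.get("https://www.google.com/search", params={"q": s, "tbm": "shop"}, headers={"User-Agent": BROWSER_UA})`. Saving the raw text to a file and searching it is a quick way to check what the server actually returned, as opposed to what the inspector shows.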

Asked by Bob, edited by colidyre.
  • It says on the tin: `data:image/jpeg;base64`, so you take the value, base64-decode it, and store as a jpeg file. But this is likely a small thumbnail. – 9000 Apr 17 '18 at 19:14
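Here is a sketch of what the comment describes. It splits a `data:image/jpeg;base64,...` URI at the first comma, base64-decodes the payload, and writes the bytes to a `.jpg` file. `decode_data_uri` and `save_data_uri` are hypothetical helper names. As the comment notes, the result is probably only the small thumbnail embedded in the page.

```python
import base64


def decode_data_uri(data_uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data:<mime>;base64,<payload> URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def save_data_uri(data_uri, path):
    """Decode a base64 data URI and write the raw bytes to path."""
    _, data = decode_data_uri(data_uri)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


print(decode_data_uri("data:image/jpeg;base64,QUJD"))  # ('image/jpeg', b'ABC')
```

Usage: `save_data_uri(value, "dress.jpg")`, where `value` is one extracted `_image_src` string.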
  • On the tin? What does that mean? – Bob Apr 17 '18 at 19:18
  • It's [a figurative expression](https://en.wikipedia.org/wiki/Does_exactly_what_it_says_on_the_tin). It means that the description of the contents is present right before our eyes on the packaging. – 9000 Apr 17 '18 at 19:48
  • Oh, OK. Well, it didn't work. – Bob Apr 17 '18 at 19:49
  • Please take another look at [a very similar question / answer](https://stackoverflow.com/a/19395899/223424). – 9000 Apr 17 '18 at 19:51
