I want to extract the original photos that Google shows on a search results page, like this one:
After inspecting the page in the browser, I saw that the original photos are tied to the identifier _image_src, but I'm not sure how to grab them with BeautifulSoup.
For example, one of the images is:
_image_src='data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBwgHBgkIBwgKCgkLDRYPDQwMDRsUFRAWIB0iIiAdHx8kKDQsJCYxJx8fLT0tMTU3Ojo6Iys/RD84QzQ5OjcBCgoKDQwNGg8PGjclHyU3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3N//AABEIAYYBZgMBIgACEQEDEQH/xAAcAAEBAAIDAQEAAAAAAAAAAAAAAQIFAwQGBwj/xABFEAACAQMCAgcFAwoEBQUBAAAAAQIDBBEFEiExBhNBUWFxgSIykaGxFEJSBxUjJHKSwdHh8DNigqJDc7LC8SU0NURjFv/EABoBAQADAQEBAAAAAAAAAAAAAAABAwQCBQb/xAAvEQEAAgIBAwMDAwMEAwAAAAAAAQIDESEEEjEFIjITQVEUYbFxkdEVM6HwI0KB/9oADAMBAAIRAxEAPwD6KigEoEigoAFwEBSIoAApcAYopkQBgFGAIkUuBgCAowBAUoEBcDAEGClAgwC4JAYBQIChgcNzXp21CVarLEYrifOtU6dandXNWOkUrX7HSeHOFxCrOWG/a4SW1eGM/RdT8oOvz1SpOytJv7FTkotx49bLux29vDuWX4d3ot0Bp1aFOvqqkpPElST4rzZRbJ+F9Mf5aqHTHX6VRRV7KcZNZp1YR9nualtT/ez2evNV6e6pa3FJXFSjUUlnq2trfe8x5c0uWOHI9rcdD9JhDb9neF2bmeV6RdFoUaUp2EJOlHLlRbzw8MpnEZdSs+lExw9x0e1q31zTqd3at8eE4Sxug+5m0PjfQPUVpWq1ouUqdOaxNdsfHHas88dj4ccY+xUKsK9GnVptOE4qUWnnKayX0ttnvXtlQZMh04YtFKAMWQyIBMEwZYIBMdwKQDiKUYAAuAALgJGQEwCjAEwUowBC4LgATAKAABcAQFAESKCgQFAE8y4BQMSlwCRCjBUgIkarpRdO20a4UJYqVIqEX3Z5+XDc/Rm2PJdPK8IWMadR8K7lSjxxxktsvXY5erObzqrqkbtEPN9ErKnXvqcpxyoQ+0PK45m3s+Sy13qJ9JtMpZWOR8v6E6pO8t9YvKUown18du+PCEdqaylzxx4ZMqnSmtK4jCnrd9NSTa20ae3anhyyo4xlNGON7btbh9Pqt1E1lepqbqdKNXqXVp9Y+VPetz9DW9Ialf8AMdKpUVd9avbVJPfw49nHs5I8ppO2yu+pWjOFNVlCoq1GM3Uy3mWXndHlx8RPJEah0emFstE163vYpqjXjmePhL4r547z6H0Qv41bKnSW3ZlxW3lGXP4SWWn3p955L8qtGVex077OlulUaipSwsvbzb5Jd79Tp/k2vbinc3Ok1pQdeNNdW4zUoScfbpyTXN.......
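The _image_src value above is a base64 data URI, so the image bytes are embedded in the string itself rather than fetched from a separate URL. Here is a minimal sketch of turning one such string back into raw image bytes. It uses only the standard library and assumes the value follows the data:<mime>;base64,<payload> pattern shown above. The decode_data_uri helper is hypothetical.

```python
import base64
import re
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI ('data:<mime>;base64,<payload>') into (mime_type, raw_bytes)."""
    match = re.fullmatch(r"data:([\w/+.-]+);base64,(.*)", uri, re.DOTALL)
    if match is None:
        raise ValueError("not a base64 data URI")
    mime_type, payload = match.groups()
    # Base64 text must be a multiple of 4 characters; restore any stripped '=' padding.
    payload += "=" * (-len(payload) % 4)
    return mime_type, base64.b64decode(payload)


if __name__ == "__main__":
    # Hypothetical tiny sample: "/9j/" decodes to the JPEG start bytes FF D8 FF.
    mime, data = decode_data_uri("data:image/jpeg;base64,/9j/")
    print(mime, data)
```

To save the picture, write the returned bytes to a file opened in binary mode (for example, open("image.jpg", "wb")). The value quoted above is truncated, so it cannot be decoded as-is.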
I tried:
from bs4 import BeautifulSoup
import requests
import time
from random import randint

# reference for scraping google search https://stackoverflow.com/questions/39354587/scraping-google-news-with-beautifulsoup-returns-empty-results
s = "red dress"
time.sleep(randint(0, 2))  # relax and don't let google be angry
# passing the query via params lets requests URL-encode it (e.g. the space in "red dress")
r = requests.get("https://www.google.com/search", params={"q": s, "tbm": "shop"})
html = r.content

# finding image tags reference here https://www.youtube.com/watch?v=tmgfCJv7dW0
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
# find_all() matches tag *names*, so this looks for <_image_src> elements
print(soup.find_all('_image_src'))
However, when I print the soup, it doesn't show everything I see in the browser's inspector. In particular, _image_src doesn't appear anywhere. Why isn't this giving me everything on that page?
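In the code above, soup.find_all('_image_src') looks for elements whose tag name is _image_src, so it returns an empty list. Judging from the quoted example, _image_src is not a tag but a JavaScript assignment inside the page's script text. Also, requests does not execute JavaScript, and Google may serve different markup to a script than to a browser, so the inspector can show content that r.text lacks.

Assuming the value does appear in r.text in the single-quoted form shown above, here is a minimal sketch that pulls it out with a regular expression instead of a tag search. The find_image_srcs helper and the sample script snippet are hypothetical.

```python
import re
from typing import List

# Assumption: each image appears in inline script text as _image_src='...'
# (single-quoted), matching the example value quoted in the question.
IMAGE_SRC_RE = re.compile(r"_image_src\s*=\s*'([^']*)'")


def find_image_srcs(html_text: str) -> List[str]:
    """Return every single-quoted value assigned to _image_src in the raw HTML/JS text."""
    return IMAGE_SRC_RE.findall(html_text)


if __name__ == "__main__":
    # Hypothetical page fragment; in practice you would pass r.text here.
    page = "<script>var _image_src='data:image/jpeg;base64,/9j/';</script>"
    print(find_image_srcs(page))
```

Each returned string could then be decoded as a base64 data URI. If Google escapes characters inside these strings (for example, writing = as \x3d), the pattern or the decoding step would need adjusting.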