
I would like to get the source code of a Google image search. I saw in "Google Search by Image Script for Local Images" that https://www.google.com/searchbyimage?&image_url= followed by an image link works. I am using Python, and this is what I tried:

from bs4 import BeautifulSoup
import requests

# Firefox 3.0.7 User-Agent string sent with the request
browser = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': browser}
# searchbyimage endpoint with the image link appended as image_url
url = "https://www.google.com/searchbyimage?&image_url=" + "http://mlm-s1-p.mlstatic.com/635657-MLM25528207389_042017-O.jpg"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")

# Print the parsed HTML, indented
pretty = soup.prettify()
print(pretty)

The output is quite different from the source Chrome shows when I prefix the same URL with view-source:.

In case you want to know, the purpose of the script is to get Google's best guess for the image as a string; in my example it would be "lemmy kilmister funko pop". But I can't find any of these words in the HTML Beautiful Soup gives me.

EDIT: I forgot to include the imports and the BeautifulSoup/requests calls.

Rafael Martínez

1 Answer


Is this all your code? I'm asking because you don't create the soup object or request the page.

Assuming you have, the answerer in the question you linked says this only works with a browser User-Agent header, which is an odd condition. I reproduced your request with curl using your header, and it redirected me to the front page, which is why you couldn't find your string. However, running it with my current Firefox header

 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0

I got the right page. Be careful that your code doesn't break later because of this condition: I tested by editing the header slightly, and the check is strict for some changes (changing the version to 53.0 was fine, changing it to 5.0 was not).
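Applied to the requests code in the question, the fix amounts to swapping in that newer header. This is a minimal sketch, assuming the Firefox 54 string still passes Google's check (it may not today). `build_search_url` and `HEADERS` are hypothetical names, and percent-encoding the image link is an extra precaution, not something the test above showed to be necessary.

```python
from urllib.parse import urlencode

# Newer Firefox User-Agent that returned the right page in the test above;
# the Firefox/3.0.7 string from the question was redirected to the front page.
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) "
              "Gecko/20100101 Firefox/54.0")
HEADERS = {"User-Agent": FIREFOX_UA}


def build_search_url(image_url):
    """Return the searchbyimage URL with image_url percent-encoded as a query value."""
    return "https://www.google.com/searchbyimage?" + urlencode({"image_url": image_url})
```

Then `page = requests.get(build_search_url(image_link), headers=HEADERS)` takes the place of the original request. requests follows redirects by default, so checking `page.url` shows whether you ended up on the front page instead of the results.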

The curl command, for reference:

  curl "https://www.google.com/searchbyimage?&image_url=mlm-s1-p.mlstatic.com/635657-MLM25528207389_042017-O.jpg" -L -v -o file.html -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0"
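A rough standard-library equivalent of that curl command, in case it helps to compare from Python: `urlopen` follows redirects like `-L`, the body is saved like `-o file.html`, and returning the final URL stands in for what `-v` showed about the redirect. This is a sketch; `fetch_to_file` and `looks_like_front_page` are hypothetical helpers, and treating an empty or `/` path as "redirected to the front page" is only a heuristic based on the redirect described above.

```python
import urllib.request
from urllib.parse import urlsplit

# Same User-Agent and URL as the curl command above.
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) "
              "Gecko/20100101 Firefox/54.0")
SEARCH_URL = ("https://www.google.com/searchbyimage?&image_url="
              "mlm-s1-p.mlstatic.com/635657-MLM25528207389_042017-O.jpg")


def looks_like_front_page(final_url):
    """Heuristic: an empty or "/" path suggests a redirect to the front page."""
    return urlsplit(final_url).path in ("", "/")


def fetch_to_file(url, path="file.html", user_agent=FIREFOX_UA):
    """Download url with the given User-Agent, save the body, return the final URL."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen follows HTTP redirects by default, like curl -L.
    with urllib.request.urlopen(request) as response:
        final_url = response.url
        body = response.read()
    # Write the raw bytes, like curl -o file.html.
    with open(path, "wb") as f:
        f.write(body)
    return final_url
```

For example, `final = fetch_to_file(SEARCH_URL)` followed by `looks_like_front_page(final)` indicates whether the header was rejected.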