You can achieve this using selenium, but it will run slower than bs4.
To scrape the original image links using bs4, you need to search the <script> tags with a regular expression and then decode the matched links.
For example, here is part of the code (check out the full example in the online IDE):
import re

# `soup` is the BeautifulSoup object built in the full example
# find all script tags
all_script_tags = soup.select('script')

# find all full res images; re.findall expects a string, so convert the tag list first
matched_google_full_resolution_images = re.findall(r"(?:'|,),\[\"(https:|http.*?)\",\d+,\d+\]",
                                                   str(all_script_tags))

# iterate over found matches and decode them (twice, to undo escapes such as \u003d)
for fixed_full_res_image in matched_google_full_resolution_images:
    original_size_img_not_fixed = bytes(fixed_full_res_image, 'ascii').decode('unicode-escape')
    original_size_img = bytes(original_size_img_not_fixed, 'ascii').decode('unicode-escape')
    print(original_size_img)
-----
'''
https://external-preview.redd.it/mAQWN2kUYgFS3fgm6LfYo37AY7i2e_YY8d83_1jTeys.jpg?auto=webp&s=b2bad0e23cbd83426b06e6a547ef32ebbc08e2d2
https://i.ytimg.com/vi/_mR0JBLXRLY/maxresdefault.jpg
https://wallpaperaccess.com/full/37454.jpg
...
'''
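To see the regex and the double decode in isolation, here is a minimal, self-contained sketch that runs on a made-up string instead of a live page. The `sample_script_text` value and the `extract_full_res_links` helper are assumptions for illustration only; they just mimic the escaped `["url",height,width]` shape that the pattern above expects.

```python
import re

# Same pattern as in the snippet above, compiled once for reuse.
FULL_RES_PATTERN = re.compile(r"(?:'|,),\[\"(https:|http.*?)\",\d+,\d+\]")


def extract_full_res_links(script_text):
    """Return decoded image links found in the given script text (hypothetical helper)."""
    links = []
    for raw_link in FULL_RES_PATTERN.findall(script_text):
        # First pass turns sequences such as \u003d into "=" (or \\u003d into \u003d).
        decoded_once = bytes(raw_link, "ascii").decode("unicode-escape")
        # Second pass handles links that were escaped twice.
        decoded_twice = bytes(decoded_once, "ascii").decode("unicode-escape")
        links.append(decoded_twice)
    return links


# Made-up input: one singly escaped and one doubly escaped link.
sample_script_text = (
    r'[0,"id1",,["https://example.com/full/1.jpg?w\u003d1920\u0026h\u003d1080",1080,1920]]'
    r'[0,"id2",,["https://example.com/full/2.jpg?s\\u003dabc",600,800]]'
)

links = extract_full_res_links(sample_script_text)
for link in links:
    print(link)
```

Note that `bytes(..., "ascii")` raises an error if a link contains non-ASCII characters, so real pages may need extra handling.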
Alternatively, you can achieve this easily by using the Google Images API from SerpApi. It's a paid API with a free plan.
The difference is that you don't need to figure out how to scrape the data or maintain the parser when something changes over time. All you need to do is iterate over structured JSON and extract the data you need.
Code to integrate:
import os, json
from serpapi import GoogleSearch

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "minecraft shaders 8k photo",
    "tbm": "isch"
}

search = GoogleSearch(params)
results = search.get_dict()

print(json.dumps(results['suggested_searches'], indent=2, ensure_ascii=False))
print(json.dumps(results['images_results'], indent=2, ensure_ascii=False))
------
'''
[
  ...
  {
    "position": 30,
    "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ_CjA8J1P5Y6bN2KCuY6XgS4mFvctuwhho6A&usqp=CAU",
    "source": "wallpaperbetter.com",
    "title": "minecraft shaders video games, HD wallpaper | Wallpaperbetter",
    "link": "https://www.wallpaperbetter.com/en/hd-wallpaper-cusnk",
    "original": "https://p4.wallpaperbetter.com/wallpaper/120/342/446/minecraft-shaders-video-games-wallpaper-preview.jpg",
    "is_product": false
  }
  ...
]
'''
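Since the question is about the original image links, the "iterate over structured JSON" step might look like the sketch below. It works on a hand-written dictionary shaped like the output above rather than on a live API response; the `original_image_links` helper and the second sample entry are hypothetical.

```python
def original_image_links(results):
    """Collect the "original" URL from every image result that has one (hypothetical helper)."""
    return [
        item["original"]
        for item in results.get("images_results", [])
        if "original" in item
    ]


# Hand-written stand-in for the dictionary returned by search.get_dict().
sample_results = {
    "images_results": [
        {
            "position": 30,
            "source": "wallpaperbetter.com",
            "link": "https://www.wallpaperbetter.com/en/hd-wallpaper-cusnk",
            "original": "https://p4.wallpaperbetter.com/wallpaper/120/342/446/minecraft-shaders-video-games-wallpaper-preview.jpg",
            "is_product": False,
        },
        # Made-up entry without an "original" key, to show that it is skipped.
        {"position": 31, "source": "example.com"},
    ]
}

originals = original_image_links(sample_results)
for url in originals:
    print(url)
```

Using `.get` with a default and checking for the key keeps the loop from failing when a response has no image results or an entry lacks the full-size link.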
I have already answered a similar question here and wrote a dedicated blog post about how to scrape and download Google Images with Python.
Disclaimer: I work for SerpApi.