
Certain CDNs, like googleusercontent, don't (obviously) encode the filenames of images in their URLs, so you can't get the file type with simple string manipulation, as other answers here have suggested. Knowing this, how can I tell that

https://lh3.googleusercontent.com/pw/AM-JKLURvu-Ro2N3c1vm1PTM3a7Ae5nG3LNWynuKNEeFNBMwH_uWLQJe0q0HmaOzKC0k0gRba10SbonLaheGcNpxROnCenf1YJnzDC3jL-N9fTtZ7u0q5Z-3iURXtrt4GlyeEI3t4KWxprFDqFWRO29sJc8=w440-h248-no

is a gif whilst

https://lh3.googleusercontent.com/pw/AM-JKLXk2WxafqHOi0ZrETUh2vUNkiLyYW1jRmAQsHBmYyVP7Le-KBCSVASCgO2C6_3QbW3LcLYOV_8OefPafyz2i4g8nqpw8xZnIhzDdemd5dFPS5A7dVAGQWx9DIy5aYOGuh06hTrmfhF9mZmITjjTwuc=w1200-h600-no

is a jpg?

gfaster
    Download at least the first few bytes of the file. Most formats begin with a sequence of "magic bytes" to identify format like "GIF" or "JFIF". – Michael Butscher Jun 29 '21 at 01:19

1 Answer


Building on the responses to this question, you could try:

import requests
from io import BytesIO
from PIL import Image  # provided by the Pillow package

url = "your link"

# Download the full image and let Pillow detect the format
image = Image.open(BytesIO(requests.get(url).content))
file_type = image.format  # e.g. 'GIF' or 'JPEG'

This calls for downloading the entire file, though. If you're looking to do this in bulk, you might want to explore the option in the comment above that mentions "magic bytes"...
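As a rough sketch of that "magic bytes" idea, you could compare the first few bytes of the response against known signatures. The names `SIGNATURES`, `sniff_image_type`, and `fetch_head` are made up for this illustration, the signature list only covers a few common formats, and the fetch helper uses the standard library's `urllib` with a `Range` header that a server is free to ignore:

```python
from urllib.request import Request, urlopen

# Leading byte signatures for a few common image formats (not exhaustive).
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_image_type(head):
    """Guess the image format from the first bytes of a file, or return None."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # WebP files start with "RIFF", a 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None


def fetch_head(url, n=16):
    """Read only the first n bytes of the response body (hypothetical helper)."""
    # Ask for a byte range; if the server ignores it, we still read just n bytes.
    req = Request(url, headers={"Range": f"bytes=0-{n - 1}"})
    with urlopen(req, timeout=10) as resp:
        return resp.read(n)
```

With this, `sniff_image_type(fetch_head(url))` should give you the format without pulling down the whole image, assuming the server returns the raw image bytes.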

Edit: You can also try to get the image type from the headers of the response to your url:

headers = requests.get(url).headers
# Falls back to 'nope' if the 'Content-Type' header is missing
file_type = headers.get('Content-Type', 'nope/nope').split('/')[1]
print(file_type)
# Prints 'gif' or 'jpeg' for the URLs in the question

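Edit 2: If you're really only concerned with the file type of the link and not the file itself, you could use the head method instead of the get method of the requests module. It's faster, because the server sends back only the headers and no response body:

# requests.head does not follow redirects by default, so enable it explicitly
headers = requests.head(url, allow_redirects=True).headers
file_type = headers.get('Content-Type', 'nope/nope').split('/')[1]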
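One caveat with splitting on "/": a Content-Type value can carry parameters after a semicolon (for example "image/jpeg; charset=binary"), and a server might return something that isn't an image at all. A slightly more defensive parse could look like the sketch below; `image_subtype` is just a name picked for this example, not something from requests:

```python
def image_subtype(content_type):
    """Return the subtype of an image/* Content-Type value, or None otherwise."""
    if not content_type:
        return None
    # Drop any parameters such as "; charset=binary" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = media_type.partition("/")
    # Only accept well-formed image types.
    return sub if main == "image" and sub else None
```

You would call it as `image_subtype(headers.get('Content-Type'))`, and treat a `None` result as "unknown" rather than relying on a placeholder string.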
mark_s