import re
from urllib.request import urlopen
# Optional XML declaration, then any number of comments, then <svg or <!DOCTYPE svg
SVG_R = r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b'
SVG_RE = re.compile(SVG_R, re.DOTALL)
# an example SVG file:
with urlopen("https://upload.wikimedia.org/wikipedia/commons/1/17/Yin_yang.svg") as f:
    # latin_1 maps every byte to a character, so decoding never raises
    file_contents = f.read().decode('latin_1')
is_svg = SVG_RE.match(file_contents) is not None
print(['NOT SVG', 'SVG'][is_svg]) # prints SVG
A possible optimization is to read and/or decode only the first N bytes. The problem with determining N is that there can be very long comments before <svg or <!DOCTYPE svg.
The regex has been validated with 32120 SVG files on my Mac.
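The first-N-bytes optimization mentioned above could look roughly like the sketch below. The function name is_svg_prefix, the max_bytes parameter, and its default of 4096 are my own assumptions for illustration, not something measured; the regex is the same one as above. Only this sketch's small in-memory examples were checked, not the large file set.

```python
import io
import re

SVG_R = r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b'
SVG_RE = re.compile(SVG_R, re.DOTALL)

def is_svg_prefix(stream, max_bytes=4096):
    """Test only the first max_bytes bytes of a binary file-like object.

    max_bytes=4096 is an arbitrary, hypothetical default.
    """
    chunks = []
    remaining = max_bytes
    # A stream may return fewer bytes than requested, so keep reading
    # until the budget is used up or the stream is exhausted.
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    head = b''.join(chunks).decode('latin_1')
    return SVG_RE.match(head) is not None

small = b'<?xml version="1.0"?>\n<!-- c -->\n<svg xmlns="http://www.w3.org/2000/svg"></svg>'
long_comment = b'<!--' + b'x' * 10000 + b'-->\n<svg></svg>'

print(is_svg_prefix(io.BytesIO(small)))                # True
print(is_svg_prefix(io.BytesIO(long_comment)))         # False: the comment is cut off
print(is_svg_prefix(io.BytesIO(long_comment), 20000))  # True
```

The second call shows the trade-off: when a leading comment is longer than N, the truncated prefix no longer matches and a real SVG is reported as NOT SVG. A larger N, or falling back to the full content on a negative result, avoids that at the cost of reading more.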