I am looking for a way to determine if an image_file is a valid SVG file.

This answer using imghdr works well for other file types (PNG, etc.), but for SVG it just returns None.

Is there a different package I can use, or some other way than just checking the file extension, to ensure an image_file is SVG?

DjangoBlockchain
  • Load it in an XML parser, and use the parser to check that the root element is svg and is in the SVG namespace. – Robert Longson Aug 14 '20 at 22:21
  • in case my answer helped you out, would you mind "accepting" it? I could use exactly those points for a personal goal. If you need some further adjustment in order for it to be useful to you, I can still work on that, of course. – Walter Tross Dec 05 '20 at 10:21
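
The XML-parser approach suggested in the first comment could be sketched as follows. This is a minimal sketch, not a tested recipe: the function name `root_is_svg` is hypothetical, and it assumes the file contents are already available as bytes. It uses the standard library's `xml.etree.ElementTree.iterparse` and stops at the first "start" event, which is the root element.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as "{namespace}localname"
SVG_TAG = "{http://www.w3.org/2000/svg}svg"


def root_is_svg(data: bytes) -> bool:
    """Return True if the XML root element is <svg> in the SVG namespace.

    Hypothetical helper: `data` is the raw content of the candidate file.
    """
    try:
        # The first "start" event belongs to the root element, so we can
        # return immediately without looking at the rest of the document.
        for _event, element in ET.iterparse(io.BytesIO(data), events=("start",)):
            return element.tag == SVG_TAG
    except ET.ParseError:
        # Not well-formed XML (or empty), so not an SVG file
        pass
    return False


print(root_is_svg(b'<svg xmlns="http://www.w3.org/2000/svg"/>'))
print(root_is_svg(b'<html xmlns="http://www.w3.org/1999/xhtml"/>'))
```

Unlike a check on the first few characters, this rejects an `svg` root that lacks the SVG namespace. The Python documentation warns that the standard XML modules are not hardened against maliciously constructed data, so for untrusted uploads a hardened parser may be worth considering.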

1 Answer

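import re
from urllib.request import urlopen

# Optional XML prolog, then any number of comments, then either the <svg root tag or an SVG doctype
SVG_R = r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b'
SVG_RE = re.compile(SVG_R, re.DOTALL)  # DOTALL lets a comment span several lines

# an example SVG file:
with urlopen("https://upload.wikimedia.org/wikipedia/commons/1/17/Yin_yang.svg") as f:
    # latin_1 maps every byte to a character, which avoids any conversion exception
    file_contents = f.read().decode('latin_1')

# match() anchors at the start of the string, so only the beginning of the file is checked
is_svg = SVG_RE.match(file_contents) is not None

print('SVG' if is_svg else 'NOT SVG')  # prints SVG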

A possible optimization is to read and/or decode only the first N bytes. The problem with determining N is that before <svg or <!DOCTYPE svg there can be very long comments.
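
The optimization mentioned above might look like the sketch below. It is only an illustration under stated assumptions: the helper name `head_is_svg` and the default limit of 64 KiB are made up here, not values from the answer, and a file whose leading comments are longer than the limit would be wrongly rejected. The pattern is the same as above but compiled as a bytes pattern, so no decoding is needed.

```python
import io
import re

# Same pattern as in the answer, written as a bytes pattern to skip decoding
SVG_RE_BYTES = re.compile(
    rb'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b',
    re.DOTALL,
)


def head_is_svg(stream, max_bytes=64 * 1024):
    """Check only the first `max_bytes` bytes of a binary file object.

    Hypothetical helper; `max_bytes` is an arbitrary assumption (N in the text).
    """
    head = stream.read(max_bytes)
    return SVG_RE_BYTES.match(head) is not None


# A long leading comment defeats a limit that is too small
data = b'<!-- ' + b'x' * 100 + b' -->\n<svg xmlns="http://www.w3.org/2000/svg"/>'
print(head_is_svg(io.BytesIO(data)))                # whole sample fits in the default limit
print(head_is_svg(io.BytesIO(data), max_bytes=50))  # comment is cut off before <svg
```

With a real file, the stream would come from `open(path, 'rb')`. The trade-off is the one described above: a larger limit costs more reading, a smaller one risks rejecting valid files that start with long comments.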

The regex has been validated with 32120 SVG files on my Mac.

Walter Tross
  • Interesting, so that regex just checks and sees if the contents of the file is in a valid svg format? – DjangoBlockchain Aug 14 '20 at 20:56
  • Well, it checks whether the _start_ of the file is a valid SVG format. But that's what `imghdr` does as well... – Walter Tross Aug 14 '20 at 20:59
  • In a comment to an [answer](https://stackoverflow.com/questions/62928249/how-can-we-validate-svg-image-without-using-magic-number-in-angularjs/62959770#62959770) that tackled the same problem in a browser environment, it was pointed out to me that testing for the XML prolog might be too restrictive a requirement. – ccprog Aug 14 '20 at 21:32
  • @ccprog I don't quite understand your remark, but thanks for the link to [the specs](https://www.w3.org/TR/SVG2/conform.html#ConformingSVGStandAloneFiles), which tell me that I can remove case-insensitivity. – Walter Tross Aug 14 '20 at 21:54
  • What I meant is that browsers in particular will render standalone SVG files even if the XML prolog is missing and the file starts with the root `<svg>` element. Other renderers might complain or not. Validity might be a concept that depends on your use cases. – ccprog Aug 14 '20 at 22:02
  • I see. It's a matter of choosing whether to support crazy SVG files without XML prolog or to reject crazy files that start with ` – Walter Tross Aug 14 '20 at 22:18
  • OTOH, checking SVG files on my Mac, I found not only examples of missing XML prolog, but also examples of ` – Walter Tross Aug 14 '20 at 22:57
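
To make the discussion about the XML prolog concrete, the regex from the answer can be run against a few inputs. The sample strings below are hypothetical, made up for illustration only; the pattern itself is copied unchanged from the answer.

```python
import re

# The pattern from the answer, unchanged
SVG_RE = re.compile(
    r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b',
    re.DOTALL,
)

# Hypothetical sample inputs, not taken from real files
SAMPLES = {
    "with prolog": '<?xml version="1.0" encoding="UTF-8"?>\n'
                   '<svg xmlns="http://www.w3.org/2000/svg"/>',
    "no prolog": '<svg xmlns="http://www.w3.org/2000/svg"/>',
    "doctype first": '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
                     '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n<svg/>',
    "html page": '<html><body><svg/></body></html>',
}

# match() only succeeds if the pattern matches at the very start of the text
results = {name: SVG_RE.match(text) is not None for name, text in SAMPLES.items()}
for name, accepted in results.items():
    print(f"{name}: {'SVG' if accepted else 'NOT SVG'}")
```

Because the prolog group is optional, files with and without an XML prolog are both accepted, while an HTML page that merely contains an inline `<svg>` element is not.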