Check if an image file is a valid SVG file in python

Question

I am looking for a way to determine if an image_file is a valid SVG file.

This answer using imghdr works well for other file types (PNG, etc.), but for SVG it just returns None.

Is there a different package I can use, or some other way than just checking the file extension, to ensure an image_file is SVG?

load it in an xml parser, use the parser to check the root element is svg and is in the SVG namespace — Robert Longson, Aug 14 '20 at 22:21
in case my answer helped you out, would you mind "accepting" it? I could use exactly those points for a personal goal. If you need some further adjustment in order for it to be useful to you, I can still work on that, of course. — Walter Tross, Dec 05 '20 at 10:21
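
Robert Longson's suggestion in the comments can be sketched roughly as follows. This is only a minimal illustration using the standard library's `xml.etree.ElementTree`. The helper name `root_is_svg` is made up for this example, and the check stops at the root element, so it does not validate the rest of the document. The standard library XML parsers are also not hardened against maliciously crafted input, so treat untrusted uploads with care.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as "{namespace-uri}localname"
SVG_TAG = "{http://www.w3.org/2000/svg}svg"


def root_is_svg(source):
    """Return True if the first element is <svg> in the SVG namespace.

    `source` is a filename or a binary file object (hypothetical helper,
    not part of any library).
    """
    try:
        # The first "start" event belongs to the root element, so we can
        # decide right there without walking the whole tree.
        for _event, element in ET.iterparse(source, events=("start",)):
            return element.tag == SVG_TAG
    except ET.ParseError:
        # Not well-formed XML (or empty input)
        return False
    return False


# In-memory demo with a tiny SVG document
sample = io.BytesIO(b'<svg xmlns="http://www.w3.org/2000/svg"></svg>')
print(root_is_svg(sample))  # True
```

Note that this is stricter than what browsers accept in some respects: an `<svg>` root without the namespace declaration is rejected here.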

Walter Tross · Answer 1 · 2020-08-15T23:46:02.750

7

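import re
from urllib.request import urlopen

# Optional XML declaration, then any number of comments, then either the
# <svg root element or a <!DOCTYPE svg declaration.
SVG_R = r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b'
SVG_RE = re.compile(SVG_R, re.DOTALL)  # DOTALL lets comments span several lines

# an example SVG file:
url = "https://upload.wikimedia.org/wikipedia/commons/1/17/Yin_yang.svg"
with urlopen(url) as f:
    # latin_1 maps every byte to a character, so decoding never raises
    file_contents = f.read().decode('latin_1')

# match() anchors at the start of the string, so only the beginning is tested
is_svg = SVG_RE.match(file_contents) is not None

print('SVG' if is_svg else 'NOT SVG')  # prints SVG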

A possible optimization is to read and/or decode only the first N bytes. The problem with determining N is that before <svg or <!DOCTYPE svg there can be very long comments.
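
As a rough sketch of that optimization, the helper below reads at most `max_bytes` from a binary file object before applying the same regex. The function name `looks_like_svg` and the 64 KiB default are assumptions chosen for illustration, not values taken from the answer. The last call in the demo shows the caveat mentioned above: if a leading comment is longer than the limit, the check fails even though the file is an SVG.

```python
import io
import re

# Same pattern as in the answer: optional XML declaration, optional
# comments, then <svg or <!DOCTYPE svg
SVG_RE = re.compile(
    r'(?:<\?xml\b[^>]*>[^<]*)?(?:<!--.*?-->[^<]*)*(?:<svg|<!DOCTYPE svg)\b',
    re.DOTALL,
)


def looks_like_svg(fileobj, max_bytes=64 * 1024):
    """Check only the first `max_bytes` bytes of a binary file object.

    Hypothetical helper; the default limit is an arbitrary assumption.
    """
    head = fileobj.read(max_bytes)
    # latin_1 never raises, whatever bytes the file contains
    return SVG_RE.match(head.decode('latin_1')) is not None


# In-memory demo: a comment that is longer than a deliberately tiny limit
data = b'<!--' + b'x' * 100 + b'--><svg xmlns="http://www.w3.org/2000/svg"/>'
print(looks_like_svg(io.BytesIO(data)))                # True
print(looks_like_svg(io.BytesIO(data), max_bytes=50))  # False
```

With a real file, the same function can be called as `looks_like_svg(open(path, 'rb'))` inside a `with` block.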

The regex has been validated with 32120 SVG files on my Mac.

edited Aug 15 '20 at 23:46

answered Aug 14 '20 at 20:54

Walter Tross

Interesting, so that regex just checks and sees if the contents of the file is in a valid svg format? – DjangoBlockchain Aug 14 '20 at 20:56
well, it checks whether the _start_ of the file is a valid SVG format. But that's what `imghdr` does as well... – Walter Tross Aug 14 '20 at 20:59
In a comment to an [answer](https://stackoverflow.com/questions/62928249/how-can-we-validate-svg-image-without-using-magic-number-in-angularjs/62959770#62959770) that tackled the same problem in a browser environment, it was pointed out to me that testing for the XML prolog might be too restrictive a requirement. – ccprog Aug 14 '20 at 21:32
@ccprog I don't quite understand your remark, but thanks for the link to [the specs](https://www.w3.org/TR/SVG2/conform.html#ConformingSVGStandAloneFiles), which tell me that I can remove case-insensitivity. – Walter Tross Aug 14 '20 at 21:54
What I meant is that especially browsers will render standalone SVG files even if the XML prolog is missing and the file starts with the root ` – ccprog Aug 14 '20 at 22:02
I see. It's a matter of choosing whether to support crazy SVG files without XML prolog or to reject crazy files that start with ` – Walter Tross Aug 14 '20 at 22:18
OTOH, checking SVG files on my Mac, I found not only examples of missing XML prolog, but also examples of ` – Walter Tross Aug 14 '20 at 22:57
