5

I want to validate Internet media types that are submitted to my API.

Can you help me write a regex to match them?

Example types, taken from http://en.wikipedia.org/wiki/Internet_media_type:

application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime

Each value must follow the standard form:

type / media type name [+suffix]
Henry Ecker
Pete Thorne

3 Answers

9

I recently had a need to validate media types a bit more strictly than the existing answers. Here's what I came up with, based on the intersection of the grammar from RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (which disallows {} in tokens and whitespace except between parameters). For a C-like language with (?:) non-capturing groups:

ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";

This ends up with a rather monstrous

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"

which captures type, subtype, and parameters, or just

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"

omitting parameters. Note that these could be made more forward-compatible (and less strict) by allowing any token for type (as RFC 7231 does) rather than limiting to "application", "audio", etc.
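As a minimal sketch, the same building blocks could be assembled in Python like this. The function name parse_media_type is hypothetical, and the use of fullmatch is an assumption: the pattern above carries no anchors, so a plain search would also accept strings that merely contain a media type.

```python
import re

# Building blocks, mirroring the pieces named in the answer.
OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
# A quoted string: any character except quote/backslash, or a backslash escape.
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
TYPE = (
    r"(application|audio|font|example|image|message|model|multipart|text|video"
    r"|x-(?:" + TOKEN + r"))"
)
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"
MEDIA_TYPE = re.compile(TYPE + "/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")


def parse_media_type(value):
    """Return (type, subtype, raw_parameters), or None if value does not match.

    Hypothetical helper; fullmatch is used so the whole input must be valid.
    """
    match = MEDIA_TYPE.fullmatch(value)
    return match.groups() if match else None


if __name__ == "__main__":
    for candidate in (
        "application/atom+xml",
        'text/html; charset="utf-8"',
        "x-custom/thing",
        "foo/bar",
        "application/zip extra",
    ):
        print(repr(candidate), "->", parse_media_type(candidate))
```

The parameters come back as one raw string; splitting them into name/value pairs would need a second pass.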

In practice you may want to additionally limit inputs to IANA Registered Media Types or mailcap or specific types appropriate for your application based on intended use.

Kevinoid
4

This is really straightforward:

\w+/[-+.\w]+

Demo: http://regex101.com/r/oH5bS7/1

And if you want to ensure there's at most one + in the subtype:

\w+/[-.\w]+(?:\+[-.\w]+)?
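A quick way to try the stricter pattern against the examples from the question, sketched in Python. Two choices here are assumptions rather than part of the answer: fullmatch is used so the whole input has to match, and re.ASCII keeps \w from accepting non-ASCII letters. The helper name is_media_type is hypothetical.

```python
import re

# The "at most one +" pattern from above, restricted to ASCII word characters.
MEDIA_TYPE = re.compile(r"\w+/[-.\w]+(?:\+[-.\w]+)?", re.ASCII)


def is_media_type(value):
    """Hypothetical helper: True only if the entire string matches."""
    return MEDIA_TYPE.fullmatch(value) is not None


EXAMPLES = [
    "application/atom+xml",
    "application/EDI-X12",
    "application/xml-dtd",
    "application/zip",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "video/quicktime",
]

if __name__ == "__main__":
    for example in EXAMPLES:
        print(example, is_media_type(example))
    # A second "+" in the subtype is rejected.
    print("application/atom+xml+json", is_media_type("application/atom+xml+json"))
```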

Lucas Trzesniewski
1

A more general regex that also accepts wildcards and one optional parameter is:

(?P<main>\w+|\*)/(?P<sub>\w+|\*)(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?

Demo: http://regex101.com/r/lQ3rX4/2
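The (?P<name>...) syntax is Python-style named groups, so a rough Python sketch of how the captures might be read follows. It assumes a single = between the parameter name and its value, uses fullmatch so the whole input must match, and the name parse_media_range is hypothetical.

```python
import re

# Assumption: exactly one "=" separates the parameter name from its value.
MEDIA_RANGE = re.compile(
    r"(?P<main>\w+|\*)/(?P<sub>\w+|\*)"
    r"(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?",
    re.ASCII,
)


def parse_media_range(value):
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = MEDIA_RANGE.fullmatch(value)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_media_range("text/html; charset=utf-8"))
    print(parse_media_range("*/*"))
    # \w+ does not cover "+", "." or "-", so this subtype is rejected.
    print(parse_media_range("application/atom+xml"))
```

Because the subtype is matched with \w+, subtypes such as atom+xml or xml-dtd fail a full match with this pattern; widening that class, as in the answer above, would be needed for the examples in the question.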

Wael Ben Zid El Guebsi