I recently had a need to validate media types a bit more strictly than the existing answers do. Here's what I came up with, based on the intersection of the grammars from RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (which disallows { and } in tokens and permits whitespace only between parameters). For a C-like language with (?:) non-capturing groups:
ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";
This ends up with a rather monstrous
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"
which captures type, subtype, and parameters, or just
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"
omitting parameters. Note that these could be made more forward-compatible (and less strict) by allowing any token for type (as RFC 7231 does) rather than limiting to "application", "audio", etc.
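For anyone who wants to try this out, here is a minimal Python sketch that mirrors the pieces above. It is an illustration rather than a drop-in implementation: the names (STRICT, RELAXED, build_media_type_pattern, parse_media_type) are my own, the patterns are written as raw strings so the regex backslashes are not doubled, and the quoted-string piece is intended to accept backslash-escaped pairs. It uses re.fullmatch because the expressions themselves are not anchored, and it builds both the strict type list and the relaxed any-token variant.

```python
import re

# Building blocks, written as raw strings so regex backslashes need no doubling.
OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'  # a backslash escapes the next character
STRICT_TYPE = (
    "(application|audio|font|example|image|message|model|multipart|text|video"
    "|x-(?:" + TOKEN + "))"
)
RELAXED_TYPE = "(" + TOKEN + ")"  # any token for the type
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"


def build_media_type_pattern(type_pattern):
    """Compile type "/" subtype *( OWS parameter ) with three capture groups."""
    return re.compile(
        type_pattern + "/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)"
    )


STRICT = build_media_type_pattern(STRICT_TYPE)
RELAXED = build_media_type_pattern(RELAXED_TYPE)


def parse_media_type(value, pattern=STRICT):
    """Return (type, subtype, raw_parameters), or None if value is invalid."""
    # fullmatch anchors both ends, so leading or trailing junk is rejected.
    match = pattern.fullmatch(value)
    return match.groups() if match else None
```

Calling parse_media_type("foo/bar") returns None with the strict pattern, while parse_media_type("foo/bar", RELAXED) returns the three captured groups. The third group is the raw parameter string, which would still need to be split if you want individual parameters.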
In practice, you may additionally want to limit inputs to IANA Registered Media Types, the types in mailcap, or specific types appropriate for your application and its intended use.
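As a rough sketch of that last option, the following Python snippet checks the syntax first and then compares against an allowlist. The set ALLOWED_MEDIA_TYPES and the function is_allowed are hypothetical names with placeholder values, and this version deliberately accepts only type/subtype with no parameters. The comparison is done in lowercase because type and subtype tokens are case-insensitive.

```python
import re

TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
# Type and subtype only; parameters are deliberately not accepted in this sketch.
TYPE_SUBTYPE = re.compile("(" + TOKEN + ")/(" + TOKEN + ")")

# Hypothetical allowlist; replace with whatever your application accepts.
ALLOWED_MEDIA_TYPES = {"application/json", "image/png", "text/plain"}


def is_allowed(value, allowed=ALLOWED_MEDIA_TYPES):
    """True if value is a syntactically valid type/subtype on the allowlist."""
    match = TYPE_SUBTYPE.fullmatch(value)
    if match is None:
        return False
    # Type and subtype are case-insensitive, so compare in lowercase.
    media_type, subtype = (part.lower() for part in match.groups())
    return f"{media_type}/{subtype}" in allowed
```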