102

I have a .NET web form with a file upload control that is tied to a regular expression validator. The validator should allow only certain file types to be uploaded (jpg, gif, doc, pdf).

The current regular expression that does this is:


^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF|.doc|.DOC|.pdf|.PDF)$

However, this does not seem to be working. Can anyone give me a little regex help?

TylerH
mmattax
  • 19
    I'm sure you know this, but, in case someone later finds this question who doesn't: This method will only verify the file's extension, not its actual type. Once you receive the file, you *must* examine its contents to determine what it really is. If you rely on the name, it's a huge security flaw. – Dave Sherohman Dec 20 '08 at 07:52
  • The `$` is important, as otherwise you easily fall for filenames that just continue (e.g. `AnnaKournikova.jpg.vbs`, see [Anna Kournikova (computer virus)](https://en.wikipedia.org/wiki/Anna_Kournikova_(computer_virus))). – AmigoJack Jun 28 '21 at 13:04
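Both comments can be made concrete with a small sketch. It uses Python's `re` module as a stand-in for the .NET engine. The function name `looks_valid` and the `SIGNATURES` table are hypothetical, and the table lists only the commonly documented leading bytes for each type, so treat it as a first-pass check rather than full content validation.

```python
import re

# Anchored with `$`: the extension must be the very end of the name.
ALLOWED_EXT = re.compile(r"\.(jpg|gif|doc|pdf)$", re.IGNORECASE)

# Same pattern without the anchor, kept only to show what goes wrong.
UNANCHORED = re.compile(r"\.(jpg|gif|doc|pdf)", re.IGNORECASE)

# Leading bytes ("magic numbers") commonly used to recognise each type.
SIGNATURES = {
    "jpg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "pdf": (b"%PDF-",),
    "doc": (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",),  # OLE2 compound file
}


def looks_valid(filename: str, head: bytes) -> bool:
    """True if the name ends in an allowed extension AND the first bytes fit it."""
    match = ALLOWED_EXT.search(filename)
    if match is None:
        return False
    ext = match.group(1).lower()
    # bytes.startswith accepts a tuple of candidate prefixes.
    return head.startswith(SIGNATURES[ext])
```

With the anchor in place, `AnnaKournikova.jpg.vbs` is rejected by name alone, while the unanchored pattern still finds `.jpg` in the middle of it. A file named `report.pdf` whose first bytes are not `%PDF-` is rejected by the content check.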

6 Answers

216

Your regex seems a bit too complex in my opinion. Also, remember that the dot is a special character meaning "any character". The following regex should work (note the escaped dots):

^.*\.(jpg|JPG|gif|GIF|doc|DOC|pdf|PDF)$

You can use a tool like Expresso to test your regular expressions.
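As a quick illustration of why the escaped dot matters, here is a minimal sketch using Python's `re` module in place of the .NET engine (the two behave the same for these simple patterns). The names `UNESCAPED`, `ESCAPED`, `ANSWER`, and `accepts` are made up for the example.

```python
import re

UNESCAPED = re.compile(r".jpg$")   # "." matches any character
ESCAPED = re.compile(r"\.jpg$")    # "\." matches only a literal dot
ANSWER = re.compile(r"^.*\.(jpg|JPG|gif|GIF|doc|DOC|pdf|PDF)$")


def accepts(pattern: re.Pattern, filename: str) -> bool:
    """True if the pattern matches somewhere in the filename."""
    return pattern.search(filename) is not None
```

Here `accepts(UNESCAPED, "budgetjpg")` is true because the unescaped dot happily matches the `t`, whereas `accepts(ESCAPED, "budgetjpg")` is false. The answer's pattern accepts `photo.JPG` but not a mixed-case name such as `photo.Jpg`, since only the all-lowercase and all-uppercase spellings are listed.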

Dario Solera
  • When doing regular expressions in .NET, enumerating casing differences is not required. It not only can decrease readability, but also can degrade performance if it is called in a loop, for example. – Joseph Ferris Dec 17 '08 at 15:56
  • 2
    The problem is that the regex is used in a RegularExpressionValidator ASP.NET control, which AFAIK does not accept options such as IgnoreCase. – Dario Solera Dec 17 '08 at 16:24
  • I missed that in the original post. Yes, the community at large has been asking Microsoft for case-sensitivity options on RegularExpressionValidator for a few years now, and those pleas have been ignored. – Joseph Ferris Dec 17 '08 at 16:37
  • 1
    You can leave out the ^.* as "match anything from the beginning up until this expression at the end" is the same as "match this expression at the end". You can also embed regular expression options http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx – ICR Dec 18 '08 at 00:11
  • In order to embed the regular expression option to ignore case, you need to disable the client-side script (I don't think JavaScript supports it). You can then use "(?i:\.(jpg|gif|doc|pdf))$" for a case-insensitive match. – Martin Brown Dec 19 '08 at 14:38
  • What happens if you have an extension that is mixed case? Like .Doc ? – Dan Diplo Nov 23 '09 at 18:22
  • @DanDiplo Then it won't accept that file extension, unless you add it in. – TylerH Apr 19 '19 at 20:58
25
^.+\.(?:(?:[dD][oO][cC][xX]?)|(?:[pP][dD][fF]))$

Will accept .doc, .docx, .pdf files having a filename of at least one character:

^           = beginning of string
.+          = at least one character (any character)
\.          = dot ('.')
(?:pattern) = match the pattern without storing the match
[dD]        = any character in the set ('d' or 'D')
[xX]?       = any character in the set or none 
              ('x' may be missing so 'doc' or 'docx' are both accepted)
|           = either the previous or the next pattern
$           = end of matched string

Warning! Without enclosing the whole chain of extensions in (?:), an extension like .docpdf would pass.

You can test regular expressions at http://www.regextester.com/
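The grouping warning can be checked with a short sketch. It uses Python's `re` module rather than .NET, and `GROUPED` and `UNGROUPED` are names invented for the example. The ungrouped pattern is one plausible way of writing the expression without the outer `(?:...)`.

```python
import re

# The pattern from this answer: the alternation is confined to the extension.
GROUPED = re.compile(r"^.+\.(?:(?:[dD][oO][cC][xX]?)|(?:[pP][dD][fF]))$")

# Without the outer group, "|" splits the WHOLE pattern in two:
#   "^.+\.[dD][oO][cC][xX]?"   (anything that starts like "name.doc")
#   "[pP][dD][fF]$"            (anything that merely ends in "pdf")
UNGROUPED = re.compile(r"^.+\.[dD][oO][cC][xX]?|[pP][dD][fF]$")
```

`GROUPED` accepts `a.doc` and `report.DocX` and rejects `report.docpdf`, while `UNGROUPED` lets `report.docpdf` and even `notes.txtpdf` through.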

mdunka
20

Are you just looking to verify that the file is of a given extension? You can simplify what you are trying to do with something like this:

(.*?)\.(jpg|gif|doc|pdf)$

Then, when you call IsMatch() make sure to pass RegexOptions.IgnoreCase as your second parameter. There is no reason to have to list out the variations for casing.

Edit: As Dario mentions, this is not going to work for the RegularExpressionValidator, as it does not support casing options.
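For code that calls the regex engine directly, the idea looks roughly like this. The sketch uses Python, where `re.IGNORECASE` plays the role that `RegexOptions.IgnoreCase` plays in .NET. The helper name `split_upload_name` is hypothetical.

```python
import re

# One lowercase list of extensions; casing is handled by the flag.
PATTERN = re.compile(r"(.*?)\.(jpg|gif|doc|pdf)$", re.IGNORECASE)


def split_upload_name(filename: str):
    """Return (base name, extension) for an allowed file, or None otherwise."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Mixed-case extensions such as `.Doc` are accepted without listing every variant, and dots inside the base name are fine: `split_upload_name("my.report.Doc")` yields the base name `my.report` and the extension `Doc`.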

Joseph Ferris
  • 1
    This one allows dots to be included in file name which is fine for me – Bronek Jul 02 '18 at 11:26
  • Using the case insensitive option `(?i)` (without colon) worked for me in asp.net 6 also in the RegularExpressionValidator. – Qrt Jul 15 '22 at 11:30
13

You can embed case insensitivity into the regular expression like so:

\.(?i:)(?:jpg|gif|doc|pdf)$
ICR
  • 1
    Except that this fails if you leave the client script option enabled. – Martin Brown Dec 19 '08 at 14:39
  • Afaik javascript does allow inline options, but it applies to the whole regex and not just everything after it, which doesn't matter in this case. Unless there is another reason it won't work (I can't test atm). – ICR Dec 20 '08 at 07:27
  • 2
    No, JS doesn't support inline modifiers at all. Also, your regex won't work even in .NET; you want either "\.(?i)(?:jpg|gif|doc|pdf)$" or "\.(?i:jpg|gif|doc|pdf)$". This: "(?i:)" just matches nothing, case-insensitively. – Alan Moore Apr 10 '09 at 19:19
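The scoping point in the last comment can be reproduced outside .NET. This sketch uses Python's `re` module, which supports the scoped `(?i:...)` form; recent Python versions reject a bare `(?i)` in the middle of a pattern, so that variant is not shown. The names `EMPTY_GROUP` and `SCOPED` are invented for the example.

```python
import re

# "(?i:)" is a case-insensitive group with nothing inside it,
# so the extensions that follow are still matched case-sensitively.
EMPTY_GROUP = re.compile(r"\.(?i:)(?:jpg|gif|doc|pdf)$")

# Putting the alternatives INSIDE the group makes them case-insensitive.
SCOPED = re.compile(r"\.(?i:jpg|gif|doc|pdf)$")
```

`EMPTY_GROUP` still matches `a.jpg` but not `a.JPG`, while `SCOPED` matches both, along with mixed-case names such as `a.Pdf`.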
8

Your regexp seems to validate both the file name and the extension. Is that what you need? I'll assume it's just the extension and would use a regexp like this:

\.(jpg|gif|doc|pdf)$

And set the matching to be case insensitive.
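Whether the short, suffix-only form is enough depends on how the pattern is applied. The sketch below uses Python to contrast "find the pattern somewhere" (`search`) with "the pattern must cover the whole input" (`fullmatch`). As far as I know, RegularExpressionValidator behaves like the second case, but that is an assumption worth verifying against your own form. The names `SUFFIX_ONLY` and `WITH_PREFIX` are hypothetical.

```python
import re

SUFFIX_ONLY = re.compile(r"\.(jpg|gif|doc|pdf)$", re.IGNORECASE)
WITH_PREFIX = re.compile(r"^.*\.(jpg|gif|doc|pdf)$", re.IGNORECASE)

name = "holiday.photo.JPG"

# search() looks for the pattern anywhere, so both forms succeed.
found_suffix = SUFFIX_ONLY.search(name) is not None
found_prefix = WITH_PREFIX.search(name) is not None

# fullmatch() requires the match to span the entire string,
# so only the form with the leading ".*" succeeds.
full_suffix = SUFFIX_ONLY.fullmatch(name) is not None
full_prefix = WITH_PREFIX.fullmatch(name) is not None
```

If the validator rejects every file with the suffix-only pattern, putting the leading `^.*` back is the first thing to try.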

PEZ
6

You can use this template for every file type:

ValidationExpression="^.+\.(([pP][dD][fF])|([jJ][pP][gG])|([pP][nN][gG])))$"

For example, you can add ([rR][aA][rR]) for the RAR file type, and so on.
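Writing these character classes by hand is error-prone, especially the parentheses, so one option is to generate the expression from a list of extensions. This is only a sketch in Python; `case_class` and `build_validation_expression` are hypothetical helpers, and the result is a plain string you would paste into `ValidationExpression`.

```python
import re


def case_class(ext: str) -> str:
    """Turn 'pdf' into '[pP][dD][fF]'; non-letters are escaped literally."""
    return "".join(
        f"[{c.lower()}{c.upper()}]" if c.isalpha() else re.escape(c)
        for c in ext
    )


def build_validation_expression(extensions) -> str:
    """Build '^.+\\.(?:...|...)$' with one balanced group per extension."""
    alternatives = "|".join(f"(?:{case_class(e)})" for e in extensions)
    return rf"^.+\.(?:{alternatives})$"


EXPRESSION = build_validation_expression(["pdf", "jpg", "png", "rar"])
```

Because each group is opened and closed in the same place, the generated expression always has balanced parentheses, and adding a new file type is a one-word change to the list.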

Soner Gönül
Sajjad mc
  • Your example is invalid, it ends with 3 parentheses instead of 2. Anyway it's more readable to add a flag to ignore case. `(?i)^.+\.(pdf|jpg|png)$` – Marcus Voltolim Mar 18 '22 at 18:52