0

This question is different from other grep pattern matching questions because we're looking for a large number of file extensions, so the approach from this question would be too long and tedious to type: grep -r -i --include '*.ade' --include '*.adp' ... CP_Image ~/path[12345]

I was trying to email the backup of a static site when Google blocked my attachment upload for security reasons. Their support page says:

You can't send or receive the following file types:

.ade, .adp, .bat, .chm, .cmd, .com, .cpl, .exe, .hta, .ins, .isp, .jar, .jse, .lib, .lnk, .mde, .msc, .msp, .mst, .pif, .scr, .sct, .shb, .sys, .vb, .vbe, .vbs, .vxd, .wsc, .wsf, .wsh

I converted and tested the following Regular Expression here:

/.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)/gi

And tried running it with:

ls -lahR | grep '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

It doesn't work. I don't think grep interprets the or (|) symbol properly, because ls -lahR | grep '.*\.html' works.

Amin Shah Gilani
  • What version of `grep` are you using, and how are you validating whether it works or not? Do you have a specific file that you're trying and failing to find? – merlin2011 May 25 '15 at 09:18
  • I'm recursively trying to find files with the specified extensions. And it's `grep (GNU grep) 2.16`. – Amin Shah Gilani May 25 '15 at 09:19
  • Are you sure that the thing you're looking for exists? I've tested your expression on a few different cases and it was able to find everything I created. – merlin2011 May 25 '15 at 09:31

4 Answers

2

By default, grep uses Basic Regular Expressions (BRE). In GNU grep's BRE, groups are written \(...\) and alternation is written \| (alternation in BRE is a GNU extension; POSIX leaves \| undefined):

grep '.*\.\(ade\|adp\|bat\|chm\|cmd\|com\|cpl\|exe\|hta\|ins\|isp\|jar\|jse\|lib\|lnk\|mde\|msc\|msp\|mst\|pif\|scr\|sct\|shb\|sys\|vb\|vbe\|vbs\|vxd\|wsc\|wsf\|wsh\)'

OR

grep -E '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

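The -E option (long form: --extended-regexp) tells grep to interpret the pattern as an Extended Regular Expression (ERE), where ( ) and | work unescaped.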
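As a rough sketch of the -E form in practice, the snippet below runs a shortened version of the pattern over a few made-up file names (the names and the three-extension list are assumptions for illustration, not from the question). It also shows why adding -i and a trailing $ may be worth considering: without the anchor, `.exe` also matches inside `notes.exec.txt`.

```shell
# Hypothetical sample file names, one per line (not from the original post).
names='setup.exe
run.BAT
notes.exec.txt
index.html'

# -E: extended regex, so ( | ) need no backslashes.
# -i: ignore case, so .BAT is treated like .bat.
# $ : the extension must be at the end of the name.
anchored=$(printf '%s\n' "$names" | grep -E -i '\.(exe|bat|jar)$')

# Same alternation without the $ anchor, in the shape used in the question.
unanchored=$(printf '%s\n' "$names" | grep -E -i '.*\.(exe|bat|jar)')

printf 'anchored:\n%s\n' "$anchored"
printf 'unanchored:\n%s\n' "$unanchored"
```

The anchored variant reports only setup.exe and run.BAT, while the unanchored one also reports notes.exec.txt.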


Amin Shah Gilani
Avinash Raj
  • or `grep -E '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'` – Avinash Raj May 25 '15 at 09:21
  • I'll mark your answer as correct if you could move your comment into the answer text and expand on what BRE stands for. – Amin Shah Gilani May 25 '15 at 09:37
  • Also, be aware that it's **GNU** grep you're talking about; *POSIX* BRE doesn't support alternation at all. It treats `|` as a literal pipe character and `\|` as a syntax error. But switching to egrep is the solution in any case; POSIX ERE and GNU ERE both treat `|` as an alternation operator. – Alan Moore May 25 '15 at 09:55
  • @AlanMoore Is there any BRE other than POSIX? `\| as a syntax error`, really? It works for me. – Avinash Raj May 25 '15 at 09:58
  • @AvinashRaj: How are you testing it? When I type the OP's regex into RegexBuddy and tell it the flavor is POSIX BRE I get an error message, but when I say it's GNU BRE it works. But I misspoke earlier; it's not treating `\|` as an error, but as an attempt to use alternation. Which, of course, is not supported in POSIX BRE. – Alan Moore May 25 '15 at 10:53
  • `\|` is undefined by POSIX BRE, which means implementations can use it to mean anything they want. GNU grep happens to treat it as alternation. Other implementations may treat it as something else, though most likely it will just match a `|` character if it doesn't treat it as alternation. – geirha May 25 '15 at 12:59
1

Add the flag -E to indicate it's an extended regular expression. From GNU Grep 2.1: The default is "basic regular expression", and

[i]n basic regular expressions the meta-characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning.
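To make the quoted rule concrete, here is a small sketch with made-up input lines (an assumption for illustration). In the default BRE mode an unescaped | is just a pipe character, so the pattern only matches text that literally contains it; with -E the same pattern becomes an alternation.

```shell
# Hypothetical input lines (not from the original post).
lines='a
b
a|b'

# BRE (default): | is an ordinary character, so only the literal line "a|b" matches.
# -x requires the pattern to match the whole line.
bre=$(printf '%s\n' "$lines" | grep -x 'a|b')

# ERE (-E): | is alternation, so the whole-line matches are "a" and "b".
ere=$(printf '%s\n' "$lines" | grep -x -E 'a|b')

printf 'BRE matched:\n%s\n' "$bre"
printf 'ERE matched:\n%s\n' "$ere"
```

This is the same effect that makes the pattern in the question silently match nothing: without -E, grep looks for file names containing a literal parenthesis and pipe.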

Jongware
0

I'm recursively trying to find files with the specified extensions.

Better to use find with -iregex option:

find . -regextype posix-egrep -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

On OSX use:

find -E . -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
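If you would rather not depend on either regex flavor, one possible alternative (my own sketch, not part of the answer above) is to build a chain of -iname tests from the extension list. It assumes a find that supports -iname, which GNU and BSD find both do; the scratch directory, the sample files and the three-extension list are hypothetical.

```shell
# Hypothetical scratch directory with a few sample files.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/setup.exe" "$dir/sub/run.BAT" "$dir/notes.txt"

# Build the argument list: -iname '*.exe' -o -iname '*.bat' -o -iname '*.jar'
# Note: this reuses (and overwrites) the shell's positional parameters.
set --
for ext in exe bat jar; do
  # Put -o between tests, but not before the first one.
  if [ "$#" -gt 0 ]; then
    set -- "$@" -o
  fi
  set -- "$@" -iname "*.$ext"
done

# The escaped parentheses group the -o chain so -type f applies to all of it.
found=$(cd "$dir" && find . -type f \( "$@" \) | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$dir"
```

Swapping in the full extension list from the question only changes the words in the for loop.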
anubhava
0

A bash method to exclude the given extensions: use extended globbing

shopt -s extglob nullglob
ls *.!(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)
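The same extended pattern operators can also be used in a [[ ]] test, which makes it easy to sort names into blocked and allowed lists. The sketch below is bash-only and uses made-up names and a shortened extension list (both assumptions). It uses @(...) to match the listed extensions rather than !(...) to exclude them, partly because a glob like `*.!(jar)` still matches a name such as archive.tar.jar: the `*` can match `archive` and `!(jar)` then matches `tar.jar`.

```shell
# Requires bash. extglob enables @(...) and !(...); nocasematch makes
# [[ == ]] comparisons case-insensitive, so run.BAT counts as .bat.
shopt -s extglob nocasematch

blocked=()
allowed=()

# Hypothetical sample names (not from the original post).
for name in setup.exe run.BAT notes.txt archive.tar.jar; do
  # @(a|b|c) matches exactly one of the alternatives.
  if [[ $name == *.@(exe|bat|jar) ]]; then
    blocked+=("$name")
  else
    allowed+=("$name")
  fi
done

printf 'blocked: %s\n' "${blocked[*]}"
printf 'allowed: %s\n' "${allowed[*]}"
```

Here the test decides on the final extension only, so archive.tar.jar ends up in the blocked list.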
glenn jackman