
I'm using the following regex find command in the OS X terminal to find a whole load of files that have 8-digit file names followed by a .jpg, .gif, .png or .eps extension. The command produces no results, even though I've told OS X/BSD find to use modern regex with -E:

find -E ./ -iregex '\d{8}'

Testing on http://rubular.com/ (http://rubular.com/r/YMz3J8Qlgh) shows that the regex pattern produces the expected results, and OS X find does return the files when I type

find . -iname '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*'

But this seems a little long-winded.

Wiseguy
juliushibert
  • Manpages say it uses POSIX; perhaps you need `[:digit:]` instead of `\d`? – Wiseguy Mar 23 '12 at 17:39
  • @Wiseguy `\d` is not supported in BRE (POSIX basic RE) nor in ERE (POSIX extended RE). The default regex type for GNU find is emacs, which is similar to BRE. BRE does not support intervals ({8}). – jordanm Mar 23 '12 at 19:11
  • @jordanm Right, which is why I suggested using a POSIX character class. (Under the `-E` flag, the OS X [`find` man page](https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/find.1.html) referred me to the [`re_format` man page](https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man7/re_format.7.html#//apple_ref/doc/man/7/re_format) for supported syntax.) – Wiseguy Mar 23 '12 at 19:35

5 Answers


These commands work on OS X:

find -E . -iregex '.*/[0-9]{8}\.(jpg|png|eps|gif)'

This command matches 12345678.jpg but not 123456789.jpg.


find -E . -iregex '.*/[0-9]{8,}\.(jpg|png|eps|gif)'

This command matches both 12345678.jpg and 123456789.jpg.


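The leading .*/ matches the directory part of the path (the starting folder or any subfolder). It is needed because find tests the regex against the whole path, not just the file name.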
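To see the difference between {8} and {8,} without a BSD find at hand, here is a rough sketch that pipes find's output through grep -E. This is a stand-in, not the command from the answer: grep -x requires the whole line to match (approximating find's whole-path matching) and -i approximates -iregex. The directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Build a throwaway directory with a few sample names (all hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/12345678.jpg" "$demo_dir/123456789.jpg" \
      "$demo_dir/1234567.png" "$demo_dir/sub/87654321.GIF"

# find prints paths such as ./12345678.jpg; grep -x insists that the whole
# line matches, which stands in for find's whole-path matching.
exact8=$(cd "$demo_dir" && find . -type f \
  | grep -Eix '.*/[0-9]{8}\.(jpg|png|eps|gif)' | LC_ALL=C sort)
atleast8=$(cd "$demo_dir" && find . -type f \
  | grep -Eix '.*/[0-9]{8,}\.(jpg|png|eps|gif)' | LC_ALL=C sort)

printf 'exactly 8 digits:\n%s\n' "$exact8"
printf '8 or more digits:\n%s\n' "$atleast8"

# Clean up the sample directory.
rm -rf "$demo_dir"
```

Only the second pattern picks up the 9-digit name, and neither picks up the 7-digit one.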

jackjr300

With all your answers, I was finally able to use OS X find (10.8.1) with regex. To give something back, here are my findings. We use custom strings to identify clips, and the pattern goes like this: "YYMMDDabc##abc*.ext", that is, year/month/day, 3 characters, 2 digits, 3 characters, anything, then the extension.

find -E /path/to/folder -type f -regex '^/.*/[0-9]{6}[A-Za-z]{3}[0-9]{2}[A-Za-z0-9]{3}\.*.*\.(ext)$'

The initial ^ anchors the pattern at the beginning of the path. [0-9]{6} matches a 6-digit string; \d doesn't work. \D doesn't work for letters, but [A-Za-z] does. The $ at the end anchors the pattern at the end of the path.

After reading Apple's man pages for find and re_format, I was completely off track regarding escaping characters.
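A pattern like this can be tried against a few sample paths before running it on a real folder. The sketch below feeds the same regex to grep -E, which also speaks POSIX extended syntax. The paths are invented for illustration and grep is only a stand-in for find -E.

```shell
#!/bin/sh
# The regex from the find command above, unchanged.
pattern='^/.*/[0-9]{6}[A-Za-z]{3}[0-9]{2}[A-Za-z0-9]{3}\.*.*\.(ext)$'

# Hypothetical sample paths: two well-formed clip names, one with a short
# date, and one with a different extension.
paths='/clips/120823abc01xyz_take1.ext
/clips/120823abc01xyz.ext
/clips/1208abc01xyz.ext
/clips/120823abc01xyz.mov'

# Keep only the lines that the pattern accepts.
matched=$(printf '%s\n' "$paths" | grep -E "$pattern")
printf '%s\n' "$matched"
```

Only the first two paths should come through; the short date and the .mov extension are rejected.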

j0k
ugn

man re_format explains the specifics of the modern regex that find will accept.

This works for me: -iregex '[0-9]{8}'

jdi
  • GNU find uses BRE by default, which does not allow intervals ({8}). ERE can be used by specifying -regextype. No idea what OSX's find supports. – jordanm Mar 23 '12 at 19:10
  • It works fine using the -E flag, as the OP suggests. I was merely commenting on the regex pattern itself. – jdi Mar 23 '12 at 19:16
  • Hmmm, I'm still not getting the expected results. I've tried this code `find -E ./ -iregex '[0-9]{8}.*'` on this list of files `102498223.jpg 103326202 (1).jpg 103326202.jpg 103724407 (1).jpg 103724407.jpg 104307929.jpg 104823717.jpg 105473655.jpg 105473655_extracted.psd 105473660.jpg 106957651.jpg 108037226.jpg 108210958.jpg 108350120.jpg 110119642.jpg 111063966.jpg 111651198.jpg 112145402.jpg 112229007.jpg 113615728.jpg` and I get 0 results returned. Something's still not right here. – juliushibert Mar 26 '12 at 08:22
  • For me it works fine. I run it against the root of my filesystem and find various matches when I keep changing the number count. – jdi Mar 26 '12 at 15:57

This has been a very eye-opening thread. I'm bringing to the table a solution to my own problem and hopefully clarifying a thing or two for you and other users looking for robustness (like I was).

In my case, my Mac had a bunch of duplicate photos. When Macs make duplicates, they append a space and a number to the file name, just before the extension.

IMG_0001.JPG might be accompanied by duplicates named IMG_0001 2.JPG, IMG_0001 3.JPG and so on. In my case, this went on and on, making up about 2,600 useless files.

To get started, I navigated to the folder in question.

cd ~/Pictures/

Next, let's prove to ourselves that we can list all the files in the directory. You'll notice that in the regex it's necessary to include the . that says "look in this directory". Also, you have to match the whole file name so the .+ is necessary to catch all the other characters.

find -E . -regex '\..+'

The results show the strings that you'll have to match, including the . I mentioned earlier, the slash /, and everything else.

./IMG_1788.JPG
./IMG_1789.JPG
./IMG_1790.JPG
./IMG_1791.JPG

So I can't write this to find duplicates because it doesn't include the "./"

find -E . -regex 'IMG_[0-9]{4} .+'

but I can write this to find duplicates because it does include the "./"

find -E . -regex '\./IMG_[0-9]{4} .+'

or the fancier version with .*/, as mentioned by @jackjr300, which does the same thing.

find -E . -regex '.*/IMG_[0-9]{4} .+'

Last is the confusing part. \d isn't recognized in BSD find; [0-9] works just as well. Other users' answers cited the re_format manual, which lists how to write common patterns that replace things like \d with a funny square-colon syntax that looks like this: [:digit:]. I tried and tried, but it never worked for me. (For what it's worth, re_format only defines these classes inside a bracket expression, so the digit class would be written [[:digit:]] rather than [:digit:].) Just use [0-9]. In my case, I wasted a bunch of time thinking I should have used [:space:] instead of a space, but I found (as usual!) that I just needed to breathe and really read the regex. It turned out to be my mistake. :)
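For anyone who does want the named classes: POSIX defines them only inside a bracket expression, so the working spelling is [[:digit:]] and [[:space:]], with two pairs of brackets. As far as I know, a bare [:digit:] is either read as a set of the literal characters or rejected outright, depending on the tool. The sketch below uses grep -Ex as a stand-in for find -E on some made-up names and checks that both spellings select the same lines.

```shell
#!/bin/sh
# Hypothetical paths, written the way find would print them.
names='./IMG_0001.JPG
./IMG_0001 2.JPG
./IMG_0002 3.JPG
./notes.txt'

# Named classes, wrapped in an outer bracket expression.
dups=$(printf '%s\n' "$names" \
  | grep -Ex '\./IMG_[[:digit:]]{4}[[:space:]].+')

# The plain spelling recommended in the answer.
dups_plain=$(printf '%s\n' "$names" \
  | grep -Ex '\./IMG_[0-9]{4} .+')

printf 'named classes:\n%s\n' "$dups"
printf 'plain ranges:\n%s\n' "$dups_plain"
```

Both variables should end up holding the two names that contain a space before the number.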

Hope this helps someone!

Wray Bowling

I am using this regex to find and delete iPhone dups:

find -E . -regex '.*/IMG_[0-9]{4}[ ]1\.JPG' -print -exec rm '{}' \;
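Because the command deletes files, it may be worth doing a dry run first. The sketch below is one cautious way to do that on a throwaway directory: it lists the candidates, then removes them. Note that it swaps the -E regex for a portable -name glob so that it also runs on systems whose find lacks -E. The file names are invented for the demonstration.

```shell
#!/bin/sh
# Throwaway directory with one original and two hypothetical duplicates.
demo_dir=$(mktemp -d)
touch "$demo_dir/IMG_0001.JPG" "$demo_dir/IMG_0001 1.JPG" \
      "$demo_dir/IMG_0002 1.JPG"

# Step 1: dry run. Only print what would be removed.
would_delete=$(cd "$demo_dir" && find . -type f \
  -name 'IMG_[0-9][0-9][0-9][0-9] 1.JPG' | LC_ALL=C sort)
printf 'would delete:\n%s\n' "$would_delete"

# Step 2: once the list looks right, run the same test with rm attached.
(cd "$demo_dir" && find . -type f \
  -name 'IMG_[0-9][0-9][0-9][0-9] 1.JPG' -exec rm -- {} +)

# Show what is left, then clean up the sample directory.
remaining=$(cd "$demo_dir" && find . -type f | LC_ALL=C sort)
printf 'remaining:\n%s\n' "$remaining"
rm -rf "$demo_dir"
```

Running the listing and the deletion as two separate steps gives a chance to spot a pattern that is too greedy before anything is removed.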

Mircea Stanciu