Is there a way to find files with non-ASCII characters in their names? I could of course pipe the output through perl and filter it there, but for efficiency I'd like to do it all within find. I tried the following:

find . -type f -name '*[^[:ascii:]]*'

It doesn't work at all.

Edit:

I'm now trying to make use of

find . -type f -regex '.*[^[:ascii:]].*'

This should be an Emacs regexp, and Emacs has an [:ascii:] class, but the expression doesn't work.

Edit 2:

LC_COLLATE=C find . -type f -regex '.*[^!-~].*'

This matches files with non-ASCII characters (complete voodoo...), but it also matches files with a space in the name.

Adobe
  • Please check the manual for how to make it interpret the string as an extended regex. Also check whether extended regexes support the syntax. – nhahtdh May 29 '12 at 09:24
  • Wow, I never heard of that. Now I'm trying to get `find . -type f -regextype posix-extended -regex '[^[:ascii:]]'` to work. – Adobe May 29 '12 at 09:32
  • It says `find: Invalid character class name`. Where do I find a list of posix-extended char classes? – Adobe May 29 '12 at 09:49
  • Check this: http://en.wikipedia.org/wiki/Regular_expression#POSIX_Extended_Regular_Expressions – nhahtdh May 29 '12 at 10:03
  • So POSIX doesn't have an `[:ascii:]` class. – Adobe May 29 '12 at 10:05
  • Related: [How do I grep for all non-ASCII characters](https://stackoverflow.com/q/3001177/55075) & [find and delete files with non-ascii names](https://stackoverflow.com/q/19146240/55075). – kenorb Apr 12 '18 at 22:29

1 Answer

This seems to work for me in both default and posix-extended mode:

LC_COLLATE=C find . -regex '.*[^ -~].*'

There could be locale-related issues, though. I don't have a large corpus of non-ASCII filenames to test it on, but it catches the ones I have.
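
To unpack the bracket expression: in the C locale a range is compared by byte value, so ` -~` runs from space (0x20) to tilde (0x7E), which is the printable ASCII range, and `[^ -~]` matches any byte outside it. Starting the range at `!` (0x21) instead leaves space out, which would explain why names with spaces are matched by that variant. Below is a minimal sketch, not a definitive solution. The function name `find_non_ascii` is made up for illustration, and it sets `LC_ALL=C` rather than only `LC_COLLATE`, on the assumption that forcing the whole locale avoids the locale-related issues. `-regex` is not part of POSIX find, so this assumes GNU or BSD find.

```shell
# Hypothetical helper (name invented for this example): list regular files
# under a directory whose path contains a byte outside printable ASCII.
find_non_ascii() {
    # LC_ALL=C (an assumption, broader than LC_COLLATE=C) makes find treat
    # paths as raw bytes, so the range ' -~' means bytes 0x20 through 0x7E.
    # -regex is matched against the whole path, hence the leading and
    # trailing '.*'.
    LC_ALL=C find "${1:-.}" -type f -regex '.*[^ -~].*'
}

# Usage: find_non_ascii /some/dir    (defaults to the current directory)
```

Because the whole path is matched, a non-ASCII byte in a directory name (including the starting path) also counts as a match, and so do control characters such as tab.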

kenorb
yawfle
  • Found this, which shows how to work around locale settings: (http://stackoverflow.com/a/3208902/1424666) – yawfle May 30 '12 at 00:41
  • This is a strange character class. What does it mean? I've never seen anything like it. – Adobe May 30 '12 at 07:52
  • You must have forgotten something: `find . '.*[^!-~].*'` matches all the files and then says `find: \`.*[^!-~].*': No such file or directory`. `find . -regex '.*[^!-~].*'` gives `find: Invalid range end`. – Adobe May 30 '12 at 08:04
  • @Adobe: Can you include your OS? Different flavors can have slightly different syntax. – nhahtdh May 30 '12 at 08:08
  • @nhahtdh: Kubuntu 12.04. – Adobe May 30 '12 at 08:09
  • Yes, sorry, I forgot to include the -regex option in the example. Did you try it as described in the link I pasted? `LC_COLLATE=C find . -regex '.*[^ -~].*'` – yawfle May 30 '12 at 08:12
  • I tried this: `LC_COLLATE=C find . -type f -regex '.*[^[:ascii:]].*'`. It gives strange results... Now I'm trying yours. – Adobe May 30 '12 at 08:15
  • See Edit2 of the original post. It mostly solves it. – Adobe May 30 '12 at 08:20