I have a file, a.out, which contains a number of lines. Each line consists of a single character: either the Unicode character U+2013 or a lowercase letter a-z.
Running the file command on a.out reports: UTF-8 Unicode text.
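For anyone who wants to reproduce this, the sketch below builds a small file with the same layout. The sample characters and the file name sample.txt are hypothetical and are not the real contents of a.out. The sketch also prints each line's raw bytes, which shows that U+2013 is stored as three bytes in UTF-8.

```python
# Build a small test file with the layout described above: one character per
# line, either U+2013 (EN DASH) or a lowercase ASCII letter.
# The characters and the name "sample.txt" are hypothetical stand-ins for a.out.
sample_lines = ["a", "\u2013", "b", "\u2013", "z"]

with open("sample.txt", "w", encoding="utf-8", newline="\n") as f:
    for ch in sample_lines:
        f.write(ch + "\n")

# Print the raw bytes stored for each line (newline stripped), in hex.
with open("sample.txt", "rb") as f:
    for raw in f:
        print(raw.rstrip(b"\n").hex(" "))
```

This prints 61, e2 80 93, 62, e2 80 93 and 7a on successive lines. Each U+2013 line is a three-byte sequence, and every individual byte has a value between 0x00 and 0xFF.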
The locale command reports:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
If I issue the command

grep -P -n "[^\x00-\xFF]" a.out

I would expect only the lines containing U+2013 to be returned, and that is what happens when I run the same test under Cygwin. The problem environment, however, is Oracle Linux Server release 6.5, and there the command returns no lines at all. Conversely, if I issue

grep -P -n "[\x00-\xFF]" a.out

then all lines are returned.
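To pin down the expected result without relying on grep, the sketch below reports the lines that contain a code point above U+00FF, which is what the Cygwin run returns. It also shows what the same test yields if it is applied to raw bytes instead of decoded characters. It is only an assumption here that grep -P on the Oracle Linux system matches byte by byte; nothing above confirms that. The function names are hypothetical.

```python
def char_level_hits(path):
    """Line numbers whose UTF-8-decoded text contains a code point above U+00FF."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if any(ord(ch) > 0xFF for ch in line.rstrip("\n")):
                hits.append(lineno)
    return hits


def byte_level_hits(path):
    """Line numbers containing a byte above 0xFF.

    Always returns an empty list: every byte value is 0x00-0xFF, so a
    byte-oriented "[^\\x00-\\xFF]" can never match.
    """
    hits = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            # Iterating over a bytes object yields ints in range(256).
            if any(b > 0xFF for b in raw.rstrip(b"\n")):
                hits.append(lineno)
    return hits
```

Calling char_level_hits("a.out") should list exactly the U+2013 lines, matching the Cygwin result. byte_level_hits("a.out") returns an empty list, which matches what is observed on Oracle Linux. Under the same byte-level reading, "[\x00-\xFF]" would match every non-empty line.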
I realise that the grep man page says "[grep -P] ...is highly experimental and grep -P may warn of unimplemented features.", but no warnings are issued.
Am I missing something?