Can anybody explain this?

With my default locale (LANG=en_US.UTF-8) I get this:

echo 'ab' | sed 's/[A-B]/!/'  # a!
echo 'ab' | sed 's/[B-C]/!/'  # ab
echo 'ab' | sed 's/[B]/!/'    # ab

But when I run export LANG=C, everything works as expected: none of the three patterns matches, and the output is ab each time.
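
The C-locale workaround can also be scoped to a single command instead of exporting it for the whole session. This is a minimal sketch, assuming a POSIX shell and a sed that honors LC_ALL; the variable names out_range and out_lower are only for illustration.

```shell
# Prefixing the command with LC_ALL=C applies the C locale to that sed call only.
# LC_ALL takes precedence over LANG and LC_COLLATE.
out_range=$(echo 'ab' | LC_ALL=C sed 's/[A-B]/!/')
echo "$out_range"   # uppercase range does not match lowercase input in the C locale

# A lowercase range still matches as usual.
out_lower=$(echo 'ab' | LC_ALL=C sed 's/[a-b]/!/')
echo "$out_lower"
```

In the C locale, ranges follow byte values, so [A-B] covers only A and B and the first command leaves ab untouched.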

Tested with GNU sed version 4.2.1 on Cygwin.

Karoly Horvath
  • Weird. As a point of reference, on Mac OS X (10.9.2), it works fine. – Brian Dec 11 '15 at 14:33
  • It works fine with same sed version and same LANG variable with Ubuntu 11.04, too. – Cyrus Dec 11 '15 at 14:34
  • Works fine on SUSE as well. Possibly a bug in Cygwin? – 123 Dec 11 '15 at 14:58
  • [If the locale value is "C" or "POSIX", the POSIX locale is used and the standard utilities behave in accordance with the rules in POSIX Locale, for the associated category.](http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html) – Wiktor Stribiżew Dec 11 '15 at 15:16
  • Also: [Many locales sort characters in dictionary order, and in these locales `[a-dx-z]` is typically not equivalent to `[abcdxyz]`; it might be equivalent to `[aBbCcDdxXyYz]`, for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the LC_COLLATE or LC_ALL environment variable to the value ‘C’, or enable the `globasciiranges` shell option.](https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html) – Wiktor Stribiżew Dec 11 '15 at 15:20
  • Sorry for the quote-only comments, but I think [this one is also related](https://cygwin.com/ml/cygwin/2011-10/msg00185.html). *Your only safe way to work around it is to request LC_COLLATE=C up front.* – Wiktor Stribiżew Dec 11 '15 at 15:24
  • "POSIX 2001 and 2008 "fixed" things by saying that the use of range expressions in regular expressions is undefined in all but the C locale" - lovely :D – Karoly Horvath Dec 11 '15 at 15:45
